tech:slurm
Differences
This shows you the differences between two versions of the page.
tech:slurm [2019/09/06 15:44] – kohofer | tech:slurm [2020/02/10 17:07] – kohofer | ||
---|---|---|---|
Line 17: | Line 17: | ||
==== Controller ==== | ==== Controller ==== | ||
- | Controller name: slurm-ctrl | + | ===== Controller name: slurm-ctrl |
Install slurm-wlm and tools | Install slurm-wlm and tools | ||
ssh slurm-ctrl | ssh slurm-ctrl | ||
- | apt install slurm-wlm slurm-wlm-doc mailutils | + | apt install slurm-wlm slurm-wlm-doc mailutils mariadb-client mariadb-server libmariadb-dev python-dev python-mysqldb |
=== Install MariaDB Server === | === Install MariaDB Server === | ||
Line 114: | Line 114: | ||
</ | </ | ||
- | | + | Copy slurm.conf to compute nodes! |
+ | |||
+ | | ||
+ | |||
+ | vi / | ||
+ | |||
+ | < | ||
+ | [Unit] | ||
+ | Description=Slurm controller daemon | ||
+ | After=network.target munge.service | ||
+ | ConditionPathExists=/ | ||
+ | Documentation=man: | ||
+ | |||
+ | [Service] | ||
+ | Type=forking | ||
+ | EnvironmentFile=-/ | ||
+ | ExecStart=/ | ||
+ | ExecReload=/ | ||
+ | PIDFile=/ | ||
+ | |||
+ | [Install] | ||
+ | WantedBy=multi-user.target | ||
+ | |||
+ | </ | ||
+ | |||
+ | vi / | ||
+ | |||
+ | < | ||
+ | [Unit] | ||
+ | Description=Slurm node daemon | ||
+ | After=network.target munge.service | ||
+ | ConditionPathExists=/ | ||
+ | Documentation=man: | ||
+ | |||
+ | [Service] | ||
+ | Type=forking | ||
+ | EnvironmentFile=-/ | ||
+ | ExecStart=/ | ||
+ | ExecReload=/ | ||
+ | PIDFile=/ | ||
+ | KillMode=process | ||
+ | LimitNOFILE=51200 | ||
+ | LimitMEMLOCK=infinity | ||
+ | LimitSTACK=infinity | ||
+ | |||
+ | [Install] | ||
+ | WantedBy=multi-user.target | ||
+ | </ | ||
+ | |||
+ | |||
+ | root@slurm-ctrl# | ||
+ | root@slurm-ctrl# | ||
+ | root@slurm-ctrl# | ||
+ | root@slurm-ctrl# | ||
root@slurm-ctrl# | root@slurm-ctrl# | ||
+ | |||
=== Accounting Storage === | === Accounting Storage === | ||
Line 160: | Line 214: | ||
Copy / | Copy / | ||
- | scp / | + | scp / |
+ | |||
+ | Allow password-less access to slurm-ctrl | ||
+ | |||
+ | csadmin@slurm-ctrl: | ||
| | ||
Run a job from slurm-ctrl | Run a job from slurm-ctrl | ||
Line 182: | Line 240: | ||
PARTITION AVAIL TIMELIMIT | PARTITION AVAIL TIMELIMIT | ||
debug* | debug* | ||
+ | |||
+ | If a compute node is down | ||
+ | |||
+ | < | ||
+ | sinfo -a | ||
+ | PARTITION AVAIL TIMELIMIT | ||
+ | debug* | ||
+ | </ | ||
+ | |||
+ | scontrol update nodename=gpu02 state=idle | ||
+ | scontrol update nodename=gpu03 state=idle | ||
+ | |||
+ | < | ||
+ | sinfo -a | ||
+ | PARTITION AVAIL TIMELIMIT | ||
+ | debug* | ||
+ | </ | ||
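The two `scontrol update` calls above follow the same pattern for every node, so they can be generated rather than typed one by one. The following is a minimal sketch, not part of the original page: `resume_cmds` is a hypothetical helper name, and it only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# resume_cmds is a hypothetical helper (not part of Slurm): it prints one
# "scontrol update" command per node name, using the same syntax as above.
resume_cmds() {
    for node in "$@"; do
        printf 'scontrol update nodename=%s state=idle\n' "$node"
    done
}

# Dry run: print the commands for the two nodes from the example.
resume_cmds gpu02 gpu03
```

Once the printed commands look right, they could be executed on slurm-ctrl by piping the output to a shell, for example `resume_cmds gpu02 gpu03 | sh`.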
+ | |||
+ | |||
==== Compute Nodes ==== | ==== Compute Nodes ==== | ||
Line 189: | Line 266: | ||
{{: | {{: | ||
- | === Installation === | + | === Installation |
+ | |||
+ | ssh -l csadmin < | ||
+ | sudo apt install slurm-wlm libmunge-dev libmunge2 munge | ||
+ | sudo systemctl enable slurmd | ||
+ | sudo systemctl enable munge | ||
+ | sudo systemctl start slurmd | ||
+ | sudo systemctl start munge | ||
- | ssh -l csadmin 10.7.20.102 | ||
- | sudo apt install slurm-wlm | ||
- | |||
Generate ssh keys | Generate ssh keys | ||
+ | |||
ssh-keygen | ssh-keygen | ||
Line 216: | Line 299: | ||
owner and permission. | owner and permission. | ||
- | mv / | + | mv / |
chown munge:munge / | chown munge:munge / | ||
chmod 400 / | chmod 400 / | ||
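After the `chown` and `chmod` steps it can be useful to confirm that the key file really ended up with mode 400. The sketch below is an illustration only: `check_mode` is a hypothetical helper, it assumes GNU `stat` (as shipped on Debian/Ubuntu), and it takes the file path as an argument because the key path is truncated in this diff.

```shell
#!/bin/sh
# check_mode is a hypothetical helper: it succeeds only if FILE has exactly
# the octal permission bits MODE. Assumes GNU stat (stat -c).
check_mode() {
    file=$1
    want=$2
    have=$(stat -c '%a' "$file") || return 2
    [ "$have" = "$want" ]
}

# Demonstration on a throwaway file rather than the real munge key.
tmp=$(mktemp)
chmod 400 "$tmp"
if check_mode "$tmp" 400; then
    echo "mode ok"
else
    echo "mode differs"
fi
rm -f "$tmp"
```

On a compute node the same helper would be called with the actual key path and `400`. Ownership could be checked in a similar way with `stat -c '%U:%G'`, which should report `munge:munge` after the `chown` above.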
Line 239: | Line 322: | ||
[[https:// | [[https:// | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | {{ : |
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer