tech:slurm
Differences
This shows you the differences between two versions of the page.
tech:slurm [2020/06/22 17:50] – [CUDA NVIDIA TESLA Infos] kohofer | tech:slurm [2022/11/24 16:17] (current) – [Compute Nodes] kohofer | ||
---|---|---|---|
Line 14: | Line 14: | ||
===== Installation ===== | ===== Installation ===== | ||
+ | |||
===== Controller name: slurm-ctrl ===== | ===== Controller name: slurm-ctrl ===== | ||
Line 241: | Line 242: | ||
debug* | debug* | ||
- | If compute node is **<color # | + | If compute node is <color # |
< | < | ||
Line 265: | Line 266: | ||
</ | </ | ||
+ | < | ||
+ | sinfo -o " | ||
+ | NODELIST | ||
+ | gpu[02-03] | ||
+ | gpu04 | ||
+ | hpcmoi01, | ||
+ | </ | ||
- | ===== Compute Nodes ===== | ||
+ | ===== Compute Nodes ===== | ||
A compute node is a machine that receives jobs from the controller and executes them; it runs the slurmd service. | A compute node is a machine that receives jobs from the controller and executes them; it runs the slurmd service. | ||
Line 341: | Line 349: | ||
chown root: / | chown root: / | ||
- | | + | |
+ | === Directories === | ||
+ | |||
+ | Make sure that the NFS-mounted partitions are all there: | |||
+ | |||
+ | < | ||
+ | /data | ||
+ | / | ||
+ | / | ||
+ | / | ||
+ | /scratch | ||
+ | </ | ||
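The check on the mounted partitions can be scripted. The following is a minimal sketch, assuming a Linux node where the mount table is readable as `/proc/mounts`; the function name `missing_mounts` is hypothetical and not part of Slurm or of this setup.

```shell
# List every expected mount point that is absent from a mounts table.
# Usage: missing_mounts MOUNTS_FILE PATH...
# (missing_mounts is a hypothetical helper name used for illustration.)
missing_mounts() {
    local mounts_file=$1
    local path
    shift
    for path in "$@"; do
        # Field 2 of each line in /proc/mounts is the mount point.
        if ! awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$mounts_file"; then
            echo "$path"
        fi
    done
}
```

For example, `missing_mounts /proc/mounts /data /scratch` prints nothing when both partitions are mounted and prints the missing path otherwise. The remaining directories from the list above can be appended as further arguments.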
+ | |||
+ | ===== Modify user accounts ===== | ||
+ | |||
+ | Display the accounts created: | ||
+ | |||
+ | # Show also associations in the accounts | ||
+ | sacctmgr show account -s | ||
+ | # Show all columns separated by pipe | symbol | ||
+ | sacctmgr show account -s -P | ||
+ | # | ||
+ | sacctmgr show user -s | ||
+ | |||
+ | Add user | ||
+ | |||
+ | sacctmgr add user < | ||
+ | |||
+ | Modify a user, granting 12000 minutes (200 hours) of usage: | |||
+ | |||
+ | sacctmgr modify user < | ||
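The limit is expressed in minutes, so an allowance given in hours has to be converted first. A minimal sketch of that arithmetic, using a hypothetical helper name `hours_to_minutes`; it only computes the value and does not call `sacctmgr`.

```shell
# Convert a whole number of hours into minutes with shell arithmetic.
# (hours_to_minutes is a hypothetical helper name used for illustration.)
hours_to_minutes() {
    local hours=$1
    echo $(( hours * 60 ))
}
```

Calling `hours_to_minutes 200` yields the 12000 minutes used above.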
+ | |||
+ | Remove a user from a specific account: | |||
+ | |||
+ | sacctmgr remove user where user=< | ||
+ | |||
+ | Delete user | ||
+ | |||
+ | sacctmgr delete user ivmilan | ||
+ | Deleting users... | ||
+ | ivmilan | ||
+ | Would you like to commit changes? (You have 30 seconds to decide) | ||
+ | (N/y): y | ||
+ | |||
+ | |||
+ | Restart the services: | ||
+ | |||
+ | systemctl restart slurmctld.service | ||
+ | systemctl restart slurmdbd.service | ||
+ | |||
+ | Check status: | ||
+ | |||
+ | systemctl status slurmctld.service | ||
+ | systemctl status slurmdbd.service | ||
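For a quick pass/fail view instead of the full status output, the check can be wrapped in a small loop. This is a sketch, assuming a systemd host; the function name `report_units` is hypothetical, while `systemctl is-active --quiet` is the standard systemd query it relies on.

```shell
# Print "<unit>: active" or "<unit>: NOT active" for each given systemd unit.
# (report_units is a hypothetical helper name used for illustration.)
report_units() {
    local unit
    for unit in "$@"; do
        # is-active --quiet exits 0 only when the unit is currently active.
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
        fi
    done
}
```

Usage on the controller would be `report_units slurmctld.service slurmdbd.service`.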
+ | |||
+ | ==== Submit a job to a specific node using Slurm' | ||
+ | |||
+ | To run a job on a specific node, use this option in the job script: | |||
+ | |||
+ | #SBATCH --nodelist=gpu03 | ||
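For context, a complete job script using this option might look like the sketch below. The job name and output file name are assumptions chosen for illustration; only the `--nodelist=gpu03` line comes from this page. `%j` in the output name is replaced by Slurm with the job ID.

```shell
#!/bin/bash
#SBATCH --job-name=node-test
#SBATCH --nodelist=gpu03
#SBATCH --output=node-test-%j.out

# Print the name of the node the job actually ran on,
# which should be gpu03 when the request is honoured.
hostname
```

Saved as, say, `node-test.sh` (a hypothetical file name), it would be submitted with `sbatch node-test.sh`.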
+ | |||
===== Links ===== | ===== Links ===== | ||
+ | |||
+ | [[https:// | ||
[[https:// | [[https:// | ||
Line 362: | Line 433: | ||
====== Modules ====== | ====== Modules ====== | ||
+ | |||
+ | The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. | ||
+ | |||
+ | Installing Modules on Unix | ||
+ | |||
+ | Log in to slurm-ctrl and become root: | |||
+ | |||
+ | ssh slurm-ctrl | ||
+ | sudo -i | ||
+ | |||
+ | Download modules | ||
+ | |||
+ | curl -LJO https:// | ||
+ | tar xfz modules-4.6.0.tar.gz | ||
+ | cd modules-4.6.0 | ||
+ | |||
+ | |||
+ | $ ./configure --prefix=/ | ||
+ | $ make | ||
+ | $ make install | ||
+ | |||
+ | |||
+ | |||
+ | https:// | ||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | ===== SPACK ===== | ||
+ | |||
+ | |||
+ | Add different Python versions using Spack. | |||
+ | |||
+ | 1. First see which Python versions are available: | |||
+ | |||
+ | root@slurm-ctrl: | ||
+ | ==> Safe versions (already checksummed): | ||
+ | 3.8.2 3.7.7 3.7.4 3.7.1 3.6.7 3.6.4 3.6.1 3.5.2 3.4.10 | ||
+ | 3.8.1 3.7.6 3.7.3 3.7.0 3.6.6 3.6.3 3.6.0 3.5.1 3.4.3 | ||
+ | 3.8.0 3.7.5 3.7.2 3.6.8 3.6.5 3.6.2 3.5.7 3.5.0 3.3.6 | ||
+ | ==> Remote versions (not yet checksummed): | ||
+ | 3.10.0a6 | ||
+ | 3.10.0a5 | ||
+ | ... | ||
+ | ... | ||
+ | |||
+ | 2. Now select the Python version you would like to install: | |||
+ | |||
+ | root@slurm-ctrl: | ||
+ | ==> 23834: Installing libiconv | ||
+ | ==> Using cached archive: / | ||
+ | ==> Staging archive: / | ||
+ | ==> Created stage in / | ||
+ | ==> No patches needed for libiconv | ||
+ | ==> 23834: libiconv: Building libiconv [AutotoolsPackage] | ||
+ | ==> 23834: libiconv: Executing phase: ' | ||
+ | ==> 23834: libiconv: Executing phase: ' | ||
+ | ==> 23834: libiconv: Executing phase: ' | ||
+ | ==> 23834: libiconv: Executing phase: ' | ||
+ | ==> 23834: libiconv: Successfully installed libiconv | ||
+ | Fetch: 0.04s. | ||
+ | [+] / | ||
+ | ==> 23834: Installing libbsd | ||
+ | ... | ||
+ | ... | ||
+ | ... | ||
+ | ==> 23834: Installing python | ||
+ | ==> Fetching https:// | ||
+ | ############################################################################################################ | ||
+ | ==> Staging archive: / | ||
+ | ==> Created stage in / | ||
+ | ==> Ran patch() for python | ||
+ | ==> 23834: python: Building python [AutotoolsPackage] | ||
+ | ==> 23834: python: Executing phase: ' | ||
+ | ==> 23834: python: Executing phase: ' | ||
+ | ==> 23834: python: Executing phase: ' | ||
+ | ==> 23834: python: Executing phase: ' | ||
+ | ==> 23834: python: Successfully installed python | ||
+ | Fetch: 1.81s. | ||
+ | [+] / | ||
+ | |||
+ | |||
+ | This will take several minutes, depending on the version. | |||
+ | |||
+ | |||
+ | 3. Now you need to add a modules file: | |||
+ | |||
+ | root@slurm-ctrl: | ||
+ | |||
+ | < | ||
+ | #%Module1.0 | ||
+ | proc ModulesHelp { } { | ||
+ | global dotversion | ||
+ | | ||
+ | puts stderr " | ||
+ | } | ||
+ | |||
+ | module-whatis " | ||
+ | |||
+ | set | ||
+ | set-alias | ||
+ | |||
+ | prepend-path | ||
+ | prepend-path | ||
+ | |||
+ | </ | ||
+ | |||
+ | 4. The new module should now be available: | |||
+ | |||
+ | root@slurm-ctrl: | ||
+ | -------------------------------------------- / | ||
+ | anaconda3 | ||
+ | bzip | ||
+ | cuda-10.2 | ||
+ | cuda-11.0 | ||
+ | |||
+ | 5. Load the new module | ||
+ | |||
+ | root@slurm-ctrl: | ||
+ | |||
+ | 6. Verify it works | ||
+ | |||
+ | root@slurm-ctrl: | ||
+ | Python 3.8.2 (default, Mar 19 2021, 11:05:37) | ||
+ | [GCC 9.3.0] on linux | ||
+ | Type " | ||
+ | >>> | ||
+ | |||
+ | 7. Unload the new module | ||
+ | |||
+ | module unload python-3.8.2 | ||
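Steps 5 to 7 (load, use, unload) can be combined into one helper so the module never stays loaded by accident. This is a sketch, assuming the `module` shell function from Environment Modules is available in the current shell; the helper name `with_module` is hypothetical.

```shell
# Run one command with a module loaded, then unload the module again.
# Usage: with_module MODULE COMMAND [ARGS...]
# (with_module is a hypothetical helper name used for illustration.)
with_module() {
    local mod=$1 status
    shift
    # Stop early if the module cannot be loaded.
    module load "$mod" || return 1
    "$@"
    status=$?
    module unload "$mod"
    # Report the exit status of the wrapped command, not of the unload.
    return "$status"
}
```

For example, `with_module python-3.8.2 python3 --version` would print the interpreter version, assuming the module puts a `python3` binary on the `PATH`.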
+ | |||
===== Python ===== | ===== Python ===== | ||
Line 391: | Line 594: | ||
==== Create modules file ==== | ==== Create modules file ==== | ||
+ | **PYTHON** | ||
cd / | cd / | ||
Line 407: | Line 611: | ||
</ | </ | ||
- | | ||
+ | **CUDA** | ||
+ | vi / | ||
+ | |||
+ | < | ||
+ | #%Module1.0 | ||
+ | proc ModulesHelp { } { | ||
+ | global dotversion | ||
+ | |||
+ | puts stderr " | ||
+ | } | ||
+ | |||
+ | module-whatis " | ||
+ | |||
+ | set | ||
+ | |||
+ | setenv | ||
+ | prepend-path | ||
+ | prepend-path | ||
+ | </ | ||
===== GCC ===== | ===== GCC ===== | ||
Line 880: | Line 1102: | ||
===== Links ===== | ===== Links ===== | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | [[https:// | ||
+ | |||
https:// | https:// |
/data/www/wiki.inf.unibz.it/data/attic/tech/slurm.1592841045.txt.gz · Last modified: 2020/06/22 17:50 by kohofer