User Tools

Site Tools


tech:slurm

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
Next revisionBoth sides next revision
tech:slurm [2020/06/22 16:22] – [CUDA NVIDIA TESLA] kohofertech:slurm [2021/03/19 12:28] – [Modules] kohofer
Line 267: Line 267:
  
 ===== Compute Nodes ===== ===== Compute Nodes =====
- 
  
 A compute node is a machine which will receive jobs to execute, sent from the Controller, it runs the slurmd service. A compute node is a machine which will receive jobs to execute, sent from the Controller, it runs the slurmd service.
Line 342: Line 341:
    
      
 +===== Modify user accounts =====
 +
 +Add user
 +
 +  sacctmgr add user <usernme> Account=gpu-users Partition=gpu
 +
 +Modify user, give 12000 minutes/200 hours for usage
 +
 +  sacctmgr modify user misegata set GrpTRESMin=cpu=12000,gres/gpu=12000
 +
 +Restart the services:
 +
 +  systemctl restart slurmctld.service
 +  systemctl restart slurmdbd.service
 +
 +Check status:
 +
 +  systemctl status slurmctld.service
 +  systemctl status slurmdbd.service
 +
 +
 +
  
  
 ===== Links ===== ===== Links =====
 +
 +[[https://slurm.schedmd.com/slurm_ug_2011/Basic_Configuration_Usage.pdf|Basic Configuration and Usage]]
  
 [[https://slurm.schedmd.com/overview.html|Slurm Workload Manager Overview]] [[https://slurm.schedmd.com/overview.html|Slurm Workload Manager Overview]]
Line 362: Line 385:
  
 ====== Modules ====== ====== Modules ======
 +
 +Add different python versions using spack!
 +
 +1. First see which python versions are available:
 +
 +  root@slurm-ctrl:~# spack versions python
 +  ==> Safe versions (already checksummed):
 +  3.8.2  3.7.7  3.7.4  3.7.1  3.6.7  3.6.4  3.6.1  3.5.2  3.4.10  3.2.6   2.7.17  2.7.14  2.7.11  2.7.8
 +  3.8.1  3.7.6  3.7.3  3.7.0  3.6.6  3.6.3  3.6.0  3.5.1  3.4.3   3.1.5   2.7.16  2.7.13  2.7.10
 +  3.8.0  3.7.5  3.7.2  3.6.8  3.6.5  3.6.2  3.5.7  3.5.0  3.3.6   2.7.18  2.7.15  2.7.12  2.7.9
 +==> Remote versions (not yet checksummed):
 +  3.10.0a6  3.8.7rc1  3.7.6rc1   3.6.8rc1   3.5.7rc1   3.4.9     3.4.0     3.1.2rc1   2.7.9rc1  2.6.6     2.4.5
 +  3.10.0a5  3.8.7 ....
 +  ...
 +  ...
 +
 +2. now select the python version you would like to install:
 +
 +  root@slurm-ctrl:~# spack install python@3.8.2
 +  ==> 23834: Installing libiconv
 +  ==> Using cached archive: /opt/packages/spack/var/spack/cache/_source-cache/archive/e6/e6a1b1b589654277ee790cce3734f07876ac4ccfaecbee8afa0b649cf529cc04.tar.gz
 +  ==> Staging archive: /tmp/root/spack-stage/spack-stage-libiconv-1.16-b2wenwxf2widzewcvnhsxtjyisz3bcmc/libiconv-1.16.tar.gz
 +  ==> Created stage in /tmp/root/spack-stage/spack-stage-libiconv-1.16-b2wenwxf2widzewcvnhsxtjyisz3bcmc
 +  ==> No patches needed for libiconv
 +  ==> 23834: libiconv: Building libiconv [AutotoolsPackage]
 +  ==> 23834: libiconv: Executing phase: 'autoreconf'
 +  ==> 23834: libiconv: Executing phase: 'configure'
 +  ==> 23834: libiconv: Executing phase: 'build'
 +  ==> 23834: libiconv: Executing phase: 'install'
 +  ==> 23834: libiconv: Successfully installed libiconv
 +  Fetch: 0.04s.  Build: 24.36s.  Total: 24.40s.
 +  [+] /opt/packages/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-9.3.0/libiconv-1.16-b2wenwxf2widzewcvnhsxtjyisz3bcmc
 +  ==> 23834: Installing libbsd
 +  ...
 +  ...
 +  ...
 +  ==> 23834: Installing python
 +  ==> Fetching https://www.python.org/ftp/python/3.8.2/Python-3.8.2.tgz
 +  ############################################################################################################ 100.0%
 +  ==> Staging archive: /tmp/root/spack-stage/spack-stage-python-3.8.2-vmyztzplzddt2arrsx7d7koebyuzvk6s/Python-3.8.2.tgz
 +  ==> Created stage in /tmp/root/spack-stage/spack-stage-python-3.8.2-vmyztzplzddt2arrsx7d7koebyuzvk6s
 +  ==> Ran patch() for python
 +  ==> 23834: python: Building python [AutotoolsPackage]
 +  ==> 23834: python: Executing phase: 'autoreconf'
 +  ==> 23834: python: Executing phase: 'configure'
 +  ==> 23834: python: Executing phase: 'build'
 +  ==> 23834: python: Executing phase: 'install'
 +  ==> 23834: python: Successfully installed python
 +  Fetch: 1.81s.  Build: 1m 42.11s.  Total: 1m 43.91s.
 +  [+] /opt/packages/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-9.3.0/python-3.8.2-vmyztzplzddt2arrsx7d7koebyuzvk6s
 +
 +
 +This will take some minutes time, depending on the type of version
 +
 +
 +3. Now you need to add a modules file
 +
 +  root@slurm-ctrl:~# vi /opt/modules/modulefiles/python-3.8.2
 +
 +<code>
 +#%Module1.0
 +proc ModulesHelp { } {
 +global dotversion
 +  
 +puts stderr "\tPython 3.8.2"
 +}
 +
 +module-whatis "Python 3.8.2"
 +
 +set     main_root       /opt/packages/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-9.3.0/python-3.8.2-vmyztzplzddt2arrsx7d7koebyuzvk6s
 +set-alias       python3.8       /opt/packages/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-9.3.0/python-3.8.2-vmyztzplzddt2arrsx7d7koebyuzvk6s/bin/python3.8
 +
 +prepend-path    PATH    $main_root/bin
 +prepend-path    LIBRARY_PATH    $main_root/lib
 +
 +</code>
 +
 +4. New module should now be available:
 +
 +  root@slurm-ctrl:~# module avail
 +  -------------------------------------------- /opt/modules/modulefiles -----------------------------------------
 +  anaconda3  cuda-11.2.1  intel-mpi             module-info  py-mpi4py      python-3.7.7       use.own
 +  bzip       dot          intel-mpi-benchmarks  modules      python-2.7.18  python-3.8.2
 +  cuda-10.2  gcc-6.5.0    miniconda3            null         python-3.5.7   python-3.9.2
 +  cuda-11.0  go-1.15.3    module-git            openmpi      python-3.6.10  singularity-3.6.4
 +
 +5. Load the new module
 +
 +  root@slurm-ctrl:~# module load python-3.8.2
 +
 +6. Verify it works
 +
 +  root@slurm-ctrl:~# python3.8
 +  Python 3.8.2 (default, Mar 19 2021, 11:05:37)
 +  [GCC 9.3.0] on linux
 +  Type "help", "copyright", "credits" or "license" for more information.
 +  >>> exit()
 +
 +7. Unload the new module
 +
 +  module unload python-3.8.2
 +
  
 ===== Python ===== ===== Python =====
Line 391: Line 516:
 ==== Create modules file ==== ==== Create modules file ====
  
 +**PYTHON**
  
   cd /opt/modules/modulefiles/   cd /opt/modules/modulefiles/
Line 407: Line 533:
  
 </code> </code>
-   
  
 +**CUDA**
  
 +  vi /opt/modules/modulefiles/cuda-10.2
 +
 +<code>
 +#%Module1.0
 +proc ModulesHelp { } {
 +global dotversion
 +
 +puts stderr "\tcuda-10.2"
 +}
 +
 +module-whatis "cuda-10.2"
 +
 +set     prefix  /usr/local/cuda-10.2
 +
 +setenv          CUDA_HOME       $prefix
 +prepend-path    PATH            $prefix/bin
 +prepend-path    LD_LIBRARY_PATH $prefix/lib64
 +</code>
  
 ===== GCC ===== ===== GCC =====
Line 741: Line 885:
  
 ===== CUDA NVIDIA TESLA Infos ===== ===== CUDA NVIDIA TESLA Infos =====
 +
 +=== nvidia-smi ===
 +
 +
 +  root@gpu02:~# watch nvidia-smi
 +
 +<code>
 +Every 2.0s: nvidia-smi                                           gpu02: Mon Jun 22 17:49:14 2020
 +
 +Mon Jun 22 17:49:14 2020
 ++-----------------------------------------------------------------------------+
 +| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
 +|-------------------------------+----------------------+----------------------+
 +| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 +| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 +|===============================+======================+======================|
 +|    Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                    0 |
 +| N/A   53C    P0   139W / 250W |  31385MiB / 32510MiB |     69%      Default |
 ++-------------------------------+----------------------+----------------------+
 +|    Tesla V100-PCIE...  On   | 00000000:AF:00.0 Off |                    0 |
 +| N/A   35C    P0    26W / 250W |      0MiB / 32510MiB |      0%      Default |
 ++-------------------------------+----------------------+----------------------+
 +
 ++-----------------------------------------------------------------------------+
 +| Processes:                                                       GPU Memory |
 +|  GPU       PID   Type   Process name                             Usage      |
 +|=============================================================================|
 +|    0      8627      C   /opt/anaconda3/bin/python3                 31373MiB |
 ++-----------------------------------------------------------------------------+
 +
 +</code>
 +
 +=== deviceQuery ===
 +
  
 To run the deviceQuery it is necessary to make it first! To run the deviceQuery it is necessary to make it first!
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer