tech:slurm

Differences

This shows you the differences between two versions of the page.

tech:slurm [2021/03/19 12:28] – [Modules] kohofer
tech:slurm [2022/09/09 17:01] – [Modules] kohofer
Line 14: Line 14:
  
 ===== Installation =====
 +
  
 ===== Controller name: slurm-ctrl =====
Line 241: Line 242:
   debug*       up   infinite      1   idle linux1
  
-If computer node is **<color #ed1c24>down</color>** or **<color #ed1c24>drain</color>**
+If computer node is <color #ed1c24>down</color> or <color #ed1c24>drain</color>
  
 <code>
Line 263: Line 264:
 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 debug*       up   infinite      2   idle gpu[02-03]
 +</code>
 +
 +<code>
 +sinfo -o "%20N  %10c  %10m  %25f  %10G "
 +NODELIST              CPUS        MEMORY      AVAIL_FEATURES             GRES       
 +gpu[02-03]            32          190000      (null)                     gpu:     
 +gpu04                 64          1000000     (null)                     gpu:4(S:0) 
 +hpcmoi01,hpcwrk01     32+         190000+     (null)                     (null)
 </code>
  
Line 342: Line 351:
      
 ===== Modify user accounts =====
 +
 +Display the accounts created:
 +
 +  # Show accounts together with their associations
 +  sacctmgr show account -s
 +  # Show all columns separated by the pipe symbol (|)
 +  sacctmgr show account -s -P
 +  # Show users together with their associations
 +  sacctmgr show user -s
  
 Add user
  
-  sacctmgr add user <usernme> Account=gpu-users Partition=gpu
+  sacctmgr add user <username> Account=gpu-users Partition=gpu
  
 Modify user, give 12000 minutes/200 hours for usage
  
-  sacctmgr modify user misegata set GrpTRESMin=cpu=12000,gres/gpu=12000
+  sacctmgr modify user <username> set GrpTRESMin=cpu=12000,gres/gpu=12000
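The limit is given in minutes, so 200 hours correspond to 200 x 60 = 12000. The sketch below shows that conversion. It assumes a hypothetical helper `hours_to_minutes` and a placeholder user name `alice`, neither of which appears on the page, and it only prints the resulting command for review instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): convert whole hours to minutes.
hours_to_minutes() {
  echo $(( $1 * 60 ))
}

USER_NAME="alice"                 # placeholder user name (assumption)
MINUTES=$(hours_to_minutes 200)   # 200 hours of usage

# Build the command as a string and print it; nothing is changed in Slurm.
CMD="sacctmgr modify user ${USER_NAME} set GrpTRESMin=cpu=${MINUTES},gres/gpu=${MINUTES}"
echo "$CMD"
```

Printing the command first makes it easy to check the numbers before an administrator pastes it into a root shell on the controller.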
 + 
 +Remove a user from a specific account
 + 
 +  sacctmgr remove user where user=<username> and account=<account> 
 + 
 +Delete user 
 + 
 +  sacctmgr delete user ivmilan 
 +  Deleting users... 
 +  ivmilan 
 +  Would you like to commit changes? (You have 30 seconds to decide) 
 +  (N/y): y 
  
 Restart the services:
Line 361: Line 392:
   systemctl status slurmdbd.service
  
 +==== Submit a job to a specific node using Slurm's sbatch command ====
 +
 +To run a job on a specific node, add this option to the job script:
 +
 +  #SBATCH --nodelist=gpu03
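For context, a complete job script using this option might look like the sketch below. The job name, the output file pattern, and the file name `job.sh` are illustrative assumptions; only the `--nodelist=gpu03` line comes from the page.

```shell
#!/bin/bash
# Hypothetical job name, chosen for illustration only.
#SBATCH --job-name=node-test
# Pin the job to the node named on this page.
#SBATCH --nodelist=gpu03
# Write output to a per-job file; %j expands to the job ID.
#SBATCH --output=node-test-%j.out

# Report which node actually ran the job.
NODE="$(hostname)"
echo "Running on ${NODE}"
```

Saved as `job.sh`, it would be submitted with `sbatch job.sh`. The output file should then show the host name of the node that ran it.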
  
  
Line 385: Line 421:
  
 ====== Modules ======
 +
 +The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.
 +
 +Installing Modules on Unix
 +
 +Log in to slurm-ctrl and become root
 +
 +  ssh slurm-ctrl
 +  sudo -i
 +
 +Download modules
 +
 +  curl -LJO https://github.com/cea-hpc/modules/releases/download/v4.6.0/modules-4.6.0.tar.gz
 +  tar xfz modules-4.6.0.tar.gz
 +  cd modules-4.6.0
 +
 +
 +  ./configure --prefix=/opt/modules
 +  make
 +  make install
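After the install step, the `module` command still has to be made available in the shell. A minimal sketch follows, assuming the bash init script ends up at `/opt/modules/init/bash` (derived from the `--prefix` used above; verify the path on the actual system):

```shell
#!/usr/bin/env bash
# Assumed location, derived from --prefix=/opt/modules used in the build step.
MODULES_INIT="/opt/modules/init/bash"

if [ -r "$MODULES_INIT" ]; then
  # Sourcing the init script defines the `module` shell function.
  . "$MODULES_INIT"
  # List the modulefiles that can be loaded.
  module avail
  MODULES_STATUS="loaded"
else
  MODULES_STATUS="missing"
  echo "Modules init script not found at ${MODULES_INIT}" >&2
fi
```

The existence check keeps the snippet harmless on machines where Modules is not installed, for example when it is sourced from a shared profile.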
 +
 +
 +
 +https://modules.readthedocs.io/en/stable/index.html
 +
 +
 +----
 +
 +===== SPACK =====
 +
  
 Add different python versions using spack!
Line 1024: Line 1090:
  
 ===== Links =====
 +
 +[[https://developer.nvidia.com/cuda-toolkit|CUDA Toolkit]]
 +
 +[[https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html|NVIDIA CUDA Installation Guide for Linux]]
 +
  
 https://www.admin-magazine.com/HPC/Articles/Warewulf-Cluster-Manager-Development-and-Run-Time/Warewulf-3-Code/MPICH2
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer