  
===== Installation =====

===== Controller name: slurm-ctrl =====
  debug*       up   infinite      1   idle linux1
  
If a compute node is in the <color #ed1c24>down</color> or <color #ed1c24>drain</color> state:
  
<code>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      2   idle gpu[02-03]
</code>
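If a node stays in the <color #ed1c24>down</color> or <color #ed1c24>drain</color> state after the underlying problem has been fixed, it can presumably be returned to service with scontrol (a minimal sketch; the node name is only an example):

<code>
scontrol update NodeName=gpu02 State=RESUME
</code>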

To list the CPUs, memory, available features and generic resources (GRES, e.g. GPUs) of each node:

<code>
sinfo -o "%20N  %10c  %10m  %25f  %10G "
NODELIST              CPUS        MEMORY      AVAIL_FEATURES             GRES       
gpu[02-03]            32          190000      (null)                     gpu:     
gpu04                 64          1000000     (null)                     gpu:4(S:0) 
hpcmoi01,hpcwrk01     32+         190000+     (null)                     (null)
</code>
  
  chown root: /etc/slurm-llnl/slurm.conf
    

=== Directories ===

Be sure that the NFS-mounted partitions are all there (a quick check is sketched below):

<code>
/data
/opt/packages
/home/clusterusers
/opt/modules
/scratch
</code>
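A minimal sketch of such a check (it assumes these paths are all NFS mount points rather than plain local directories):

<code>
# Report whether each expected path is currently a mount point
for d in /data /opt/packages /home/clusterusers /opt/modules /scratch; do
  mountpoint -q "$d" && echo "mounted: $d" || echo "MISSING: $d"
done
</code>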
===== Modify user accounts =====

Display the accounts created:

  # Show also the associations in the accounts
  sacctmgr show account -s
  # Show all columns separated by the pipe | symbol
  sacctmgr show account -s -P
  # Show the users together with their associations
  sacctmgr show user -s
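If the gpu-users account referenced below does not exist yet, it can presumably be created first (a minimal sketch; the description text is only an example):

  sacctmgr add account gpu-users Description="GPU users"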
  
Add user
  
  sacctmgr add user <username> Account=gpu-users Partition=gpu
  
Modify a user, granting 12000 minutes (200 hours) of usage
  
  sacctmgr modify user <username> set GrpTRESMin=cpu=12000,gres/gpu=12000

Remove a user from a certain account

  sacctmgr remove user where user=<username> and account=<account>

Delete a user

  sacctmgr delete user ivmilan
  Deleting users...
  ivmilan
  Would you like to commit changes? (You have 30 seconds to decide)
  (N/y): y
  
Restart the services:
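On the controller this presumably means the slurmdbd and slurmctld units (a sketch; the unit names are those used elsewhere on this page):

  systemctl restart slurmdbd.service
  systemctl restart slurmctld.service

Then check their status: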
  systemctl status slurmdbd.service
  

==== Submit a job to a specific node using Slurm's sbatch command ====

To run a job on a specific node, use this option in the job script (a full example script is sketched below):

  #SBATCH --nodelist=gpu03
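A minimal job script using this directive might look as follows (a sketch; the job name, partition, task count and command are placeholders):

<code>
#!/bin/bash
# Placeholder job name, the debug partition from the sinfo output above, and a single task
#SBATCH --job-name=nodelist-test
#SBATCH --partition=debug
#SBATCH --nodelist=gpu03
#SBATCH --ntasks=1

# Print the node the job actually ran on; it should report gpu03
srun hostname
</code>

Submit it with ''sbatch jobscript.sh'' and check the assigned node with ''squeue''.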
  
  
  
====== Modules ======
 +

The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.

Installing Modules on Unix

Log in to slurm-ctrl and become root

  ssh slurm-ctrl
  sudo -i

Download and unpack Modules

  curl -LJO https://github.com/cea-hpc/modules/releases/download/v4.6.0/modules-4.6.0.tar.gz
  tar xfz modules-4.6.0.tar.gz
  cd modules-4.6.0

Configure, build and install into /opt/modules

  ./configure --prefix=/opt/modules
  make
  make install
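After the installation, the ''module'' command still has to be made available in the shell. A minimal sketch, assuming the --prefix=/opt/modules used above and a bash login shell:

<code>
# Initialise the module command in the current shell
# (can also be referenced from /etc/profile.d/ or ~/.bashrc)
source /opt/modules/init/bash

# Verify that it works
module avail
</code>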

https://modules.readthedocs.io/en/stable/index.html

----

===== SPACK =====

  
Add different Python versions using Spack!
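For illustration, installing and then loading a specific Python version with Spack could look like this (a sketch; the version number is only an example):

  spack install python@3.9.7
  spack find python
  spack load python@3.9.7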
  
===== Links =====

[[https://developer.nvidia.com/cuda-toolkit|CUDA Toolkit]]

[[https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html|NVIDIA CUDA Installation Guide for Linux]]

  
https://www.admin-magazine.com/HPC/Articles/Warewulf-Cluster-Manager-Development-and-Run-Time/Warewulf-3-Code/MPICH2