tech:slurm
=== Central Controller ===

The main configuration file is /etc/slurm-llnl/slurm.conf. This file has to be present on the controller and *ALL* of the compute nodes, and it also has to be consistent between all of them.

  vi /etc/slurm-llnl/slurm.conf
  
<code>
</code>
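The full slurm.conf is not reproduced here. Purely as an illustration (the cluster name, paths, and node specification below are assumptions, not the file actually used on this cluster), a minimal slurm.conf tying together slurm-ctrl and linux1 might contain:

```
# Assumed values for illustration only
ClusterName=mycluster
SlurmctldHost=slurm-ctrl
AuthType=auth/munge
SlurmUser=slurm
StateSaveLocation=/var/spool/slurmctld
ProctrackType=proctrack/pgid
NodeName=linux1 CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=linux1 Default=YES MaxTime=INFINITE State=UP
```

See the slurm.conf man page for the full set of parameters.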
  
-  root@controller# systemctl start slurmctld+  root@slurm-ctrl# scp /etc/slurm-llnl/slurm.conf csadmin@10.7.20.98:/tmp/.; scp /etc/slurm-llnl/slurm.conf csadmin@10.7.20.102:/tmp/
 +  root@slurm-ctrl# systemctl start slurmctld
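With more than two nodes, the per-node scp commands are easier to maintain as a loop. A sketch (the node IPs are the ones used in this guide; COPY defaults to echo, i.e. a dry run that only prints the commands — set COPY=scp to actually copy):

```shell
# Push slurm.conf to every compute node in one loop.
# Dry run by default: echo prints each command instead of running scp.
NODES="10.7.20.98 10.7.20.102"
CONF=/etc/slurm-llnl/slurm.conf
COPY=${COPY:-echo}

for n in $NODES; do
  $COPY "$CONF" "csadmin@$n:/tmp/"
done
```

Run it once with the default to inspect the commands, then again with COPY=scp.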
  
=== Accounting Storage ===

<code>
</code>
  
  root@slurm-ctrl# systemctl start slurmdbd

=== Authentication ===

Copy /etc/munge/munge.key to all compute nodes

  scp /etc/munge/munge.key csadmin@10.7.20.98:/tmp/

Allow password-less access to slurm-ctrl

  csadmin@slurm-ctrl:~$ ssh-copy-id -i .ssh/id_rsa.pub csadmin@10.7.20.102

Run a job from slurm-ctrl

  ssh csadmin@slurm-ctrl
  srun -N 1 hostname
  linux1
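Once srun works, batch jobs can be submitted with sbatch. A minimal job script (a sketch; the job name, filename, and output pattern are arbitrary choices, not something this page prescribes):

```shell
#!/bin/bash
# Minimal Slurm batch script (sketch). The #SBATCH lines are directives
# read by sbatch; the script body just reports which node ran the job.
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --output=hello-%j.out

hostname
```

Submit it with `sbatch hello.sh`; `squeue` shows it in the queue, and the output lands in hello-<jobid>.out.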
  
  
=== Test munge ===

{{:tech:slurm-hpc-cluster_compute-node.png?400|}}
  
=== Installation ===
  
  ssh -l csadmin 10.7.20.102
  sudo apt install slurm-wlm

Generate ssh keys

  ssh-keygen
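ssh-keygen asks a few questions interactively; it can also be run non-interactively, for example (a sketch — the key type, size, and path below are choices, not something this page prescribes):

```shell
# Generate an RSA key pair with no passphrase, without prompting.
# KEYFILE defaults to the standard OpenSSH location.
KEYFILE=${KEYFILE:-$HOME/.ssh/id_rsa}
mkdir -p "$(dirname "$KEYFILE")"
[ -f "$KEYFILE" ] || ssh-keygen -t rsa -b 4096 -N "" -f "$KEYFILE" -q
```

The -N "" sets an empty passphrase, which is what makes the later password-less ssh-copy-id logins work unattended.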
  
Copy the ssh keys to slurm-ctrl (using the IP, because no DNS is in place)

  ssh-copy-id -i ~/.ssh/id_rsa.pub csadmin@10.7.20.97

Become root to do important things:

  sudo -i
  vi /etc/hosts

Add the lines below to the /etc/hosts file

<code>
10.7.20.97      slurm-ctrl.inf.unibz.it slurm-ctrl
10.7.20.98      linux1.inf.unibz.it     linux1
</code>

The munge key was already copied from slurm-ctrl to /tmp/ on each compute node; now fix its location, owner, and permissions.

  mv /tmp/munge.key /etc/munge/
  chown munge:munge /etc/munge/munge.key
  chmod 400 /etc/munge/munge.key
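munged refuses to start when its key is group- or world-readable, so it is worth verifying the result of the chown/chmod step. A small helper (a sketch; the function name is made up here, and it relies on GNU stat):

```shell
# check_key FILE OWNER -- succeed only if FILE has mode 400 and is owned by OWNER.
# On a compute node you would call: check_key /etc/munge/munge.key munge
check_key() {
  local f=$1 owner=$2 mode
  mode=$(stat -c '%a' "$f") || return 1
  [ "$mode" = "400" ] && [ "$(stat -c '%U' "$f")" = "$owner" ]
}
```

Example: `check_key /etc/munge/munge.key munge && echo OK`.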

Put /etc/slurm-llnl/slurm.conf in the right place:

  mv /tmp/slurm.conf /etc/slurm-llnl/
  chown root: /etc/slurm-llnl/slurm.conf
===== Links =====

[[https://slurm.schedmd.com/overview.html|Slurm Workload Manager Overview]]

[[https://github.com/mknoxnv/ubuntu-slurm|Steps to create a small slurm cluster with GPU enabled nodes]]

[[https://implement.pt/2018/09/slurm-in-ubuntu-clusters-pt1/|Slurm in Ubuntu Clusters Part1]]

[[https://wiki.fysik.dtu.dk/niflheim/SLURM|Slurm batch queueing system]]

[[https://doku.lrz.de/display/PUBLIC/SLURM+Workload+Manager|SLURM Workload Manager]]

[[https://support.ceci-hpc.be/doc/_contents/QuickStart/SubmittingJobs/SlurmTutorial.html|Slurm Quick Start Tutorial]]

{{ :tech:9-slurm.pdf |Linux Clusters Institute: Scheduling and Resource Management 2017}}
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer