tech:slurm

Differences

This shows you the differences between two versions of the page.

tech:slurm [2019/09/06 15:36] – [Installation] kohofer
tech:slurm [2020/02/07 10:34] – [Compute Nodes] kohofer
Line 22: Line 22:
  
   ssh slurm-ctrl
-  apt install slurm-wlm slurm-wlm-doc mailutils sview mariadb-client mariadb-server libmariadb-dev python-dev python-mysqldb
+  apt install slurm-wlm slurm-wlm-doc mailutils mariadb-client mariadb-server libmariadb-dev python-dev python-mysqldb

 === Install Maria DB Server ===
Line 51: Line 51:
 === Central Controller ===

-The main configuration file is /etc/slurm-llnl/slurm.conf. This file has to be present on the controller and on all of the compute nodes, and it has to be consistent between all of them.
+The main configuration file is /etc/slurm-llnl/slurm.conf. This file has to be present on the controller and on *ALL* of the compute nodes, and it has to be consistent between all of them.

-  vi /etc/slurm-llnl/slurm.conf 
+  vi /etc/slurm-llnl/slurm.conf
  
 <code> <code>
Line 114: Line 114:
 </code> </code>
  
-  root@controller# systemctl start slurmctld
+  root@slurm-ctrl# scp /etc/slurm-llnl/slurm.conf csadmin@10.7.20.98:/tmp/.; scp /etc/slurm-llnl/slurm.conf csadmin@10.7.20.102:/tmp/
+  root@slurm-ctrl# systemctl start slurmctld
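
The two scp calls above could also be written as a loop once more compute nodes are added. The following is only a sketch: the variable names NODES, CONF and DRY_RUN and the function distribute_conf are illustrative and do not appear on the wiki page, and the node list simply repeats the two addresses used above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: copy slurm.conf to /tmp on every compute node.
# NODES, CONF, DRY_RUN and distribute_conf are hypothetical names for illustration.
NODES="${NODES:-10.7.20.98 10.7.20.102}"
CONF="${CONF:-/etc/slurm-llnl/slurm.conf}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the scp commands, 0 = execute them

distribute_conf() {
    for node in $NODES; do
        cmd="scp $CONF csadmin@$node:/tmp/"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            # Keep going if one node is unreachable, but report it.
            $cmd || echo "copy to $node failed" >&2
        fi
    done
}

distribute_conf
```

Running it with DRY_RUN=0 performs the copies; the file still has to be moved from /tmp to /etc/slurm-llnl/ on each node afterwards.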
  
 === Accounting Storage ===
Line 153: Line 154:
 </code> </code>
  
-  root@controller# systemctl start slurmdbd
+  root@slurm-ctrl# systemctl start slurmdbd

 === Authentication ===
Line 159: Line 160:
 Copy /etc/munge/munge.key to all compute nodes

-  scp /etc/munge/munge.key csadmin@10.7.20.97:/tmp/.
+  scp /etc/munge/munge.key csadmin@10.7.20.98:/tmp/.
+
+Allow password-less access from slurm-ctrl to the compute nodes
+
+  csadmin@slurm-ctrl:~$ ssh-copy-id -i .ssh/id_rsa.pub 10.7.20.102

 Run a job from slurm-ctrl
Line 188: Line 193:
 {{:tech:slurm-hpc-cluster_compute-node.png?400|}}
  
-==== Installation ====
+=== Installation of slurm and munge ===
+
+  ssh -l csadmin <compute-node>   # repeat for each compute node, e.g. 10.7.20.109 and 10.7.20.110
 +  sudo apt install slurm-wlm libmunge-dev libmunge2 munge 
 +  sudo systemctl enable slurmd 
 +  sudo systemctl enable munge 
 +  sudo systemctl start slurmd 
 +  sudo systemctl start munge 
  
-  ssh -l csadmin 10.7.20.102
-  sudo apt install slurm-wlm
-
 Generate ssh keys
+
   ssh-keygen
  
Line 215: Line 226:
 owner and permission.

-  mv /tmp/munge.key /etc/.
+  mv /tmp/munge.key /etc/munge/.
   chown munge:munge /etc/munge/munge.key
   chmod 400 /etc/munge/munge.key
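
After moving the key, it can help to confirm that the mode really is 400. The helper below is a hedged sketch, not part of the wiki page: check_mode is a hypothetical function name, and stat -c is the GNU coreutils syntax found on Debian and Ubuntu systems.

```shell
#!/bin/sh
# Sketch: verify that a file has the expected octal permission mode.
# check_mode is a hypothetical helper; stat -c '%a' assumes GNU coreutils.
check_mode() {
    file="$1"
    want="$2"
    have="$(stat -c '%a' "$file")" || return 2
    if [ "$have" = "$want" ]; then
        echo "OK: $file has mode $have"
    else
        echo "WARN: $file has mode $have, expected $want" >&2
        return 1
    fi
}

# Example call on a compute node (run as root so the key is readable):
#   check_mode /etc/munge/munge.key 400
```

The function returns 0 when the mode matches, 1 when it differs, and 2 when the file cannot be inspected.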
 +
 +Move slurm.conf into /etc/slurm-llnl/ and make root its owner:
 +
 +  mv /tmp/slurm.conf /etc/slurm-llnl/
 +  chown root: /etc/slurm-llnl/slurm.conf
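
Because slurm.conf has to be consistent on the controller and all compute nodes, a checksum comparison is one way to spot a stale copy. This is a minimal sketch under stated assumptions: same_conf is a hypothetical helper name, sha256sum comes from GNU coreutils, and the second path in the example call is an invented location for a copy fetched from a node.

```shell
#!/bin/sh
# Sketch: succeed only if two copies of slurm.conf are byte-identical.
# same_conf is a hypothetical helper; sha256sum assumes GNU coreutils.
same_conf() {
    a="$(sha256sum < "$1")" || return 2
    b="$(sha256sum < "$2")" || return 2
    [ "$a" = "$b" ]
}

# Example call (the second path is an assumed location, not from the wiki page):
#   same_conf /etc/slurm-llnl/slurm.conf /tmp/slurm.conf.from-node && echo identical
```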
 + 
 +  
  
  
Line 231: Line 249:
  
 [[https://doku.lrz.de/display/PUBLIC/SLURM+Workload+Manager|SLURM Workload Manager]]
 +
 +[[https://support.ceci-hpc.be/doc/_contents/QuickStart/SubmittingJobs/SlurmTutorial.html|Slurm Quick Start Tutorial]]
 +
 +{{ :tech:9-slurm.pdf |Linux Clusters Institute: Scheduling and Resource Management 2017}}
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer