tech:slurm

Differences

This shows you the differences between two versions of the page.

tech:slurm [2020/02/10 16:25] – [Controller] kohofer
tech:slurm [2020/02/11 10:41] kohofer
Line 14: Line 14:
  
 ===== Installation =====
- 
-==== Controller ==== 
  
 ===== Controller name: slurm-ctrl =====
Line 131: Line 129:
 EnvironmentFile=-/etc/default/slurmctld
 ExecStart=/usr/sbin/slurmctld $SLURMCTLD_OPTIONS
 +ExecStartPost=/bin/sleep 2
 ExecReload=/bin/kill -HUP $MAINPID
 PIDFile=/var/run/slurm-llnl/slurmctld.pid
Line 152: Line 151:
 EnvironmentFile=-/etc/default/slurmd
 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS
 +ExecStartPost=/bin/sleep 2
 ExecReload=/bin/kill -HUP $MAINPID
 PIDFile=/var/run/slurm-llnl/slurmd.pid
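
Both hunks above add the same `ExecStartPost=/bin/sleep 2` line to a unit file. The revision does not say why the delay was introduced; one plausible reason (an assumption, not stated on the page) is to give the daemon time to write its PID file before systemd looks for it. As a rough sketch, the same directive could instead be placed in a systemd drop-in so that package upgrades do not overwrite it. The function name `write_sleep_dropin` and its `base_dir` parameter are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch only: add the ExecStartPost delay through a drop-in file rather
# than by editing the packaged unit. write_sleep_dropin is a hypothetical
# helper, not part of Slurm or systemd.
write_sleep_dropin() {
    unit="$1"                               # e.g. slurmctld or slurmd
    base_dir="${2:-/etc/systemd/system}"    # usual location for admin overrides
    dropin_dir="$base_dir/$unit.service.d"
    mkdir -p "$dropin_dir" || return 1
    # A drop-in needs only the section header and the directive it adds.
    printf '[Service]\nExecStartPost=/bin/sleep 2\n' > "$dropin_dir/override.conf"
}
```

A possible use, run as root: `write_sleep_dropin slurmctld && systemctl daemon-reload && systemctl restart slurmctld`, and likewise with `slurmd` on the compute nodes.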
Line 241: Line 241:
   debug*       up   infinite      1   idle linux1
  
-==== Compute Nodes ====
 +If a compute node is down:
 + 
 +<code> 
 +sinfo -a 
 +PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST 
 +debug*       up   infinite      2   down gpu[02-03] 
 +</code> 
 + 
 +  scontrol update nodename=gpu02 state=idle 
 +  scontrol update nodename=gpu03 state=idle 
 + 
 +<code> 
 +sinfo -a 
 +PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST 
 +debug*       up   infinite      2   idle gpu[02-03] 
 +</code> 
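
The manual steps above (spot `down` nodes in `sinfo -a`, then reset each with `scontrol update`) can be sketched as a small filter. This is a minimal illustration that assumes the default six-column `sinfo` layout shown above; the function name `down_nodelists` is hypothetical.

```shell
#!/bin/sh
# Sketch only: read `sinfo -a` output on stdin and print the NODELIST of
# every row whose STATE starts with "down".
# Assumed columns: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
down_nodelists() {
    awk 'NR > 1 && $5 ~ /^down/ { print $6 }'
}

# Possible use on the controller (not executed here):
#   sinfo -a | down_nodelists | while read -r nodes; do
#       scontrol update nodename="$nodes" state=idle
#   done
```

`scontrol` accepts node range expressions such as `gpu[02-03]`, so each printed node list can be passed on unchanged. It is worth finding out why a node went down before resetting it.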
 + 
 + 
 +===== Compute Nodes =====
  
 A compute node is a machine that receives jobs to execute from the Controller; it runs the slurmd service.
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer