tech:slurm

Differences

This shows you the differences between two versions of the page.

tech:slurm [2019/09/06 14:45] kohofer
tech:slurm [2019/09/06 14:59] – [Controller] kohofer
Line 19: Line 19:
 Controller name: slurm-ctrl
 
-  ssh csadmin@slurm-ctrl
-  sudo apt install slurm-wlm slurm-wlm-doc mailutils sview mariadb-client mariadb-server libmariadb-dev python-dev python-mysqldb
+Install slurm-wlm and tools
+
+  ssh slurm-ctrl
+  apt install slurm-wlm slurm-wlm-doc mailutils sview mariadb-client mariadb-server libmariadb-dev python-dev python-mysqldb
 
 === Install Maria DB Server ===
Line 37: Line 39:
 In the file /etc/mysql/mariadb.conf.d/50-server.cnf we should have the following setting:
 
+  vi /etc/mysql/mariadb.conf.d/50-server.cnf
   bind-address = localhost
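
One way to confirm the setting is present in the file is to read it back. The sketch below is a minimal POSIX shell helper; the function name `get_bind_address` is hypothetical and not part of MariaDB. It prints the value of the last uncommented `bind-address` line in the file it is given.

```shell
# Hypothetical helper: print the active bind-address value from a MariaDB
# config file. Commented-out lines (starting with "#") are ignored, and the
# last uncommented entry is the one reported.
get_bind_address() {
    sed -n 's/^[[:space:]]*bind-address[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" \
        | tail -n 1
}

# Example call on the controller:
#   get_bind_address /etc/mysql/mariadb.conf.d/50-server.cnf
```

If the file matches the setting shown above, the call should print `localhost`.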
- 
-=== Configure munge === 
- 
-  ssh csadmin@linux1 
-  scp slurm-ctrl:/etc/munge/munge.key /etc/munge/ 
  
 === Node Authentication ===
Line 48: Line 46:
 First, let us configure the default options for the munge service:
 
-/etc/default/munge
-
-OPTIONS="--syslog --key-file /etc/munge/munge.key"
+  vi /etc/default/munge
+  OPTIONS="--syslog --key-file /etc/munge/munge.key"
 
 === Central Controller ===
  
 The main configuration file is /etc/slurm-llnl/slurm.conf. It must be present on the controller and on every compute node, and its contents must be consistent across all of them.
+
+  vi /etc/slurm-llnl/slurm.conf
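
Because slurm.conf has to match on the controller and the compute nodes, a checksum comparison is a quick way to spot drift. The following is a minimal sketch, assuming `sha256sum` from GNU coreutils is available; the helper names `config_digest` and `same_config` are hypothetical and not part of Slurm.

```shell
# Hypothetical helper: print only the SHA-256 digest of a file.
config_digest() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Hypothetical helper: succeed (exit status 0) when two files have the
# same digest, fail otherwise.
same_config() {
    [ "$(config_digest "$1")" = "$(config_digest "$2")" ]
}

# Example, run on the controller; "linux1" is the compute node named on
# this page, and the remote digest is fetched over ssh:
#   local_sum=$(config_digest /etc/slurm-llnl/slurm.conf)
#   remote_sum=$(ssh linux1 sha256sum /etc/slurm-llnl/slurm.conf | cut -d ' ' -f 1)
#   [ "$local_sum" = "$remote_sum" ] || echo "slurm.conf differs on linux1"
```

Comparing digests rather than copying the file back keeps the check read-only on the node.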
  
 <code>
Line 60: Line 59:
 # /etc/slurm-llnl/slurm.conf
 ###############################
-# General
-ControlMachine=entry-node
-AuthType=auth/munge
-CacheGroups=0
-CryptoType=crypto/munge
-JobCheckpointDir=/var/lib/slurm-llnl/checkpoint
-KillOnBadExit=01
-MpiDefault=pmi2
-MailProg=/usr/bin/mail
-PrivateData=usage,users,accounts
-ProctrackType=proctrack/cgroup
-PrologFlags=Alloc,Contain
-PropagateResourceLimits=NONE
-RebootProgram=/sbin/reboot
+# slurm.conf file generated by configurator easy.html.
+# Put this file on all nodes of your cluster.
+# See the slurm.conf man page for more information.
+#
+ControlMachine=slurm-ctrl
+#ControlAddr=10.7.20.97
+#
+#MailProg=/bin/mail
+MpiDefault=none
+#MpiParams=ports=#-#
+ProctrackType=proctrack/pgid
 ReturnToService=1
 SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
-SlurmctldPort=6817
+##SlurmctldPidFile=/var/run/slurmctld.pid
+#SlurmctldPort=6817
 SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
-SlurmdPort=6818
-SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
+##SlurmdPidFile=/var/run/slurmd.pid
+#SlurmdPort=6818
+SlurmdSpoolDir=/var/spool/slurmd
 SlurmUser=slurm
-StateSaveLocation=/var/lib/slurm-llnl/slurmctld
+#SlurmdUser=root
+StateSaveLocation=/var/spool
 SwitchType=switch/none
-TaskPlugin=task/cgroup
-
-# Timers
-InactiveLimit=0
-KillWait=30
-MinJobAge=300
-SlurmctldTimeout=120
-SlurmdTimeout=300
-Waittime=0
-
-# Scheduler
+TaskPlugin=task/none
+#
+#
+# TIMERS
+#KillWait=30
+#MinJobAge=300
+#SlurmctldTimeout=120
+#SlurmdTimeout=300
+#
+#
+# SCHEDULING
 FastSchedule=1
 SchedulerType=sched/backfill
-SchedulerPort=7321
-SelectType=select/cons_res
-SelectTypeParameters=CR_CPU_Memory
-
-# Preemptions
-PreemptType=preempt/partition_prio
-PreemptMode=REQUEUE
-
-# Accounting
-AccountingStorageType=accounting_storage/slurmdbd
-AccountingStoreJobComment=YES
-ClusterName=mycluster
-JobAcctGatherFrequency=30
-JobAcctGatherType=jobacct_gather/linux
-SlurmctldDebug=3
-SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
-SlurmdDebug=3
-SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
-SlurmSchedLogFile= /var/log/slurm-llnl/slurmschd.log
-SlurmSchedLogLevel=3
-
-NodeName=compute-1 Procs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=128000 Weight=4
-NodeName=compute-2 Procs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=254000 Weight=3
-NodeName=compute-3 Procs=96 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=256000 Weight=3
-NodeName=compute-4 Procs=96 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=256000 Weight=3
-
-PartitionName=base Nodes=compute-1,compute-2,compute-3,compute-4 Default=Yes MaxTime=72:00:00 Priority=1 State=UP
-PartitionName=long Nodes=compute-1,compute-2,compute-3,compute-4 Default=No MaxTime=UNLIMITED Priority=1 State=UP AllowGroups=long
+SelectType=select/linear
+#SelectTypeParameters=
+#
+#
+# LOGGING AND ACCOUNTING
+AccountingStorageType=accounting_storage/none
+ClusterName=cluster
+#JobAcctGatherFrequency=30
+JobAcctGatherType=jobacct_gather/none
+#SlurmctldDebug=3
+SlurmctldLogFile=/var/log/slurm-llnl/SlurmctldLogFile
+#SlurmdDebug=3
+SlurmdLogFile=/var/log/slurm-llnl/SlurmLogFile
+#
+#
+# COMPUTE NODES
+NodeName=linux1 NodeAddr=10.7.20.98 CPUs=1 State=UNKNOWN
 </code>
  
Line 130: Line 118:
 === Accounting Storage ===
 
-After we have the slurm-llnl-slurmdbd package installed we configure it, by editing the /etc/slurm-llnl/slurmdb.conf file:
+After we have the slurm-llnl-slurmdbd package installed we configure it, by editing the /etc/slurm-llnl/slurmdbd.conf file:
+
+  vi /etc/slurm-llnl/slurmdbd.conf
  
 <code>
 ########################################################################
 #
-# /etc/slurm-llnl/slurmdb.conf is an ASCII file which describes Slurm
+# /etc/slurm-llnl/slurmdbd.conf is an ASCII file which describes Slurm
 # Database Daemon (SlurmDBD) configuration information.
 # The contents of the file are case insensitive except for the names of
-# nodes and files. Any text following a "#" in the configuration file is     # treated as a comment through the end of that line. The size of each
+# nodes and files. Any text following a "#" in the configuration file is
+# treated as a comment through the end of that line. The size of each
 # line in the file is limited to 1024 characters. Changes to the
 # configuration file take effect upon restart of SlurmDbd or daemon
Line 153: Line 144:
 StoragePort=3306
 StorageUser=slurm
-StoragePass=safepassword
+StoragePass=slurmdbpass
 StorageType=accounting_storage/mysql
 StorageLoc=slurm_acct_db
Line 159: Line 150:
 PidFile=/var/run/slurm-llnl/slurmdbd.pid
 SlurmUser=slurm
+
 </code>
  
   root@controller# systemctl start slurmdbd
  
 +=== Configure munge ===
 +
 +  ssh csadmin@linux1; sudo -i
 +  scp slurm-ctrl:/etc/munge/munge.key /etc/munge/
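
After copying the key it is worth checking its permissions, since munged is expected to refuse a key file that group or others can read or write. The sketch below is a minimal check, assuming GNU `stat` (as shipped with Debian and Ubuntu); `key_mode_ok` is a hypothetical helper name, not a munge command.

```shell
# Hypothetical helper: succeed when the key file is accessible by its owner
# only (octal mode 400 or 600), fail otherwise or if the file is missing.
key_mode_ok() {
    mode=$(stat -c '%a' "$1" 2>/dev/null) || return 1
    [ "$mode" = "400" ] || [ "$mode" = "600" ]
}

# Example on the compute node, tightening the mode only when needed.
# The chown line assumes the package created a "munge" user and group:
#   key_mode_ok /etc/munge/munge.key || chmod 400 /etc/munge/munge.key
#   chown munge:munge /etc/munge/munge.key
```

The check only reads the mode bits; ownership still has to be verified separately, for example with `ls -l /etc/munge/munge.key`.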
  
 === Test munge === === Test munge ===
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer