tech:slurm
tech:slurm [2019/09/06 11:17] – created kohofer | tech:slurm [2019/09/06 14:30] – kohofer | ||
---|---|---|---|
{{: | {{: | ||
+ | ===== Installation ===== | ||
+ | |||
+ | ==== Controller ==== | ||
+ | |||
+ | Controller name: slurm-ctrl | ||
+ | |||
+ | $ ssh csadmin@slurm-ctrl | ||
+ | $ sudo apt install slurm-wlm slurm-wlm-doc mailutils sview mariadb-client mariadb-server libmariadb-dev python-dev python-mysqldb | ||
+ | |||
+ | === Install MariaDB Server === | ||
+ | |||
+ | $ apt-get install mariadb-server | ||
+ | $ systemctl start mysql | ||
+ | $ mysql -u root | ||
+ | create database slurm_acct_db; | ||
+ | create user ' | ||
+ | set password for ' | ||
+ | grant usage on *.* to ' | ||
+ | grant all privileges on slurm_acct_db.* to ' | ||
+ | flush privileges; | ||
+ | exit | ||
+ | |||
+ | In the file / | ||
+ | |||
+ | bind-address = localhost | ||
+ | |||
+ | === Configure munge === | ||
+ | |||
+ | $ ssh csadmin@linux1 | ||
+ | $ scp slurm-ctrl:/ | ||
+ | |||
+ | === Node Authentication === | ||
+ | |||
+ | First, let us configure the default options for the munge service: | ||
+ | |||
+ | / | ||
+ | |||
+ | OPTIONS=" | ||
+ | |||
+ | === Central Controller === | ||
+ | |||
+ | The main configuration file is / | ||
+ | |||
+ | < | ||
+ | ############################### | ||
+ | # / | ||
+ | ############################### | ||
+ | # General | ||
+ | ControlMachine=slurm-ctrl | ||
+ | AuthType=auth/ | ||
+ | CacheGroups=0 | ||
+ | CryptoType=crypto/ | ||
+ | JobCheckpointDir=/ | ||
+ | KillOnBadExit=01 | ||
+ | MpiDefault=pmi2 | ||
+ | MailProg=/ | ||
+ | PrivateData=usage, | ||
+ | ProctrackType=proctrack/ | ||
+ | PrologFlags=Alloc, | ||
+ | PropagateResourceLimits=NONE | ||
+ | RebootProgram=/ | ||
+ | ReturnToService=1 | ||
+ | SlurmctldPidFile=/ | ||
+ | SlurmctldPort=6817 | ||
+ | SlurmdPidFile=/ | ||
+ | SlurmdPort=6818 | ||
+ | SlurmdSpoolDir=/ | ||
+ | SlurmUser=slurm | ||
+ | StateSaveLocation=/ | ||
+ | SwitchType=switch/ | ||
+ | TaskPlugin=task/ | ||
+ | |||
+ | # Timers | ||
+ | InactiveLimit=0 | ||
+ | KillWait=30 | ||
+ | MinJobAge=300 | ||
+ | SlurmctldTimeout=120 | ||
+ | SlurmdTimeout=300 | ||
+ | Waittime=0 | ||
+ | |||
+ | # Scheduler | ||
+ | FastSchedule=1 | ||
+ | SchedulerType=sched/ | ||
+ | SchedulerPort=7321 | ||
+ | SelectType=select/ | ||
+ | SelectTypeParameters=CR_CPU_Memory | ||
+ | |||
+ | # Preemptions | ||
+ | PreemptType=preempt/ | ||
+ | PreemptMode=REQUEUE | ||
+ | |||
+ | # Accounting | ||
+ | AccountingStorageType=accounting_storage/ | ||
+ | AccountingStoreJobComment=YES | ||
+ | ClusterName=mycluster | ||
+ | JobAcctGatherFrequency=30 | ||
+ | JobAcctGatherType=jobacct_gather/ | ||
+ | SlurmctldDebug=3 | ||
+ | SlurmctldLogFile=/ | ||
+ | SlurmdDebug=3 | ||
+ | SlurmdLogFile=/ | ||
+ | SlurmSchedLogFile=/ | ||
+ | SlurmSchedLogLevel=3 | ||
+ | |||
+ | NodeName=compute-1 Procs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=128000 Weight=4 | ||
+ | NodeName=compute-2 Procs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=254000 Weight=3 | ||
+ | NodeName=compute-3 Procs=96 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=256000 Weight=3 | ||
+ | NodeName=compute-4 Procs=96 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=256000 Weight=3 | ||
+ | |||
+ | PartitionName=base Nodes=compute-1, | ||
+ | PartitionName=long Nodes=compute-1, | ||
+ | </ | ||
+ | |||
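As a quick sanity check on node definitions like the ones above, the CPU count given as Procs should equal Sockets x CoresPerSocket x ThreadsPerCore. The following is a minimal sketch and not part of the original setup: the helper name `check_node_cpus` is hypothetical, and it assumes each NodeName definition sits on one line of space-separated key=value pairs.

```shell
# check_node_cpus (hypothetical helper): read slurm.conf-style lines on stdin
# and report, for each NodeName line, whether Procs equals
# Sockets * CoresPerSocket * ThreadsPerCore.
check_node_cpus() {
  awk '
    /^NodeName=/ {
      split("", kv)                      # portable way to clear the array
      for (i = 1; i <= NF; i++) {
        n = index($i, "=")
        if (n > 0) kv[substr($i, 1, n - 1)] = substr($i, n + 1)
      }
      expected = kv["Sockets"] * kv["CoresPerSocket"] * kv["ThreadsPerCore"]
      status = (kv["Procs"] + 0 == expected) ? "OK" : "MISMATCH"
      printf "%s Procs=%s expected=%d %s\n", kv["NodeName"], kv["Procs"], expected, status
    }
  '
}

# Example: two of the node lines from the configuration above.
check_node_cpus <<'EOF'
NodeName=compute-1 Procs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=128000 Weight=4
NodeName=compute-3 Procs=96 Sockets=2 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=256000 Weight=3
EOF
```

Both sample lines are reported as OK (48 = 4 x 12 x 1 and 96 = 2 x 24 x 2). To check a whole file, redirect it into the function instead of using the here-document.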
+ | root@controller# | ||
+ | |||
+ | === Accounting Storage === | ||
+ | |||
+ | After the slurm-llnl-slurmdbd package is installed, we configure it by editing the / | ||
+ | |||
+ | < | ||
+ | ######################################################################## | ||
+ | # | ||
+ | # / | ||
+ | # Database Daemon (SlurmDBD) configuration information. | ||
+ | # The contents of the file are case insensitive except for the names of | ||
+ | # nodes and files. Any text following a "#" | ||
+ | # line in the file is limited to 1024 characters. Changes to the | ||
+ | # configuration file take effect upon restart of SlurmDbd or daemon | ||
+ | # receipt of the SIGHUP signal unless otherwise noted. | ||
+ | # | ||
+ | # This file should be only on the computer where SlurmDBD executes and | ||
+ | # should only be readable by the user which executes SlurmDBD (e.g. | ||
+ | # " | ||
+ | # it contains a database password. | ||
+ | ######################################################################### | ||
+ | AuthType=auth/ | ||
+ | AuthInfo=/ | ||
+ | StorageHost=localhost | ||
+ | StoragePort=3306 | ||
+ | StorageUser=slurm | ||
+ | StoragePass=safepassword | ||
+ | StorageType=accounting_storage/ | ||
+ | StorageLoc=slurm_acct_db | ||
+ | LogFile=/ | ||
+ | PidFile=/ | ||
+ | SlurmUser=slurm | ||
+ | </ | ||
+ | |||
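The header comment of this file notes that it should be readable only by the user that runs SlurmDBD, because it contains the database password. One way to check this is sketched below. The helper name `is_private_file` is hypothetical, the check assumes GNU `stat` (as shipped on Debian and Ubuntu), and the demo uses a temporary file rather than the real configuration path.

```shell
# is_private_file PATH (hypothetical helper): succeed only if PATH grants no
# permissions to group or others, i.e. its octal mode ends in "00".
is_private_file() {
  mode=$(stat -c '%a' "$1") || return 2   # GNU stat: print octal permissions
  case "$mode" in
    *00|0) return 0 ;;
    *)     return 1 ;;
  esac
}

# Demo on a temporary file instead of the real configuration file.
demo=$(mktemp)
chmod 600 "$demo"
is_private_file "$demo" && echo "private"
chmod 644 "$demo"
is_private_file "$demo" || echo "too permissive"
rm -f "$demo"
```

If the check fails for the real file, `chmod 600` on it (as the file's owner or root) closes the gap; the file should also be owned by the SlurmUser set in the configuration.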
+ | root@controller# | ||
+ | |||
+ | |||
+ | === Test munge === | ||
+ | |||
+ | $ munge -n | unmunge | grep STATUS | ||
+ | STATUS: | ||
+ | $ munge -n | ssh slurm-ctrl unmunge | grep STATUS | ||
+ | STATUS: | ||
+ | |||
+ | === Test Slurm === | ||
+ | |||
+ | $ sinfo | ||
+ | PARTITION AVAIL TIMELIMIT | ||
+ | debug* | ||
+ | |||
+ | ==== Compute Nodes ==== | ||
+ | |||
+ | A compute node is a machine that receives jobs from the Controller and executes them. It runs the slurmd service. | ||
+ | |||
+ | Drawing | ||
+ | |||
+ | === Authentication === | ||
+ | |||
+ | $ ssh root@slurm-ctrl | ||
+ | root@controller# | ||
+ | |||
+ | root@compute-1# | ||
+ | |||
+ | Run a job from slurm-ctrl: | ||
+ | |||
+ | $ ssh csadmin | ||
+ | $ srun -N 1 hostname | ||
+ | linux1 | ||
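The same one-node test can also be submitted as a batch job. This is a minimal sketch, not taken from the original page: the file name `hello.sh` and the job name are placeholders, no partition is requested so the cluster default is used, and the script is only submitted when `sbatch` is found on the PATH.

```shell
# Write a minimal batch script (file and job names are placeholders).
cat > hello.sh <<'EOF'
#!/bin/sh
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --output=hello-%j.out
srun hostname
EOF

# Submit only where the Slurm client tools are installed.
if command -v sbatch >/dev/null 2>&1; then
  sbatch hello.sh
else
  echo "sbatch not found; run this on a host with Slurm installed" >&2
fi
```

In the output file name, `%j` is replaced by the job ID, so each submission writes its own file. Once the job has finished, that file should contain the name of the node that ran it, as in the interactive `srun` call.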
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer