tech:slurm

Differences

This shows you the differences between two versions of the page.

tech:slurm [2020/04/24 11:41] kohofertech:slurm [2020/05/27 10:57] kohofer
Line 241: Line 241:
   debug*       up   infinite      1   idle linux1
  
-If computer node is down
+If a compute node is **<color #ed1c24>down</color>** or **<color #ed1c24>drain</color>**
  
 <code>
Line 247: Line 247:
 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 debug*       up   infinite      2   down gpu[02-03]
 +
 +sinfo
 +PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 +gpu*         up   infinite      1  drain gpu02
 +gpu*         up   infinite      1   down gpu03
 +
 </code>
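
To pick the affected nodes out of a longer `sinfo` listing, a small filter can help. The following is a minimal sketch, assuming the default six-column `sinfo` layout shown above; the function name `list_unavailable` is hypothetical and not part of Slurm.

```shell
# list_unavailable (hypothetical helper): read sinfo output on stdin and
# print "NODELIST STATE" for every row whose state starts with down or drain.
# Assumes the default column order: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
list_unavailable() {
  awk 'NR > 1 && $5 ~ /^(down|drain)/ { print $6, $5 }'
}

# Usage on a cluster (not executed here):
#   sinfo | list_unavailable
# Once the underlying problem is fixed, a node can usually be returned to service with:
#   scontrol update NodeName=gpu02 State=RESUME
```

Matching on the state prefix also catches variants such as `down*` or `draining`.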
  
Line 356: Line 362:
  
 ====== Modules ======
 +
 +===== Python =====
 +
 +==== Python 3.7.7 ====
 +
 +
 +  cd /opt/packages
 +  mkdir -p /opt/packages/python/3.7.7
 +  wget https://www.python.org/ftp/python/3.7.7/Python-3.7.7.tar.xz
 +  tar xfJ Python-3.7.7.tar.xz
 +  cd Python-3.7.7/
 +  ./configure --prefix=/opt/packages/python/3.7.7/ --enable-optimizations
 +  make
 +  make install
 +  
 +
 +==== Python 2.7.18 ====
 +
 +
 +  cd /opt/packages
 +  mkdir -p /opt/packages/python/2.7.18
 +  wget https://www.python.org/ftp/python/2.7.18/Python-2.7.18.tar.xz
 +  tar xfJ Python-2.7.18.tar.xz
 +  cd Python-2.7.18
 +  ./configure --prefix=/opt/packages/python/2.7.18/ --enable-optimizations
 +  make
 +  make install
 +
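
Both builds follow the same steps, so they can be wrapped in one function. This is a minimal sketch, assuming the same `/opt/packages/python/<version>` layout and python.org download URLs as above; the name `build_python` and the `DRY_RUN` switch are hypothetical conveniences, not part of any tool.

```shell
# build_python (hypothetical helper): build one CPython release into
# /opt/packages/python/<version>, following the manual steps above.
# If DRY_RUN is set to a non-empty value, the commands are only printed.
build_python() {
  (
    set -e                      # stop at the first failing step
    version="$1"
    prefix="/opt/packages/python/${version}"
    run=${DRY_RUN:+echo}        # "echo" in dry-run mode, empty otherwise

    $run cd /opt/packages
    $run mkdir -p "$prefix"
    $run wget "https://www.python.org/ftp/python/${version}/Python-${version}.tar.xz"
    $run tar xfJ "Python-${version}.tar.xz"
    $run cd "Python-${version}"
    $run ./configure --prefix="$prefix" --enable-optimizations
    $run make
    $run make install
  )
}

# Usage (not executed here):
#   DRY_RUN=1 build_python 3.7.7    # print the commands only
#   build_python 2.7.18             # actually build and install
```

The subshell keeps the `cd` calls and `set -e` from leaking into the calling shell.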
 +==== Create module file ====
 +
 +
 +  cd /opt/modules/modulefiles/
 +  vi python-2.7.18
 +
 +<code>
 +#%Module1.0
 +proc ModulesHelp { } {
 +global dotversion
 + 
 +puts stderr "\tPython 2.7.18"
 +}
 + 
 +module-whatis "Python 2.7.18"
 +prepend-path PATH /opt/packages/python/2.7.18/bin
 +
 +</code>
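
The module file above only covers Python 2.7.18. A matching file for another version (for example 3.7.7) could be generated rather than typed by hand. The following is a minimal sketch under that assumption; the function name `write_python_modulefile` is hypothetical, and the target directory defaults to the `/opt/modules/modulefiles` path used above.

```shell
# write_python_modulefile (hypothetical helper): write a modulefile named
# python-<version> into the given directory (default: /opt/modules/modulefiles).
write_python_modulefile() {
  version="$1"
  dir="${2:-/opt/modules/modulefiles}"
  cat > "${dir}/python-${version}" <<EOF
#%Module1.0
proc ModulesHelp { } {
    puts stderr "\tPython ${version}"
}

module-whatis "Python ${version}"
prepend-path PATH /opt/packages/python/${version}/bin
EOF
}

# Usage (not executed here):
#   write_python_modulefile 3.7.7
#   module avail
#   module load python-3.7.7
```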
 +  
 +
 +
  
 ===== GCC =====
 +
 +This takes a long time!
  
 Commands to run to compile gcc-6.1.0
Line 368: Line 424:
   make
  
 +After some time an error occurs, and the make process stops!
 +<code>
 +...
 In file included from ../.././libgcc/unwind-dw2.c:401:0:
 ./md-unwind-support.h: In function ‘x86_64_fallback_frame_state’:
Line 374: Line 433:
                                                ^~
 ../.././libgcc/shared-object.mk:14: recipe for target 'unwind-dw2.o' failed
 +</code>
  
-To fix do:
-https://stackoverflow.com/questions/46999900/how-to-compile-gcc-6-4-0-with-gcc-7-2-in-archlinux
+To fix this, see this [[https://stackoverflow.com/questions/46999900/how-to-compile-gcc-6-4-0-with-gcc-7-2-in-archlinux|solution]]:
  
-vi /opt/packages/gcc-6.1.0/x86_64-pc-linux-gnu/libgcc/md-unwind-support.h
+  vi /opt/packages/gcc-6.1.0/x86_64-pc-linux-gnu/libgcc/md-unwind-support.h
  
-and replace line 61 with this:
+and replace line 61 (or comment it out and add this line):
  
 +<code>
 struct ucontext_t *uc_ = context->cfa;
 +</code>
  
-or comment the old line: /* struct ucontext *uc_ = context->cfa; */
+Old line, commented out: /* struct ucontext *uc_ = context->cfa; */
  
-run make again
+Then run make again:
+
+  make
 + 
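
The same one-line change to `md-unwind-support.h` can also be made non-interactively. This is a minimal sketch, assuming GNU `sed` (for `-i`) and that the declaration appears in the file exactly as quoted above; the function name `patch_unwind_support` is hypothetical.

```shell
# patch_unwind_support (hypothetical helper): rewrite the old
# "struct ucontext" declaration to "struct ucontext_t" in the given file.
# Requires GNU sed for in-place editing with -i.
patch_unwind_support() {
  sed -i 's/struct ucontext \*uc_ = context->cfa;/struct ucontext_t *uc_ = context->cfa;/' "$1"
}

# Usage (not executed here):
#   patch_unwind_support /opt/packages/gcc-6.1.0/x86_64-pc-linux-gnu/libgcc/md-unwind-support.h
```

Because the pattern matches the declaration text rather than a line number, it does not depend on the line still being number 61.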
 +Next error: 
 + 
 +<code> 
 +../../.././libsanitizer/sanitizer_common/sanitizer_stoptheworld_linux_libcdep.cc:270:22: error: aggregate ‘sigaltstack handler_stack’ has incomplete type and cannot be defined 
 +   struct sigaltstack handler_stack; 
 + 
 +</code> 
 + 
 +To fix this, see this [[https://github.com/llvm-mirror/compiler-rt/commit/8a5e425a68de4d2c80ff00a97bbcb3722a4716da?diff=unified|solution]]
 +or [[https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81066]]
 +
 +Amend the affected files as described in the linked solution!
 + 
 +Next error: 
 + 
 +<code> 
 +... 
 +checking for unzip... unzip 
 +configure: error: cannot find neither zip nor jar, cannot continue 
 +Makefile:23048: recipe for target 'configure-target-libjava' failed 
 +... 
 +... 
 +</code> 
 + 
 +  apt install unzip zip 
 + 
 +and run make again
 + 
 +  make 
 + 
 +Next error: 
 + 
 +<code> 
 +... 
 +In file included from ../.././libjava/prims.cc:26:0: 
 +../.././libjava/prims.cc: In function ‘void _Jv_catch_fpe(int, siginfo_t*, void*)’: 
 +./include/java-signal.h:32:26: error: invalid use of incomplete type ‘struct _Jv_catch_fpe(int, siginfo_t*, void*)::ucontext’ 
 +   gregset_t &_gregs = _uc->uc_mcontext.gregs;    \ 
 +... 
 +</code> 
 + 
 +Edit the file: /opt/packages/gcc-6.1.0/x86_64-pc-linux-gnu/libjava/include/java-signal.h 
 + 
 +  vi /opt/packages/gcc-6.1.0/x86_64-pc-linux-gnu/libjava/include/java-signal.h 
 + 
 +<note warning>This change is not enough; more errors follow!</note>
 + 
 +<code> 
 +// kh 
 +  ucontext_t *_uc = (ucontext_t *)_p;                             \
 +  //struct ucontext *_uc = (struct ucontext *)_p;                               \ 
 +  // kh 
 + 
 +</code> 
 + 
 +Next error: 
 + 
 +<code>
 +... 
 +In file included from ../.././libjava/prims.cc:26:0:           
 +./include/java-signal.h:32:3: warning: multi-line comment [-Wcomment] 
 +   //struct ucontext *_uc = (struct ucontext *)_p;                                                   
 +                                                         
 +../.././libjava/prims.cc: In function ‘void _Jv_catch_fpe(int, siginfo_t*, void*)’: 
 +./include/java-signal.h:31:15: warning: unused variable ‘_uc’ [-Wunused-variable]                
 +   ucontext_t *_uc = (ucontext_t *)_p;        
 +                        
 +../.././libjava/prims.cc:192:3: note: in expansion of macro ‘HANDLE_DIVIDE_OVERFLOW’             
 +   HANDLE_DIVIDE_OVERFLOW;        
 +   ^~~~~~~~~~~~~~~~~~~~~~ 
 +../.././libjava/prims.cc:203:1: error: expected ‘while’ before ‘jboolean’                     
 + jboolean                                        
 + ^~~~~~~~                                       
 +../.././libjava/prims.cc:203:1: error: expected ‘(’ before ‘jboolean’ 
 +../.././libjava/prims.cc:204:1: error: expected primary-expression before ‘_Jv_equalUtf8Consts’ 
 + _Jv_equalUtf8Consts (const Utf8Const* a, const Utf8Const *b)                    
 + ^~~~~~~~~~~~~~~~~~~                                     
 +../.././libjava/prims.cc:204:1: error: expected ‘)’ before ‘_Jv_equalUtf8Consts’ 
 +../.././libjava/prims.cc:204:1: error: expected ‘;’ before ‘_Jv_equalUtf8Consts’ 
 +../.././libjava/prims.cc:204:22: error: expected primary-expression before ‘const’ 
 + _Jv_equalUtf8Consts (const Utf8Const* a, const Utf8Const *b) 
 +... 
 +</code> 
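
The `-Wcomment` warning in this log points at the likely cause: a `//` comment that ends in a backslash continues onto the following lines, so inside the multi-line `HANDLE_DIVIDE_OVERFLOW` macro it swallows the rest of the macro body. That leaves `_uc` unused and the `do` block unterminated, which matches the `expected ‘while’` error. One possible way around it is to replace the old declaration in place instead of commenting it out. The following is a minimal sketch, assuming GNU `sed` and the declaration text quoted above; the function name `patch_java_signal` is hypothetical, and the change has not been verified against a complete GCC build.

```shell
# patch_java_signal (hypothetical helper): in the given header,
#  1. delete any old declaration that was commented out with "//", and
#  2. rewrite remaining "struct ucontext" declarations to use ucontext_t.
# Requires GNU sed for in-place editing with -i.
patch_java_signal() {
  sed -i \
    -e '/^[[:space:]]*\/\/struct ucontext \*_uc/d' \
    -e 's/struct ucontext \*_uc = (struct ucontext \*)_p;/ucontext_t *_uc = (ucontext_t *)_p;/' \
    "$1"
}

# Usage (not executed here):
#   patch_java_signal /opt/packages/gcc-6.1.0/x86_64-pc-linux-gnu/libjava/include/java-signal.h
```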
 + 
 +===== Example ===== 
 + 
 +A simple example that uses an NVIDIA GPU:
 + 
 +<code>
 +#!/bin/bash
 +
 +#SBATCH --job-name=mnist
 +#SBATCH --output=mnist.out
 +#SBATCH --error=mnist.err
 +
 +#SBATCH --partition gpu
 +#SBATCH --gres=gpu
 +#SBATCH --mem-per-cpu=4gb
 +#SBATCH --nodes 2
 +#SBATCH --time=00:08:00
 +
 +#SBATCH --ntasks=10
 +
 +#SBATCH --mail-type=ALL
 +#SBATCH --mail-user=<your-email@address.com>
 +
 +ml load miniconda3
 +
 +python3 main.py
 +</code>
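
To run the job, the script has to be handed to `sbatch`. The following is a minimal sketch, assuming the script above was saved as `mnist.sh` (a hypothetical file name); the helper `submit_job` is likewise hypothetical and only wraps the standard `sbatch --parsable` and `squeue -j` calls.

```shell
# submit_job (hypothetical helper): submit a batch script and show its queue entry.
submit_job() {
  script="$1"
  # --parsable makes sbatch print only the job id (optionally followed by ";cluster")
  jobid=$(sbatch --parsable "$script") || return 1
  jobid=${jobid%%;*}            # strip the optional cluster name
  echo "submitted job ${jobid}"
  squeue -j "$jobid"
}

# Usage (not executed here):
#   submit_job mnist.sh
#   cat mnist.out mnist.err     # after the job has finished
```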
  
  
  
 ===== Links =====
 +
 +https://www.admin-magazine.com/HPC/Articles/Warewulf-Cluster-Manager-Development-and-Run-Time/Warewulf-3-Code/MPICH2
 +
 +https://proteusmaster.urcf.drexel.edu/urcfwiki/index.php/Environment_Modules_Quick_Start_Guide
 +
 +https://en.wikipedia.org/wiki/Environment_Modules_(software)
  
 http://www.walkingrandomly.com/?p=5680
  
 https://modules.readthedocs.io/en/latest/index.html
 +
/data/www/wiki.inf.unibz.it/data/pages/tech/slurm.txt · Last modified: 2022/11/24 16:17 by kohofer