
UCRL-WEB-201386

SLURM Reference Manual


Node Management

On All Machines:

-k (lowercase, --no-kill)
prevents automatic termination of this job if any of its allocated nodes fails. The job assumes responsibility for handling such node failures internally. (SLURM's default is to terminate a job if any of its allocated nodes fail.)
-K (uppercase, --kill-on-bad-exit)
(default) terminates a job if any task has a nonzero exit code.
-m dist (lowercase, --distribution=dist)
tells SLURM how to distribute tasks among nodes for this job, where the choices for dist are:
block
assigns tasks in order to each CPU on one node before assigning any to the next node. This is the default if the number of tasks exceeds the number of nodes requested.
cyclic
assigns tasks "round robin" across all allocated nodes (task1 goes to the first node, task2 goes to the second node, etc.). This is the default if the number of nodes requested equals or exceeds the number of tasks.
hostfile
assigns tasks to nodes in the order specified by the file named in the environment variable SLURM_HOSTFILE.
plane
groups tasks into blocks and then distributes them both within and between those blocks, as diagrammed at http://www.llnl.gov/linux/slurm/dist_plane.html.
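
The block and cyclic choices, and the rule for which one is the default, can be illustrated with a small simulation. This is only a sketch of the behavior described above, not SLURM's own code: the function name distribute and its parameters are hypothetical, node and task indices start at 0, and the sketch assumes the job never has more tasks than CPUs when block is used. The hostfile and plane choices are not modeled.

```python
def distribute(num_tasks, cpus_per_node, dist=None):
    """Return a list whose i-th entry is the node index assigned to task i.

    cpus_per_node is a list giving the CPU count of each allocated node
    (a hypothetical input for this sketch; SLURM learns this from the
    allocation itself).
    """
    num_nodes = len(cpus_per_node)

    if dist is None:
        # Default rule as stated in the text: block when tasks outnumber
        # nodes, cyclic when nodes equal or exceed tasks.
        dist = "block" if num_tasks > num_nodes else "cyclic"

    if dist == "cyclic":
        # Round robin: task 0 -> node 0, task 1 -> node 1, and so on,
        # wrapping around to node 0 after the last node.
        return [task % num_nodes for task in range(num_tasks)]

    if dist == "block":
        # One slot per CPU, listed node by node, so every CPU on a node
        # is filled before the next node receives any task.
        slots = [node
                 for node, cpus in enumerate(cpus_per_node)
                 for _ in range(cpus)]
        if num_tasks > len(slots):
            # Assumption of this sketch: oversubscription is not modeled.
            raise ValueError("more tasks than CPUs; not modeled here")
        return slots[:num_tasks]

    raise ValueError("distribution not modeled in this sketch: " + dist)


if __name__ == "__main__":
    print(distribute(4, [2, 2], "block"))
    print(distribute(4, [2, 2], "cyclic"))
```

With four tasks on two dual-CPU nodes, block fills the first node before touching the second, while cyclic alternates between the two nodes.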
-r n (lowercase, --relative=n)
offsets the first job step to node n of this job's allocated node set, where the first node is 0 (the default for n is 0). Option -r is incompatible with the "constraint" options -w and -x, and it is ignored when you run a job without a prior node allocation. SRUN does not support job steps on BlueGene/L.
-s (lowercase, --share)
allows this job to share nodes with other running jobs. Sharing nodes often starts the job faster and boosts system utilization, but it can also lower application performance.

On BlueGene/L ONLY:

--blrts-image=path
specifies the path to the blrts image for the BG/L block (the default path is in the file bluegene.conf).
--geometry=N[xM[xO]]
specifies your job's size in "nodes" in each direction within BG/L's field of nodes (e.g., --geometry=1x2x4 for 8 nodes). SLURM regards each BG/L 512-node dual-processor "base partition" as a single 1024-processor node. Use SLURM's SMAP utility on BG/L to visualize job layout and the geometric intermixing of several jobs.

If you omit --geometry on BG/L, then SRUN uses 1x1x1 as the default (or if you also use -Nnum then SRUN uses numx1x1 as the default). If you omit O then the default geometry is NxMx1; if you omit both M and O then the default is Nx1x1.
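
The defaulting rules for N[xM[xO]] can be sketched as a short parser. This is an illustration under stated assumptions, not SRUN's actual parsing code: the function names parse_geometry and node_count are hypothetical, and the sketch assumes every dimension is a positive integer.

```python
def parse_geometry(spec):
    """Expand a geometry string N[xM[xO]] into a full (N, M, O) tuple.

    Omitted trailing dimensions default to 1, as described in the text:
    "N" becomes Nx1x1 and "NxM" becomes NxMx1.
    """
    parts = spec.split("x")
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected N, NxM, or NxMxO: " + spec)

    dims = [int(part) for part in parts]  # int() rejects non-numeric parts
    if any(dim < 1 for dim in dims):
        # Assumption of this sketch: dimensions must be positive.
        raise ValueError("dimensions must be positive: " + spec)

    # Pad the missing trailing dimensions with 1.
    dims += [1] * (3 - len(dims))
    return tuple(dims)


def node_count(spec):
    """Number of SLURM "nodes" (BG/L base partitions) in the geometry."""
    n, m, o = parse_geometry(spec)
    return n * m * o


if __name__ == "__main__":
    print(parse_geometry("2x3"))
    print(node_count("1x2x4"))
```

The node count is simply the product of the three dimensions, which is why 1x2x4 corresponds to 8 nodes in the example above.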

--conn-type=mesh|torus
specifies the type of interconnect that you want used between BG/L "base partitions" ("nodes" to SLURM), where the choices are mesh (the default) or torus.
--linux-image=path
specifies the path to the Linux image for the BG/L block (the default path is in the file bluegene.conf).
--mloader-image=path
specifies the path to the mloader image for the BG/L block (the default path is in the file bluegene.conf).
--node-use=coprocessor|virtual
specifies how to use the second processor on each BG/L compute node, where the choices are coprocessor (the default, so that the processor number is always t0) or virtual (allows processor numbers t0 and t1, but seems to be incompatible with the TotalView debugger).
-R (uppercase, --no-rotate)
disables rotation of job geometry to fit available space (the default is to rotate in three dimensions).
--ramdisk-image=path
specifies the path to the ramdisk image for the BG/L block (the default path is in the file bluegene.conf).
--reboot
forces the allocated nodes to reboot before starting to run your job.


