
UCRL-WEB-201386

SLURM Reference Manual


User Impact

TOOLS:
The primary SLURM job-control tool is SRUN, which fills the general role of PRUN (on former Compaq machines) or POE (on IBM computers). Your choice of run mode ("batch" or interactive) and your allocation of resources with SRUN strongly affect your job's behavior on machines where SLURM manages parallel jobs. On AIX machines where SLURM has replaced IBM's LoadLeveler, SLURM works collaboratively with POE. See the SCONTROL section below for an introduction to how this collaboration supports job checkpointing, for example.
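As an illustrative sketch (node and task counts and the executable and script names are placeholders; the -b batch option applies to the early SLURM versions this manual describes), the two run modes look like this:

```shell
# Interactive mode: allocate 2 nodes, launch 4 tasks, and block
# in the terminal until the program finishes
srun -N 2 -n 4 ./a.out

# Batch mode: queue a script for later execution on 2 nodes
srun -b -N 2 myscript.sh
```

In interactive mode SRUN holds your terminal and forwards standard output and error; in batch mode the job runs unattended when resources become available.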

To monitor the status of SRUN-submitted jobs, use the SLURM utility called SQUEUE. To monitor the status of SLURM-managed compute nodes, use the complementary tool called SINFO. Both SQUEUE and SINFO have explanatory sections later in this manual, with usage examples.
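For example (the option names shown are standard SLURM flags, though output columns vary by version), you might check your own jobs and the state of the compute nodes as follows:

```shell
# List only your own pending and running jobs
squeue -u $USER

# Summarize node states (alloc, idle, down) by partition
sinfo

# Show per-node detail instead of the partition summary
sinfo -N
```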

On BlueGene/L only, SLURM provides an additional user tool called SMAP to reveal topographically how nodes are allocated among current jobs or partitions (because job geometry is unusually important on BG/L).
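A minimal invocation is simply the command name (SMAP is a curses-based display, so it must be run from an interactive terminal session):

```shell
# Launch the curses-based display of node allocation on BlueGene/L
smap
```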

POLICIES:
Formerly, once your SLURM-managed batch job started to run on an LC machine's compute nodes, you were allowed to log in to those nodes to execute additional processes (usually ones for monitoring or interactively guiding the batch job). When the batch job ended, these extra login sessions and nonbatch processes were allowed to continue. If these tasks were CPU intensive, however, they sometimes caused problems for subsequent batch runs by other users of the same compute nodes.

Starting in August 2007, on LC Linux (but not AIX) clusters where SLURM is the resource manager underlying LCRM or Moab, all of a user's processes on any compute node terminate whenever a SLURM-managed batch job completes on that node. This guarantees that the next user's job sees no interference from stray processes that accompanied your job. Consequently, if your login session or interactive process on a Linux compute node unexpectedly ends, the batch job to which that node was allocated has probably just completed.

