Slurm and Moab

Author: Blaise Barney, Lawrence Livermore National Laboratory UCRL-PRES-228819

Table of Contents

  1. Abstract
  2. What is a Workload Manager?
  3. Workload Managers at LC
  4. Basic Concepts
    1. Jobs
    2. Queues and Queue Limits
    3. Banks
    4. Fair Share Job Scheduling
  5. Basic Functions
    1. Building a Job Script
    2. Submitting Jobs
    3. Monitoring Jobs
    4. Job States and Status Codes
    5. Exercise 1
    6. Holding/Releasing Jobs
    7. Canceling Jobs
    8. Changing Job Parameters
    9. Setting Up Dependent Jobs
    10. Banks and Usage Information
    11. Output Files
    12. Guesstimating When Your Job Will Start
    13. Determining When Your Job's Time is About to Expire
    14. Running in Standby Mode
  6. Displaying Configuration and Accounting Information
  7. Parallel Jobs and the srun Command
  8. Running Multiple Jobs From a Single Job Script
  9. Running on Serial Clusters
  10. Batch Commands Summary
  11. Exercise 2
  12. References and More Information



Abstract


Slurm and Moab are two workload manager systems used to schedule and manage user jobs run on Livermore Computing (LC) clusters. This tutorial presents the essentials for using Slurm and Moab on LC platforms. It begins with an overview of workload managers, followed by a discussion of basic workload manager concepts, such as the definition of a job, queues and queue limits, banks, and fair-share job scheduling. Basic workload manager functions are covered next, including how to build batch scripts and how to submit, monitor, change, hold/release, and cancel jobs. Dependent jobs, bank usage information, output files, determining when a job will expire, and running in standby mode round out the basic workload manager functions. Other topics covered include displaying configuration and accounting information, parallel jobs and the srun command, and running on serial clusters. This tutorial includes both C and Fortran example codes and lab exercises.

Level/Prerequisites: The material covered in EC3501: Livermore Computing Resources and Environment would be helpful.





What is a Workload Manager?




Workload Managers at LC


Slurm

Spectrum LSF

Moab

LCRM

Confused?



Basic Concepts

Jobs

Simple Definition:

Slightly More Complex Definition



Basic Concepts

Queues and Queue Limits

Queues (also called Pools and/or Partitions):

How Do I Find Out What the Queue Limits Are?



Basic Concepts

Banks

Bank Hierarchy:

Bank Shares:



Basic Concepts

Fair Share Job Scheduling

Why in the World Won't My Job Run?
  • Undoubtedly, this is the most commonly asked batch system question.

  • Classic scenario: a user submits a job requesting 16 nodes when 50 nodes are shown as available/idle. However, the job sits in the queue and doesn't run. Why?

  • Aside from "user error", there are several other, sometimes complicated, explanations.

  • Probably the most important reason is the underlying mechanism used by the batch system to determine when/if a job should run.

  • At LC, the Workload Manager has been programmed to use a "Fair Share with Half-Life Decay of Usage" algorithm for determining a job's eligibility to run.
Fair Share with Half-Life Decay of Usage:
  • This is the primary mechanism used to determine job scheduling. It is based upon a dynamically calculated priority for your job that reflects your share allocation within a bank versus your actual usage.
    • Use more than your share, and your priority/service degrades
    • Use less than your share, and your priority/service improves
    • Your priority can become very low, but you never "run out of time" at LC.

  • Higher priority jobs often must accumulate their full set of nodes over time. While nodes are being reserved for such a job, they appear idle even though they are not available to other jobs.

  • Half-Life Decay: Without new usage, your current usage value decays to half its value in two weeks.

  • Resources are not wasted:
    • Even though your allocation and/or job priority may be small, your job will run if machine resources are sitting idle.
    • Backfill scheduling allows waiting jobs to use the nodes reserved for higher priority jobs, as long as they do not delay the start of those higher priority jobs.

  • Scheduling is dynamic with job priorities and usage information being recalculated frequently.

  • The details of the Fair Share with Half-Life Decay algorithm are more complex than presented here. See the following document for detailed information: https://slurm.schedmd.com/priority_multifactor.html.
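To make the half-life decay concrete, the arithmetic can be sketched with a one-line awk computation. The formula below follows directly from the two-week half-life described above; the usage value (1000) and elapsed time (28 days) are made-up numbers for illustration.

```shell
# Decayed usage = recorded usage * 2^(-elapsed_days / 14), per the
# two-week half-life. After 28 days (two half-lives), 1000 units of
# recorded usage count as only 250 toward your fair-share calculation.
awk 'BEGIN { printf "%.0f\n", 1000 * 2^(-28/14) }'
```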

Other Considerations:



Basic Functions

Building a Job Script

The Basics:

Options:

Usage Notes:
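A minimal Slurm job script might look like the sketch below. The partition name (pbatch), bank name (mybank), and executable are hypothetical placeholders; the valid names vary by cluster, so check your cluster's configuration before using them.

```shell
#!/bin/bash
# Hypothetical example script - partition and bank names vary by cluster
#SBATCH -J myjob            # job name
#SBATCH -N 2                # number of nodes requested
#SBATCH -t 00:30:00         # wall-clock time limit
#SBATCH -p pbatch           # queue/partition to run in
#SBATCH -A mybank           # bank (account) to charge
#SBATCH -o myjob.%j.out     # output file; %j expands to the job ID

srun -n 64 ./a.out          # launch the parallel executable
```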



Basic Functions

Submitting Jobs

Job Submission Commands

Usage Notes:

Environment Variables:

Passing Arguments to Your Job:
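Typical submission commands are sketched below. The script name is a placeholder; with sbatch, anything following the script name is forwarded to the script, where it is available as $1, $2, and so on.

```shell
# Slurm submission; prints "Submitted batch job <jobid>" on success
sbatch myjobscript

# Moab submission, on clusters where Moab is the front end
msub myjobscript

# Passing arguments to the script (readable inside as $1 and $2)
sbatch myjobscript arg1 arg2
```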



Basic Functions

Monitoring Jobs

Multiple Choices:

squeue:

showq:

mdiag -j:

mjstat:

checkjob:

sprio -l & mdiag -p -v:

  • Useful for determining where your jobs are queued relative to other jobs. The highest priority jobs appear at the top of the list.

sview:
  • Graphically displays all user jobs on a cluster, nodes used, and detailed job information for each job.

  • Man page HERE

  • Examples:

sinfo:

  • Displays state information about a cluster's queues and nodes

  • Numerous options for additional/customized output

  • Common/useful options:
    • -s summarizes queue information

  • Man page HERE

  • Examples below:
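The monitoring commands discussed above can be combined as in the sketch below. The job ID (123456) is a placeholder for one of your own job IDs.

```shell
squeue -u $USER            # your jobs: ID, partition, name, state, time, nodes
squeue -j 123456 --start   # estimated start time for a pending job
sinfo -s                   # one-line summary of each queue/partition
showq                      # Moab's view of active, eligible, and blocked jobs
checkjob 123456            # detailed Moab status for a single job
```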



Basic Functions

Job States and Status Codes



Moab Exercise 1

Getting Started

Overview:
  • Login to an LC cluster using your workshop username and OTP token
  • Copy the exercise files to your home directory
  • Familiarize yourself with the cluster's batch configuration
  • Familiarize yourself with the cluster's bank allocations
  • Create a job batch script
  • Submit and monitor your batch job
  • Check your job's output

GO TO THE EXERCISE HERE

    Approx. 20 minutes



Basic Functions

Holding and Releasing Jobs

Holding Jobs:

Releasing Jobs:
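A quick sketch of the hold/release commands, assuming a placeholder job ID of 123456. The Slurm forms are standard; the mjobctl forms shown are the usual Moab equivalents, but verify the syntax on your cluster with the man page.

```shell
scontrol hold 123456       # Slurm: place a user hold on a pending job
scontrol release 123456    # Slurm: release the hold so the job is eligible

mjobctl -h user 123456     # Moab: set a user hold
mjobctl -u user 123456     # Moab: release the user hold
```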



Basic Functions

Canceling Jobs
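Canceling works on both pending and running jobs. A sketch with a placeholder job ID:

```shell
scancel 123456        # cancel a single job (pending or running)
scancel -u $USER      # cancel all of your jobs on this cluster
canceljob 123456      # Moab equivalent
```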



Basic Functions

Changing Job Parameters



Basic Functions

Setting Up Dependent Jobs
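A common pattern is to chain two scripts so the second starts only after the first succeeds. The script names below are placeholders; the awk step relies on sbatch printing "Submitted batch job <jobid>", where the job ID is the fourth field.

```shell
# Submit the first job and capture its ID from sbatch's output
JOBID=$(sbatch first.sh | awk '{print $4}')

# Start the second job only after the first completes successfully
sbatch --dependency=afterok:$JOBID second.sh

# Other dependency types include afterany (first job finishes in any
# state) and afternotok (first job fails)
```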



Basic Functions

Banks and Usage Information

Overview:

mshare:

mdiag -u:

sreport:
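The commands above can be sketched as follows. The sreport date range is an arbitrary example; mshare is an LC-specific utility, so check its availability and options on your cluster.

```shell
mshare           # LC utility: bank hierarchy, shares, and usage
mdiag -u $USER   # Moab: per-user fair-share and bank information

# Slurm accounting report: usage by account over a date range
sreport cluster AccountUtilizationByUser start=2024-01-01 end=2024-02-01
```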



Basic Functions

Output Files

Defaults:

Assigning Unique Output File Names:

Caveats:
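By default, Slurm merges stdout and stderr into a single file named slurm-%j.out (where %j is the job ID) in the submission directory. Unique names can be assigned with the -o and -e directives, as in this fragment:

```shell
#SBATCH -o myjob.%j.out    # stdout; %j expands to the job ID
#SBATCH -e myjob.%j.err    # stderr, kept in a separate file
```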



Basic Functions

Guesstimating When Your Job Will Start



Basic Functions

Determining When Your Job's Time is About to Expire

Signaling Method:

Polling Method:

More on yogrt_remaining:
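Alongside the yogrt library interface, both methods can be sketched directly in a batch script using standard Slurm features. The 600-second warning interval and the checkpoint action are illustrative placeholders.

```shell
# Signaling method: ask Slurm to send SIGUSR1 to the batch shell
# 600 seconds before the time limit, and trap it to act on it
#SBATCH --signal=B:USR1@600
trap 'echo "time almost up - checkpointing"; touch checkpoint.flag' USR1

# Polling method: query the remaining wall time for this job
# (%L prints time left as [days-]hours:minutes:seconds)
squeue -h -j $SLURM_JOB_ID -o %L
```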



Basic Functions

Running in Standby Mode
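Standby jobs run at low priority on otherwise-idle resources and may be preempted when higher priority work arrives. The submission sketch below assumes the QOS is named "standby", which may differ by site.

```shell
sbatch --qos=standby myjobscript    # Slurm form
msub -l qos=standby myjobscript     # Moab form
```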



Displaying Configuration and Accounting Information


What's Available?



Parallel Jobs and the srun Command


srun Command:

srun options:
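Typical srun invocations inside a batch script (or an interactive allocation) are sketched below; the node/task counts and executable names are illustrative.

```shell
srun -n 64 ./a.out             # 64 tasks across the allocated nodes
srun -N 2 -n 32 ./a.out        # 32 tasks spread over 2 nodes
srun -n 8 -c 4 ./hybrid.out    # 8 tasks with 4 cores each (hybrid MPI/OpenMP)
```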

Parallel Jobs on BG/Q Systems:

Parallel Output:



Running Multiple Jobs From a Single Job Script


Motivation:

Sequential:

Simultaneous:
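The two patterns can be sketched inside one batch script as follows. For the simultaneous case, the combined task counts must fit within the job's allocation at the same time; the executables are placeholders.

```shell
# Sequential: each srun finishes before the next begins
srun -n 16 ./step1
srun -n 16 ./step2

# Simultaneous: background each srun and wait for all of them
srun -n 8 ./taskA &
srun -n 8 ./taskB &
wait
```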



Running on Serial Clusters


Different from Other Clusters: multi-node vs. single-node

How to Specify the Right Number of Cores:



Batch Commands Summary



Moab Exercise 2

More Moab Functions

Overview:
  • Login to an LC workshop cluster, if you are not already logged in
  • Holding and releasing jobs
  • Canceling jobs
  • Running in standby mode
  • Running parallel and hybrid parallel jobs
  • When will a job start?
  • Try sview

GO TO THE EXERCISE HERE






This completes the tutorial.

      Please complete the online evaluation form - unless you are doing the exercise, in which case please complete it at the end of the exercise.

Where would you like to go now?



References and More Information