Moab and SLURM

Author: Blaise Barney, Lawrence Livermore National Laboratory UCRL-PRES-228819

Table of Contents

  1. Abstract
  2. What are Moab and SLURM?
  3. What a "Job" Means to Moab
  4. Moab Grid Configurations
  5. Queues and Queue Limits
  6. Moab Banks
  7. Fair Share Job Scheduling
  8. Basic Moab Functions
    1. Building a Job Script
    2. Submitting Jobs and MSUB Options
    3. Monitoring Jobs
    4. Exercise 1
    5. Job States and Exit Status Codes
    6. Holding/Releasing Jobs
    7. Canceling Jobs
    8. Changing Job Parameters
    9. Setting Up Dependent Jobs
    10. Banks and Usage Information
    11. Guesstimating Jobs
    12. Output Files
    13. Determining When Your Job's Time is About to Expire
  9. Other Moab Functions
    1. Displaying Configuration and Accounting Information
    2. Showing System State
    3. Running in Standby Mode
    4. Setting User Job Priority
    5. Expediting Jobs
  10. Parallel Jobs and the srun Command
  11. Running on Serial Clusters
  12. Batch Commands Summary
  13. Exercise 2
  14. References and More Information
  15. Appendix A: Moab Support for Legacy LCRM



Abstract


Moab is a Workload Manager product of Adaptive Computing, Inc. SLURM is the native scheduler software that runs on all LC clusters. Both of these schedulers are used to manage jobs running on LC systems. This tutorial presents the essentials for using Moab and SLURM on LC platforms. It begins with an overview of Moab and discussions on how Moab is configured, including Moab grids, queues and queue limits, banks, and fair-share job scheduling. Basic Moab functions are covered next, including how to build batch scripts and how to submit, monitor, change, hold/release, and cancel jobs. Dependent jobs, bank usage information, guesstimating jobs, output files, and determining when a job will expire round out the basic Moab functions. The tutorial concludes with a discussion on parallel jobs and the srun command, and a list of topics not covered. This tutorial includes both C and Fortran example codes and a lab exercise.

Level/Prerequisites: The material covered in EC3501: Livermore Computing Resources and Environment would be helpful.



What Are Moab and SLURM?


What is Moab?

Meta vs. Native Scheduler:

What is SLURM?


Relationship between the Moab meta-scheduler and native schedulers

Tri-lab Implementation:



What a "Job" Means to Moab


Simple Definition:

Moab Definition: (slightly more complex)



Moab Grid Configurations


What is a Moab Grid?

Moab Grid Configurations at LC:



Queues and Queue Limits


Queues (also called Pools and/or Partitions):

How Do I Find Out What the Queue Limits Are?



Moab Banks


Bank Hierarchy:

Bank Shares:



Fair Share Job Scheduling


Why in the World Won't My Job Run?
  • Undoubtedly, this is the most commonly asked batch system question.

  • Classic scenario: a user submits a job requesting 16 nodes when 50 nodes are shown as available/idle. However, the job sits in the queue and doesn't run. Why?

  • Aside from any "user error" related reasons, there are several other, sometimes complicated, reasons.

  • Probably the most important reason is the underlying mechanism used by the batch system to determine when/if a job should run.

  • At LC, the Moab scheduler has been programmed to use a "Fair Share with Half-Life Decay of Usage" algorithm for determining a job's eligibility to run.

Fair Share with Half-Life Decay of Usage:
  • This is the primary mechanism used to determine job scheduling. It is based upon a dynamically calculated priority for your job that reflects your share allocation within a bank versus your actual usage.
    • Use more than your share, your priority/service degrades
    • Use less than your share, your priority/service improves
    • Your priority can become very low, but you never "run out of time" at LC.

  • Jobs with higher priorities often need to acquire their full set of nodes over time. While their nodes are being reserved, the nodes will appear to be idle.

  • Half-Life Decay: Without new usage, your current usage value decays to half its value in two weeks.

  • Resources are not wasted:
    • Even though your allocation and/or job priority may be small, your job will run if machine resources are sitting idle.
    • Backfill scheduling - allows waiting jobs to use the reserved job slots of higher-priority jobs, as long as they do not delay the start of the higher-priority job.

  • Moab's scheduling is dynamic, with job priorities and usage information being recalculated frequently.

  • The details of the Fair Share with Half-Life Decay algorithm are a bit more complex than presented here. See the following documents for detailed information:
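As a rough illustration of two ideas from the list above, the following Python sketch models the half-life decay of usage and a very simplified backfill check. It is not Moab's actual algorithm: the function names, the parameters, and the assumption that all idle nodes are held by a single reservation are hypothetical and chosen only for illustration. The two-week half-life is the one figure taken from the text.

```python
# Illustrative sketch only; not Moab's real priority or backfill code.
# All names below are hypothetical.

HALF_LIFE_DAYS = 14.0  # "decays to half its value in two weeks"


def decayed_usage(usage, days_idle, half_life_days=HALF_LIFE_DAYS):
    """Return the usage value left after days_idle days with no new usage."""
    return usage * 0.5 ** (days_idle / half_life_days)


def can_backfill(job_nodes, job_hours, idle_nodes, hours_until_reserved_start):
    """Simplified backfill test.

    Assumes every idle node is reserved for one higher-priority job that
    starts in hours_until_reserved_start hours. A waiting job may run only
    if it fits on the idle nodes and finishes before that start time.
    """
    fits = job_nodes <= idle_nodes
    finishes_in_time = job_hours <= hours_until_reserved_start
    return fits and finishes_in_time


if __name__ == "__main__":
    # Usage halves every two weeks when nothing new is charged.
    for days in (0, 14, 28, 42):
        print(days, decayed_usage(1000.0, days))

    # The "16 nodes requested, 50 nodes idle" scenario: the job can start
    # only if its time limit fits inside the reservation window.
    print(can_backfill(16, 2.0, 50, 4.0))
    print(can_backfill(16, 12.0, 50, 4.0))
```

Under these assumptions, a 16-node job with a short time limit can slip into the idle nodes, while the same job with a long time limit must wait. That is one way a job can sit in the queue even though nodes appear idle.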

Other Considerations:



Basic Moab Functions

Building a Job Script

The Basics:

Usage Notes:



Basic Moab Functions

Submitting Jobs and MSUB Options

msub

Usage Notes:

Discussion on the -l Option:

Environment Variables:

Passing Arguments to Your Job:



Basic Moab Functions

Monitoring Jobs

Several Choices:

Moab:
  • showq
  • checkjob
  • mdiag -j
  • showstart

Non-Moab:
  • squeue
  • mjstat
  • sview
  • smap
  • ju

showq:

checkjob:

mdiag -j:

squeue:

sview:

smap:

ju:

mjstat

So Many Choices...



Moab Exercise 1

Getting Started

Overview:
  • Login to an LC cluster using your workshop username and OTP token
  • Copy the exercise files to your home directory
  • Familiarize yourself with the cluster's batch configuration
  • Familiarize yourself with the cluster's bank allocations
  • Create a Moab batch script
  • Submit and monitor your batch job
  • Check your job's output

GO TO THE EXERCISE HERE

    Approx. 20 minutes



Basic Moab Functions

Job States and Exit Status Codes



Basic Moab Functions

Holding and Releasing Jobs

Holding Jobs:

Releasing Jobs:



Basic Moab Functions

Canceling Jobs



Basic Moab Functions

Changing Job Parameters



Basic Moab Functions

Setting Up Dependent Jobs



Basic Moab Functions

Banks and Usage Information

Overview:

mshare:

mdiag -u:

sreport:



Basic Moab Functions

Guesstimating Jobs

showstart:

showbf:



Basic Moab Functions

Output Files

Defaults:

Assigning Unique Output File Names:

Caveats:



Basic Moab Functions

Determining When Your Job's Time is About to Expire

Signaling Method:

Polling Method:

More on yogrt_remaining:

More Information:



Other Moab Functions

Displaying Configuration and Accounting Information

What's Available?

mdiag:

showstats:



Other Moab Functions

Showing System State



Other Moab Functions

Running in Standby Mode



Other Moab Functions

Setting User Job Priority



Other Moab Functions

Expediting Jobs



Parallel Jobs and the srun Command


srun Command:

srun options:

Parallel Jobs on BG/Q Systems:

Running Multiple Jobs Simultaneously:

Parallel Output:



Running on Serial Clusters


Different than Other Clusters: multi vs. single-node

Use of the Moab -l ttc= (total task count) Option:



Batch Commands Summary



Moab Exercise 2

More Moab Functions

Overview:
  • Login to an LC workshop cluster, if you are not already logged in
  • Holding and releasing jobs
  • Canceling jobs
  • Running in standby mode
  • Running parallel and hybrid parallel jobs
  • When will a job start?
  • Try sview

GO TO THE EXERCISE HERE






This completes the tutorial.

      Please complete the online evaluation form - unless you are doing the exercise, in which case please complete it at the end of the exercise.




References and More Information




Appendix A: Moab Support for Legacy LCRM

Supported LCRM commands:

Unsupported Commands:

Job States:

Converting LCRM Scripts to Moab: