Livermore Computing Resource Management System (LCRM)

Blaise Barney, Lawrence Livermore National Laboratory

Note: As of 1/09, this tutorial is no longer being maintained because LCRM is no longer used on LC Machines. It has been replaced by the Moab workload manager. Please see the Moab tutorial for details.

Table of Contents

  1. Abstract
  2. LCRM Overview
  3. Resource Allocation & Control System (RAC)
  4. LCRM Bank Structure
  5. Bank Shares
  6. User RAC Utilities
    1. defbank
    2. newbank
    3. pshare
    4. bac
    5. brlim
    6. pquota
    7. lrmusage
    8. LCRM Usage GUI
  7. Production Workload Scheduler (PWS)
  8. LCRM Job Scheduling
  9. Batch Job Limits
  10. Building a Job Control Script
  11. Optimizing CPU Usage
  12. Batch Utilities and Commands
    1. psub: Submitting a Job
    2. pstat, spjstat, ju: Displaying Job Status
    3. prm: Cancelling a Job
    4. phold, prel: Holding and Releasing Jobs
    5. palter: Changing a Job's Attributes
    6. pexp: Expediting a Job
    7. phist: Job Memory Statistics and History
    8. phstat: Showing a Host's Attributes
    9. plim: Showing a Machine's Job Limits
    10. lrmmgr: Obtaining Configuration Information
  13. Batch Debugging, I/O and Miscellaneous Considerations
  14. References and More Information
  15. Exercise



Abstract


The Livermore Computing Resource Management System (LCRM) is a product of the LLNL Livermore Computing Center (LC). Its primary purpose is to allocate computer resources on LC's production computer systems according to resource delivery goals. It is the batch system that LC users rely on to submit, monitor, and interact with their production computing jobs.

This tutorial begins with a brief overview of LCRM and its two primary functional components, the Resource Allocation and Control System and the Production Workload Scheduler. Each of these components is then further explored, with a practical focus on describing commands and utilities that are provided for the user's interaction with LCRM. Building job command scripts, running parallel jobs, and job scheduling policies are also included. The lecture is followed by a lab exercise.

Note: LC is currently migrating all of its production machines from LCRM to the Moab Workload Manager.

Level/Prerequisites: Beginner. The material covered by the following tutorials would also be useful:
EC3501: Introduction to Livermore Computing Resources
EC3503: IBM POWER Systems Overview
EC3516: Linux Clusters Overview



LCRM Overview

Resource Delivery Goals:

Architecture:



Resource Allocation & Control System (RAC)



LCRM Bank Structure

LCRM bank structure example


Bank Shares



User RAC Utilities

The following commands enable you to query/set Resource Allocation & Control System (RAC) parameters. Only a brief description of each is provided. Additional detailed information (man page) can be obtained by clicking on the hyperlinked command names.

defbank


newbank


pshare


bac


brlim


pquota


lrmusage / pcsusage


LCRM Usage GUI



Production Workload Scheduler (PWS)



LCRM Job Scheduling


Fair Share with Half-Life Decay of Usage:
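As a rough illustration of what half-life decay of usage means in general, the sketch below weights each past usage sample by 0.5 raised to (age / half-life), so usage one half-life old counts half as much as usage recorded now. This is a generic formulation, not LCRM's actual formula; the function name, the 7-day half-life, and the sample data are all assumptions made for the example.

```python
def decayed_usage(samples, half_life_days):
    """Sum usage samples, halving each sample's weight per half-life of age.

    samples: iterable of (age_days, usage) pairs (hypothetical data layout).
    half_life_days: assumed half-life; not an LCRM-documented value.
    """
    return sum(usage * 0.5 ** (age / half_life_days) for age, usage in samples)


# Three equal usage samples: today, one half-life ago, two half-lives ago.
samples = [(0, 100.0), (7, 100.0), (14, 100.0)]
total = decayed_usage(samples, half_life_days=7)
print(total)  # 100 + 50 + 25 = 175.0
```

Under this kind of weighting, recent usage dominates a user's or bank's accumulated total, while older usage fades gradually instead of being dropped at a fixed cutoff.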

Other Considerations:



Batch Job Limits


In General:

Pools:

How Do I Find Out What the Limits Are?

  1. The most up to date information can be found by logging into the machine you want to use and issuing the command news job.lim.[system]. For example:
    news job.lim.thunder    news job.lim.purple
    news job.lim.alc        news job.lim.um

    If you're not sure of the actual command to use, try news job.limits - it usually provides helpful hints.

  2. OCF job limits can also be found by following the "Job Limits" links on the OCF Machine News and Information (LLNL internal) web page.

  3. Use the tables below, but note that they may be less current than the previous two methods. The tables reflect job limits as of 2/08.
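The naming pattern from the first method can be sketched as a small helper that builds the news command string for a given machine. The function name is hypothetical, and lower-casing the system name is an assumption based on the examples above; the helper only builds the string and does not run the command.

```python
def job_limit_news_command(system):
    """Return the 'news job.lim.<system>' command string for a machine name."""
    name = system.strip().lower()  # assumption: item names are lowercase
    if not name:
        raise ValueError("system name must not be empty")
    return f"news job.lim.{name}"


for system in ["thunder", "purple", "alc", "um"]:
    print(job_limit_news_command(system))
```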

Batch Limits for IBM AIX Systems

System         Batch Pool  Shift       Max Time  Max Nodes  Max Jobs
PURPLE (SCF)   pbatch      All shifts  24 hr     700        5
               viz         All shifts  n/a       n/a        n/a
               pdebug      Not currently configured
TEMPEST (SCF)  4-way       All shifts  12 hr     7          4
               16-way      All shifts  12 hr     3          4
UM (OCF)       pbatch      All shifts  24 hr     32         5
               pdebug      All shifts  2 hr      2          2
UP (OCF)       pbatch      All shifts  12 hr     32         4
               pdebug      All shifts  2 hr      2          n/a
UV (OCF)       pbatch      All shifts  24 hr     32         5
               pdebug      All shifts  2 hr      2          2

Batch Limits for Linux Systems

System         Batch Pool  Shift       Max Time  Max Nodes                    Max Jobs
ALC (OCF)      pbatch      Week        8 hr      n/a                          n/a
                           Weekend     24 hr     n/a                          n/a
               pdebug      Weekday     30 min    8                            n/a
                           Off hours   2 hr      8                            n/a
ATLAS (OCF)    pbatch      Week        16 hr     1072                         n/a
                           Weekend     24 hr     1072                         n/a
               pdebug      Weekday     30 min    16                           n/a
                           Off hours   2 hr      32                           n/a
BGL (SCF)      pbatch      All shifts  24 hr     multiple of 512 up to 65536  n/a
LILAC (SCF)    pbatch      Week        12 hr     256                          11
                           Weekend     24 hr     256                          11
               pdebug      All shifts  30 min    n/a                          n/a
MINOS (SCF)    pbatch      All shifts  12 hr     128                          3
               pdebug      All shifts  1 hr      16                           n/a
RHEA (SCF)     pbatch      All shifts  12 hr     128                          3
               pdebug      All shifts  1 hr      16                           n/a
THUNDER (OCF)  pbatch      Week        12 hr     493                          n/a
                           Weekend     24 hr     493                          n/a
               pdebug      Weekday     30 min    16                           n/a
                           Off hours   2 hr      16                           n/a
ZEUS (OCF)     pbatch      Week        12 hr     64                           100
                           Weekend     24 hr     64                           100
               pdebug      All shifts  30 min    12                           n/a

Batch Limits for Serial/Single-node Linux Systems

                                                                      Max Jobs per Node
System       Nodes           Interactive/Batch  Max Memory  Max Time  User  Total
ACE (SCF)    ace1-ace8       Interactive        400 MB      30 min    n/a   n/a
             ace9-ace152     Interactive        400 MB      30 min    2     2
                             Batch              4 GB        200 hr
             ace153-ace160   Interactive        400 MB      30 min    2     2
                             Batch              4 GB        72 hr
             ace161-ace176   ICF Use Only
HOPI (SCF)   hopi1-hopi4     Interactive        n/a         30 min    n/a   n/a
             hopi5-hopi8     Interactive        n/a         30 min    8     8
                             Batch              n/a         72 hr
             hopi9-hopi80    Interactive        n/a         30 min    8     8
                             Batch              n/a         200 hr
QUEEN (SCF)  queen1-queen4   Interactive        400 MB      30 min    n/a   n/a
             queen5-queen63  Interactive        400 MB      30 min    2     2
                             Batch              4 GB        200 hr
YANA (OCF)   yana1-yana4     Interactive        n/a         30 min    n/a   n/a
             yana5-yana8     Interactive        n/a         30 min    1     8
                             Batch              16 GB       12 hr
             yana9-yana24    Interactive        n/a         30 min    1     8
                             Batch              16 GB       50 hr
             yana25-yana59   Interactive        n/a         30 min    1     8
                             Batch              3 GB        200 hr
             yana60-yana79   Interactive        n/a         30 min    1     8
                             Batch              32 GB       200 hr

Where a node range has both Interactive and Batch rows, the Max Jobs per Node values are listed once for the whole range.
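
To show how limits like those tabulated above might be checked before submitting a job, here is a minimal sketch that encodes four "All shifts" rows (UM and MINOS, values as of 2/08) and reports which limits a request would exceed. The dictionary layout and function name are hypothetical and not part of LCRM, and the values may be out of date; treat the news items described earlier as the authority.

```python
# Assumed snapshot of a few "All shifts" rows from the 2/08 tables.
LIMITS = {
    ("um", "pbatch"): {"max_hours": 24, "max_nodes": 32},
    ("um", "pdebug"): {"max_hours": 2, "max_nodes": 2},
    ("minos", "pbatch"): {"max_hours": 12, "max_nodes": 128},
    ("minos", "pdebug"): {"max_hours": 1, "max_nodes": 16},
}


def limit_violations(system, pool, hours, nodes):
    """Return a list of human-readable limit violations (empty if none)."""
    limits = LIMITS[(system.lower(), pool)]
    problems = []
    if hours > limits["max_hours"]:
        problems.append(f"time {hours} hr exceeds {limits['max_hours']} hr")
    if nodes > limits["max_nodes"]:
        problems.append(f"{nodes} nodes exceeds {limits['max_nodes']} nodes")
    return problems


print(limit_violations("UM", "pbatch", hours=12, nodes=16))    # no violations
print(limit_violations("minos", "pdebug", hours=2, nodes=8))   # time limit exceeded
```

Shift-dependent limits (for example, Week versus Weekend on ALC) would need an extra key in the lookup; they are left out here to keep the sketch short.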


Building a Job Control Script

LCRM Job Control Options:

-tM versus -tW ?

Other Notes:



Optimizing CPU Usage


ASC IBMs:

Note that for threaded processes, having "unused" CPUs is actually the right thing to do, since the threads will need to execute on them.

Linux clusters with a switch:

Linux clusters without a switch:



Batch Utilities and Commands


LCRM provides the following utilities/commands for managing your batch jobs. A brief description of each is provided. Additional detailed information can be reviewed in each command's man page by clicking on the hyperlinked command name.

psub


pstat


spjstat & spj


ju


prm


phold & prel


palter


pexp


phist


phstat


plim


lrmmgr



Batch Debugging, I/O and Miscellaneous Considerations


Batch Debugging

I/O Issues

Miscellaneous




This completes the tutorial.




References and More Information