Slurm and Moab Exercise

Exercise 1

Preparation:
  1. Login to the workshop machine

    Workshops differ in how this is done. The instructor will go over this beforehand.

  2. Copy the exercise files into your home directory, then cd into the new slurmmoab subdirectory:

    cp  -R /usr/global/docs/training/blaise/slurmmoab   ~  
    cd  slurmmoab
    

  3. List the contents of your subdirectory. You should notice the following files:

    Description                                                      Files
    Simple shell script used for first exercise                      exercise1
    Parallel program source code - C version                         mpi_array.c
    Parallel program source code - Fortran version                   mpi_array.f
    Hybrid parallel (MPI + threads) program source code - C version  mpithreads.c
    Multiple jobs from single batch script example - C version       multijob.c
    Solutions to exercises (Slurm)                                   slurm1, slurm1.out, slurm2, slurm3, mpiarray.out, slurm4, mpithreads.out, slurm5
    Solutions to exercises (Moab)                                    moab1, moab1.out, moab2, moab3, mpiarray.out, moab4, mpithreads.out, moab5

Review your cluster's batch configuration

  1. Try the commands below:
    news job.lim.machine - where machine is the name of the cluster
    sinfo -s
    mjstat | head
    
  2. Questions:
    • Which queues are configured?
    • How many nodes are there in each queue?
    • What are the batch queue node and time limits?
    • What states are the nodes in (alloc, idle, etc.)?

Find out which banks are available

  1. To see which banks (accounts) are available to you on this cluster, simply issue the mshare command. Note that it displays your bank allocation and usage information also.

  2. The mdiag -u classXX command can be used to list your banks (accounts) and also show your valid QOS options.

  3. To view the entire bank hierarchy, use the mshare -t root command.

Create and run a job script

  1. Using your favorite text editor (vi/vim, emacs, nedit, gedit, nano...), create a job script that does the following:
    • Runs under the classXX login shell (/bin/tcsh)
    • Sets a time limit of 5 minutes
    • Requests 1 node
    • Runs in the pReserved queue
    • Writes the batch output to a file name of your choosing
    • Gives your job a unique name
    • Changes to your slurmmoab subdirectory
    • Prints the name of the host it is running on
    • Prints the jobid for this job
    • Shows your path
    • Runs the exercise1 executable provided to you
    • Sleeps for a few minutes (so you can have time to check on it)

    For reference, you can review the appropriate solution file: slurm1 or moab1
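    The requirements listed above could be met by a Slurm script along the following lines. This is only a sketch, not the provided solution file: the file name ex1.job, the output file ex1.out, the job name ex1_classXX, and the 180-second sleep are placeholders chosen for illustration. The snippet writes the script to disk so that you can inspect it before submitting it.

```shell
# Write a Slurm version of the job script to ex1.job (hypothetical file name).
# The quoted 'EOF' keeps $SLURM_JOB_ID and $PATH unexpanded until the job runs.
cat > ex1.job <<'EOF'
#!/bin/tcsh
# Time limit of 5 minutes, one node, pReserved queue
#SBATCH -t 5
#SBATCH -N 1
#SBATCH -p pReserved
# Batch output file and job name (both are placeholders)
#SBATCH -o ex1.out
#SBATCH -J ex1_classXX

# Change to the exercise directory, then report host, jobid, and path
cd ~/slurmmoab
hostname
echo "jobid: $SLURM_JOB_ID"
echo "path: $PATH"

# Run the provided executable, then sleep so the job can be monitored
./exercise1
sleep 180
EOF
```

    The script would then be submitted with: sbatch ex1.job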

  2. Submit your job using either the sbatch (Slurm) command or the msub (Moab) command.
    • Was your job script accepted?
    • What was its jobid number?
    • Problems? Check your script against the slurm1 or moab1 solution file for errors.

Monitor your job

  1. The tutorial described several ways to monitor your job, including:
    • squeue
    • mjstat
    • showq
    • checkjob
    • mdiag -j

  2. Try any/all of these commands, noting their similarities and differences.
    Hint: you may want to pipe the output of the more verbose commands into grep with the jobid or your workshop username. For example:
    showq | grep class04
    If you run out of time, you can submit another job with a longer "sleep".

  3. If you have questions about the output of these commands, check the tutorial and/or man pages.

  4. After your job completes, examine its status using the checkjob and showq -c | grep jobid commands.

Check your job's output

  1. Review the output file from your job.
    • Where did you find it?
    • Is it named what you specified?
    • Is the output what you expected? Compare your output file to the slurm1.out or moab1.out output file.
      You may also want to look at the exercise1 executable.

This completes Exercise 1




Exercise 2

  1. Still logged into the workshop cluster?

    If so, then continue to the next step. If not, then login as you did previously for Exercise 1.

Holding and releasing a job

  1. Using your same job script, submit the job so that it is held. This can be done on the msub or sbatch command line, or from within the script itself. Try both ways. If you have any questions see the tutorial.

  2. Verify that your job is actually in a held state.

  3. Release your job(s) so that they run to completion.

  4. Verify that the job release actually took effect.
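    A minimal Slurm-side sketch of the hold and release cycle, assuming sbatch prints its usual "Submitted batch job <id>" message. The helper names jobid_from_sbatch, submit_held, and job_state are made up for illustration; the Moab equivalents are described in the tutorial.

```shell
# Extract the numeric jobid from sbatch's "Submitted batch job <id>" message.
jobid_from_sbatch() {
  awk '/^Submitted batch job/ { print $4 }'
}

# Submit a script in a held state (-H) and print its jobid.
submit_held() {
  sbatch -H "$1" | jobid_from_sbatch
}

# Print the state and the pending reason for one job.
job_state() {
  squeue -h -j "$1" -o '%T %r'
}

# Example session (not executed here; myjob is a placeholder script name):
#   jobid=$(submit_held myjob)
#   job_state "$jobid"            # a held job should remain PENDING
#   scontrol release "$jobid"
#   job_state "$jobid"            # confirm that the release took effect
```

    To hold the job from within the script instead, a line reading #SBATCH -H can be added to the script's directives.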

Canceling a job

  1. Once again, submit your job script.

  2. Try to cancel it before it completes. You can do this when it's queued or when it's running. If you have any questions, see the tutorial.

  3. Confirm that the job is actually cancelled. Also, check its post-execution status with the checkjob command.

Running in standby mode

  1. Modify your job script so that it will run in standby mode. The slurm2 and moab2 files are provided for reference.

  2. Submit your job script.

  3. When your job starts to run, verify that it is running in standby mode. One way to do this is to use the checkjob command and look for qos:standby near the top of the output.

  4. Submit your job script again but be sure to have the job HELD.

  5. Confirm that the job is held.

  6. Now change the qos from standby to normal for this job. If you have any questions see the tutorial.

  7. Confirm that the qos was changed.

  8. Cancel the job (or release it and let it run) when you're sure that it was changed.
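    On the Slurm side, one possible way to work through these steps is to request the qos with a #SBATCH --qos=standby line in the script, then inspect and change it with scontrol. The sketch below assumes that scontrol show job prints a QOS=<name> field; the helper name job_qos is hypothetical.

```shell
# Print the QOS value found in `scontrol show job` output read from stdin.
# The output is split into one KEY=VALUE token per line before matching.
job_qos() {
  tr -s ' ' '\n' | awk -F= '$1 == "QOS" { print $2 }'
}

# Example session (not executed here; $jobid holds the id of a held job):
#   scontrol show job "$jobid" | job_qos          # expect: standby
#   scontrol update JobId="$jobid" QOS=normal
#   scontrol show job "$jobid" | job_qos          # expect: normal
```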

Run a parallel job

  1. Copy the slurm1 or moab1 example file to a new file - call it whatever you'd like.

  2. Modify your new file so that:
    • Four nodes are requested
    • A new output file name is used
    • It compiles either mpi_array.c (use "mpicc") or mpi_array.f (use "mpif77")
    • It lists the names of the nodes used to run the job
    • It runs a 48-task MPI job using the mpi_array executable created by the compile step

    The slurm3 and moab3 example files are provided for reference.
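    As a rough illustration of the modifications listed above, a Slurm version might look like the sketch below. It is not the provided solution: the file names ex2_mpi.job and ex2_mpi.out are placeholders, and it assumes each node has at least 12 cores, since 48 tasks spread over 4 nodes means 12 tasks per node.

```shell
# Write a 4-node, 48-task MPI job script to ex2_mpi.job (hypothetical name).
cat > ex2_mpi.job <<'EOF'
#!/bin/tcsh
#SBATCH -N 4
#SBATCH -t 5
#SBATCH -p pReserved
#SBATCH -o ex2_mpi.out

cd ~/slurmmoab

# Compile the C version of the example program
mpicc -o mpi_array mpi_array.c

# List the nodes allocated to this job
echo "nodes: $SLURM_JOB_NODELIST"

# Launch 48 MPI tasks across the 4 nodes
srun -N 4 -n 48 ./mpi_array
EOF
```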

  3. Submit your job and monitor it, making sure it is using the number of nodes/tasks specified.

  4. Check your output file to verify that things worked. See mpiarray.out as a comparison.

Run a hybrid (MPI + threads) parallel job

  1. The example file mpithreads.c combines MPI with pthreads. The basic idea is to run one MPI task per node, and then spawn one thread for each core on that node. The threads do the actual work and MPI is used to collect the results across all nodes. Feel free to examine the source code if you'd like.

  2. Copy the slurm3 or moab3 example file to a new file - call it whatever you'd like.

  3. Modify your new file so that:
    • A new output file name is used
    • It compiles the mpithreads.c file (use "mpicc -pthread")
    • It runs a 4-task MPI job using the executable created by the compile step. However, this time run with only one task per node. This will permit the threads spawned by each MPI task to use the available cores on a node without competition from the threads of other MPI tasks.

    The slurm4 and moab4 example files are provided for reference.
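    For the hybrid case, the part that changes relative to a plain MPI job is the compile line and the task layout. The sketch below shows one way this might look under Slurm, using --ntasks-per-node=1 to place a single MPI task on each of the 4 nodes. The file names ex2_hybrid.job and ex2_hybrid.out are placeholders, not the names used in the provided solution.

```shell
# Write a hybrid MPI + threads job script to ex2_hybrid.job (hypothetical name).
cat > ex2_hybrid.job <<'EOF'
#!/bin/tcsh
#SBATCH -N 4
#SBATCH -t 5
#SBATCH -p pReserved
#SBATCH -o ex2_hybrid.out

cd ~/slurmmoab

# Compile with pthread support
mpicc -pthread -o mpithreads mpithreads.c

# 4 MPI tasks, one per node, leaving each node's cores to that task's threads
srun -N 4 -n 4 --ntasks-per-node=1 ./mpithreads
EOF
```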

  4. Submit your job and monitor it, making sure it is using the number of nodes/tasks specified.

  5. Check your output file to verify that things worked. See mpithreads.out as a comparison.

Run multiple jobs from a single batch script

  1. The slurm5 and moab5 example files demonstrate how to run multiple jobs from a single batch script.

  2. Review either example file and note what is being done:
    • Four nodes are requested
    • A simple executable is compiled
    • Four 1-node jobs are launched to run simultaneously

  3. Submit either example file and then review its output when it completes.
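    The general pattern behind such a script is to start each job in the background and then wait for all of them. The sketch below illustrates that pattern in plain sh rather than reproducing the provided example files: inside a Slurm allocation it launches each step with srun on one node, and anywhere else it simply runs the command directly so the pattern can be tried on a login node. The step*.out file names are placeholders, and some sites may need additional srun options for steps to share an allocation cleanly.

```shell
# Inside a Slurm allocation, launch each step with srun on one node;
# elsewhere (for example, a dry run on a login node), run the command directly.
if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
  launch="srun -N 1 -n 1"
else
  launch=""
fi

# Start four independent steps in the background, one output file each.
for i in 1 2 3 4; do
  $launch sh -c "echo step $i ran on \$(uname -n)" > "step$i.out" &
done

# Block until all four background steps have finished.
wait
```

    Without the final wait, the batch script would exit as soon as the last step was launched, and the scheduler would end the job while the steps were still running.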

When will my job start?

    Some of the questions users ask most frequently are when a queued job will start and why it has not started yet.

    There are several common answers to these questions. Assuming that there are no system problems or errors in the user's job submission script, one of the most common reasons has to do with a job's calculated priority and the scheduler's fair-share algorithms.

  1. Use one of the commands below to generate a list of eligible jobs and their priorities:
    sprio -l  |  more
    mdiag -p -v  |  more
    As you scroll through the list, note that it is sorted by jobid.

  2. To make the list more meaningful, sort it by priority (highest to lowest):
    sprio -l  |  sort -rn -k 3,3
    mdiag -p -v  |  sort -rn -k 3,3
    You can now find where any job is relative to other jobs in the queue.
    Columns 4-9 show the factors used to compute priority values.
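    Because priorities are numbers, a numeric sort key is the safer way to order them: a plain text comparison can rank 9500 above 12000. The short sketch below shows a numeric, highest-first sort on column 3. The helper name by_priority and the three sample rows are invented purely for illustration and are not real scheduler output.

```shell
# Sort rows on stdin by the numeric value in column 3, highest first.
by_priority() {
  sort -k 3,3nr
}

# Hypothetical rows in "jobid user priority" form, for illustration only.
printf '%s\n' '101 alice 9500' '102 bob 12000' '103 carol 800' | by_priority
```

    On a cluster, the same helper could be placed at the end of the pipeline, for example: sprio -l | by_priority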

  3. You can also use the checkjob jobid command to view the scheduler's current estimate of when your job will start. Look for the line that shows "StartTime:" (if it exists). For example:
    % checkjob 87889
    ...
    WallTime:  00:00:00 of 1-00:00:00
    SubmitTime: Wed Jun  7 10:40:27
      (Time Queued Total: 00:01:48   Eligible: 00:01:48)
    
    StartTime: Thu Jun  8 10:41:57
    Total Requested Tasks:  1
    Total Requested Nodes:  1
    Partition: pbatch
    Dedicated Resources Per Task: lscratchf
    Node Access: SINGLEJOB
    ...
    

  4. Sometimes the squeue --start command can be used to get an estimate for job start times. And sometimes it can't...

    Note that start times can change dynamically if new jobs with a higher priority are submitted.
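    To pull just the estimate out of the checkjob output shown in step 3, the StartTime line can be filtered with sed. This is a small convenience sketch; the helper name start_time is hypothetical, and it prints nothing when the scheduler has not produced an estimate.

```shell
# Print the value of the StartTime line from checkjob output read on stdin.
# Lines such as SubmitTime are ignored; no output means no estimate exists.
start_time() {
  sed -n 's/^[[:space:]]*StartTime:[[:space:]]*//p'
}

# Example use on a cluster (not executed here; 87889 is the jobid from step 3):
#   checkjob 87889 | start_time
#   squeue --start -j 87889
```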

Try sview

    The sview utility provides a graphical view of all user jobs running on a cluster. Give it a try if you haven't already.

Documentation - if you still have time

    The best launching spot for LC Slurm / Moab documentation is: https://hpc.llnl.gov/banks-jobs/running-jobs





This completes the exercise.

Evaluation Form: Please complete the online evaluation form for this tutorial if you have not already done so.
