Using LC's Sierra Systems

Blaise Barney, Lawrence Livermore National Laboratory LLNL-WEB-750771

Table of Contents

  1. Abstract
  2. Sierra Overview
    1. CORAL
    2. CORAL Early Access Systems
    3. Sierra Systems
  3. Hardware
    1. Sierra Systems General Configuration
    2. IBM POWER8 Architecture
    3. IBM POWER9 Architecture
    4. NVIDIA Tesla P100 (Pascal) Architecture
    5. NVIDIA Tesla V100 (Volta) Architecture
    6. NVLink
    7. Mellanox EDR InfiniBand Network
    8. NVMe PCIe SSD (Burst Buffer)
  4. Accounts, Allocations and Banks
  5. Accessing LC's Sierra Machines
  6. Software and Development Environment - Summary and links to further information about:
    • Login Nodes and Launch Nodes
    • Login Shells and Files
    • Operating System
    • Batch System
    • File Systems
    • HPSS Storage
    • Modules
    • Compilers
    • Math Libraries
    • Debuggers, Performance Analysis Tools
    • Visualization Software
  7. Compilers
    1. Wrapper Scripts
    2. Versions
    3. Selecting Your Compiler Version
    4. IBM XL Compilers
    5. Clang Compiler
    6. GNU Compilers
    7. PGI Compilers
    8. NVIDIA NVCC Compiler
  8. MPI
  9. OpenMP
  10. System Configuration and Status Information
  11. Running Jobs
    1. Overview
    2. Batch Scripts and #BSUB / bsub
    3. Interactive Jobs
    4. jsrun Command and Resource Sets
    5. Job Dependencies
  12. Monitoring Jobs
  13. Interacting With Jobs
    1. Suspending / Resuming Jobs
    2. Modifying Jobs
    3. Signaling / Killing Jobs
  14. LSF - Additional Information
    1. LSF Documentation
    2. LSF Configuration Commands
  15. Math Libraries
  16. Parallel I/O
  17. Debugging
  18. Performance Analysis Tools
  19. References & Documentation


Abstract


This tutorial is intended for users of Livermore Computing's Sierra systems. It begins by providing a brief background on CORAL, leading to the CORAL EA and Sierra systems at LLNL. The CORAL EA and Sierra hybrid hardware architectures are discussed, including details on IBM POWER8 and POWER9 nodes, NVIDIA Pascal and Volta GPUs, Mellanox network hardware, NVLink and NVMe SSD hardware.

Information about user accounts and accessing these systems follows. User environment topics common to all LC systems are reviewed. These are followed by more in-depth usage information on compilers, MPI and OpenMP. The topic of running jobs is covered in detail in several sections, including obtaining system status and configuration information, creating and submitting LSF batch scripts, interactive jobs, monitoring jobs and interacting with jobs using LSF commands.

A summary of available math libraries is presented, as is a summary on parallel I/O. The tutorial concludes with discussions on available debuggers and performance analysis tools.

Level/Prerequisites: Intended for those who are new to developing parallel programs in the Sierra environment. A basic understanding of parallel programming in C or Fortran is required. Familiarity with MPI and OpenMP is desirable. The material covered by EC3501 - Introduction to Livermore Computing Resources would also be useful.



Sierra Overview


CORAL:

CORAL Early Access (EA) Systems:
  • In preparation for delivery of the final Sierra systems, LLNL implemented three "early access" systems, one on each network:
    • ray - OCF-CZ
    • rzmanta - OCF-RZ
    • shark - SCF

  • Their primary purpose is to provide platforms where Tri-lab users can begin porting and preparing for the hardware and software delivered with the final Sierra systems.

  • They are similar to the final Sierra systems but use the previous-generation IBM POWER processors and NVIDIA GPUs.

  • IBM Power Systems S822LC Server:
    • Hybrid architecture using IBM POWER8+ processors and NVIDIA Pascal GPUs.

  • IBM POWER8+ processors:
    • 2 per node (dual-socket)
    • 10 cores/socket; 20 cores per node
    • 8 SMT threads per core; 160 SMT threads per node
    • Clock: due to adaptive power management, the clock speed varies with system load. At LC, speeds range from approximately 2 GHz to 4 GHz.

  • NVIDIA GPUs:
    • 4 NVIDIA Tesla P100 (Pascal) GPUs per compute node (not on login/service nodes)
    • 3584 CUDA cores per GPU; 14,336 per node

  • Memory:
    • 256 GB DDR4 per node
    • 16 GB HBM2 (High Bandwidth Memory 2) per GPU; 732 GB/s peak bandwidth

  • NVLINK 1.0:
    • Interconnect for GPU-GPU and CPU-GPU shared memory
    • 4 links per GPU with 160 GB/s total bandwidth

  • NVRAM:
    • 1.6 TB NVMe PCIe SSD per compute node (CZ ray system only)

  • Network:
    • Mellanox 100 Gb/s Enhanced Data Rate (EDR) InfiniBand
    • One dual-port 100 Gb/s EDR Mellanox adapter per node

  • Parallel File System: IBM Spectrum Scale (GPFS)
    • ray: 1.3 PB
    • rzmanta: 431 TB
    • shark: 431 TB

  • Batch System: IBM Spectrum LSF


CORAL EA Ray Cluster

Sierra Systems:
  • Sierra is a classified, 125 petaflop, IBM Power Systems AC922 hybrid architecture system composed of IBM POWER9 nodes with NVIDIA Volta GPUs. Sierra is a Tri-lab resource sited at Lawrence Livermore National Laboratory.

  • Unclassified Sierra systems are similar, but smaller, and include:
    • lassen - a 20 petaflop system located on LC's CZ zone.
    • rzansel - a 1.5 petaflop system located on LC's RZ zone.

  • IBM Power Systems AC922 Server:
    • Hybrid architecture using IBM POWER9 processors and NVIDIA Volta GPUs.

  • IBM POWER9 processors (compute nodes):
    • 2 per node (dual-socket)
    • 22 cores/socket; 44 cores per node
    • 4 SMT threads per core; 176 SMT threads per node
    • Clock: due to adaptive power management, the clock speed varies with system load. At LC, speeds range from approximately 2.0 to 3.1 GHz.

  • NVIDIA GPUs:
    • 4 NVIDIA Tesla V100 (Volta) GPUs per compute, login, and launch node
    • 5120 CUDA cores per GPU; 20,480 per node

  • Memory:
    • 256 GB DDR4 per compute node
    • 16 GB HBM2 (High Bandwidth Memory 2) per GPU; 900 GB/s peak bandwidth

  • NVLINK 2.0:
    • Interconnect for GPU-GPU and CPU-GPU shared memory
    • 6 links per GPU with 300 GB/s total bandwidth

  • NVRAM:
    • 1.6 TB NVMe PCIe SSD per compute node

  • Network:
    • Mellanox 100 Gb/s Enhanced Data Rate (EDR) InfiniBand
    • One dual-port 100 Gb/s EDR Mellanox adapter per node

  • Parallel File System: IBM Spectrum Scale (GPFS)

  • Batch System: IBM Spectrum LSF

  • Water (warm) cooled compute nodes


Sierra
  • System Details:

Photos:



Hardware

Sierra Systems General Configuration

System Components:

Frames / Racks:

Nodes:

Networks:

File Systems:

Archival HPSS Storage:



Hardware

IBM POWER8 Architecture

 Used by LLNL's Early Access systems only (ray, rzmanta, shark)

IBM POWER8 SL822LC Node Key Features:

POWER8 Processor Key Characteristics:

POWER8 Core Key Features:

References and More Information:



Hardware

IBM POWER9 Architecture

 Used by LLNL's Sierra systems only (sierra, lassen, rzansel)

IBM POWER9 AC922 Node Key Features:

POWER9 Processor Key Characteristics:

POWER9 Core Key Features:

References and More Information:



Hardware

NVIDIA Tesla P100 (Pascal) Architecture

 Used by LLNL's Early Access systems only (ray, rzmanta, shark)

Tesla P100 Key Features:

Pascal GP100 GPU Components:

References and More Information:



Hardware

NVIDIA Tesla V100 (Volta) Architecture

 Used by LLNL's Sierra systems only (sierra, lassen, rzansel)

Tesla V100 Key Features:

Volta GV100 GPU Components:

References and More Information:



Hardware

NVLink

Overview:

References and More Information:



Hardware

Mellanox EDR InfiniBand Network

Hardware:

Topology and LC Sierra Configuration:

References and More Information:



Hardware

NVMe PCIe SSD (Burst Buffer)

Overview:

References and More Information:



Accounts, Allocations and Banks

Accounts:

Allocations and Banks:



Accessing LC's Sierra Machines

Overview:

How To Connect:



Software and Development Environment


Similarities and Differences:

Login Nodes:

Launch Nodes:

Login Shells and Files:

Operating System:

Batch System:

File Systems:

HPSS Storage:

Modules:

Compilers:

Math Libraries:

Debuggers and Performance Analysis Tools:

Visualization Software and Compute Resources:



Compilers


Available Compilers:

Compiler Recommendations:

Wrapper Scripts:

Versions:

Selecting Your Compiler and MPI Version:
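
Compiler and MPI versions on LC systems are typically selected with modules. A minimal sketch is shown below; the module names are illustrative, so check module avail on the system you are using.

    module avail                  # list available compiler and MPI modules
    module list                   # show what is currently loaded
    module load xl                # load the default IBM XL compilers (module name illustrative)
    module load spectrum-mpi      # load IBM Spectrum MPI (module name illustrative)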

IBM XL Compilers:

IBM Clang Compiler:

GNU Compilers:

PGI Compilers:

NVIDIA NVCC Compiler:



MPI


IBM Spectrum MPI:

Versions:

MPI and Compiler Dependency:

MPI Compiler Commands:
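
As a hedged sketch, Spectrum MPI provides the usual MPI wrapper commands, which invoke whichever underlying compiler is currently loaded; the source and executable names below are illustrative.

    mpicc   -o hello_c   hello.c      # C
    mpicxx  -o hello_cxx hello.cpp    # C++
    mpifort -o hello_f   hello.f90    # Fortran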

Running MPI Jobs:

Documentation:



OpenMP


OpenMP Support:

Compiling:
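
A brief sketch of typical OpenMP compile lines; the flags shown are the commonly documented ones, so verify them against the documentation for the compiler versions installed.

    xlc_r -qsmp=omp omp_prog.c                                        # IBM XL, CPU threading
    xlc_r -qsmp=omp -qoffload omp_prog.c                              # IBM XL, OpenMP GPU offload
    clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda omp_prog.c    # Clang, OpenMP GPU offload
    gcc -fopenmp omp_prog.c                                           # GNU, CPU threading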

More Information:



System Configuration and Status Information


First Things First:


LC Homepage: hpc.llnl.gov

MyLC User Portal: mylc.llnl.gov

System Configuration Information:

System Configuration Commands:

System Status Information:

  • LC Homepage:
    • hpc.llnl.gov (User Portal toggle) - look on the main page for the System Status links.
    • The same links appear under the Hardware menu.
    • Unclassified systems only

  • MyLC Portal:
    • mylc.llnl.gov
    • Several portlets provide system status information:
      • machine status
      • login node status
      • scratch file system status
      • enclave status
    • Classified MyLC is at: https://lc.llnl.gov/lorenz/

  • Machine status email lists:
    • Provide the most timely status information for system maintenance, problems, and system changes/updates
    • ocf-status and scf-status cover all machines on the OCF / SCF
    • Additionally, each machine has its own status list - for example:
      sierra-status@llnl.gov

  • Login banner & news items - always displayed immediately after logging in
    • The login banner includes basic configuration information, announcements and news items.
    • News items (unread) appear at the bottom of the login banner. For usage, type news -h.



Running Jobs

Overview

Very Different From Other LC Systems:

Accounts and Allocations:

Queues:

Batch Jobs - General Workflow:

  1. Login to a login node.

  2. Create / prepare executables and associated files.

  3. Create an LSF job script (a minimal example appears after this list).

  4. Submit the job script to LSF with the bsub command. For example:
    bsub < myjobscript
  5. LSF will migrate the job to a launch node and acquire the requested allocation of compute nodes from the requested queue. If not specified, the default queue (usually pbatch) will be used.

  6. The jsrun command is used within the job script to launch the job on compute nodes. If jsrun is not used, then the job will run on the launch node only.

  7. Monitor and interact with the job from a login node using the relevant LSF commands.
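
A minimal example job script is shown below. The node count, wall time, bank (-G), queue and jsrun resource shape are illustrative; adjust them for your own job.

    #!/bin/bash
    #BSUB -nnodes 4                # number of compute nodes
    #BSUB -W 30                    # wall clock limit, in minutes
    #BSUB -G guests                # bank / account (illustrative)
    #BSUB -q pbatch                # queue
    #BSUB -o myjob.%J.out          # stdout/stderr file; %J expands to the job ID

    # 16 resource sets, each with 1 MPI task, 10 cores and 1 GPU (4 resource sets per node)
    jsrun -n16 -a1 -c10 -g1 ./my_app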

Interactive Jobs - General Workflow:

  1. Login to a login node.

  2. Create / prepare executables and associated files.

  3. From the login node command line, request an interactive allocation of compute nodes from LSF with the bsub command. For example:
    bsub -nnodes 16 -Ip -G guests -q pdebug /usr/bin/tcsh
    This requests 16 nodes, an interactive pseudo-terminal, the guests account, the pdebug queue, and the tcsh shell.

  4. LSF will migrate the job to a launch node and acquire the requested allocation of compute nodes from the requested queue. If not specified, the default queue (usually pbatch) will be used.

  5. When ready, an interactive terminal session will begin on the launch node.

  6. From here, shell commands, scripts or parallel jobs can be executed from the launch node:
    • Parallel jobs are launched with the jsrun command from the shell command line or from within a user script; they will execute on the allocated compute nodes (see the example jsrun commands after this list).
    • Non-jsrun jobs will run on the launch node only.

  7. LSF commands can be used to monitor and interact with the job, either from a login node or the launch node.
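
For example, from the interactive shell on the launch node (resource values and program names illustrative):

    jsrun -n8 -a1 -c1 -g1 ./my_gpu_app    # 8 resource sets, each with 1 task, 1 core, 1 GPU
    jsrun -n4 -a10 -c10 ./my_cpu_app      # CPU-only: 4 resource sets, 10 tasks and 10 cores each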



Running Jobs

Batch Scripts and #BSUB / bsub

LSF Batch Scripts:

#BSUB / bsub:

What Happens After You Submit Your Job?:

Environment Variables:



Running Jobs

Interactive Jobs



Running Jobs

jsrun Command and Resource Sets

jsrun Overview:

Resource Sets:

jsrun Options:



Running Jobs

Job Dependencies

#BSUB -w Option:
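
A minimal sketch using LSF's dependency expression syntax; the job ID and script name are illustrative.

    # In the dependent job's script: start only after job 123456 completes successfully
    #BSUB -w "done(123456)"

    # Equivalent form on the command line
    bsub -w "done(123456)" < myjobscript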

bjdepinfo Command:



Monitoring Jobs

bjobs:

lsfjobs:

bpeek:

bhist:

Job States:



Interacting With Jobs

Suspending / Resuming Jobs

bstop and bresume Commands:



Interacting With Jobs

Modifying Jobs

bmod Command:



Interacting With Jobs

Signaling / Killing Jobs

bkill Command:

LSF - Additional Information


LSF Documentation:

LSF Configuration Commands:

     bparams Command:

     bqueues Command:

     bhosts Command:

     lshosts Command:



Math Libraries


ESSL:

IBM's Mathematical Acceleration Subsystem (MASS) Libraries:

LAPACK, ScaLAPACK, BLAS, BLACS:

FFTW:

PETSc:

GSL - GNU Scientific Library:

NVIDIA CUDA Tools:



Parallel I/O

This section to be added later


Debugging

This section to be added later


Performance Analysis Tools

This section to be added later


References & Documentation

Livermore Computing General Documentation:

CORAL Early Access systems, POWER8, NVIDIA Pascal:

Sierra systems, POWER9, NVIDIA Volta:

LSF Documentation:

Compilers and MPI Documentation:







This completes the tutorial.
