Livermore Computing Resources and Environment

Author: Blaise Barney, Lawrence Livermore National Laboratory UCRL-MI-133316

Table of Contents

  1. Abstract
  2. Organization
  3. Terminology
  4. Hardware
    1. Systems Summary
    2. IBM BG/Q Systems
    3. Intel Xeon Systems
    4. AMD Opteron Systems
    5. Future Systems
    6. Typical LC Linux Cluster
    7. Infiniband Interconnect
    8. Facilities, Machine Room Tours, Photos
  5. Accounts
  6. Accessing LC Systems
    1. Passwords and Authentication
    2. Access Methods
    3. Where to Login
    4. A Few More Words About SSH
    5. Remote Access Services
    6. SecureNet
  7. File Systems
    1. Home Directories and Login Files
    2. /usr/workspace File Systems
    3. Temporary File Systems
    4. Parallel File Systems
    5. Archival HPSS Storage
    6. /usr/gapps, /usr/gdata File Systems
    7. Quotas
    8. Purge Policies
    9. Backups
    10. File Transfer and Sharing
    11. File Interchange System (FIS)
  8. System Status and Configuration Information
  9. Exercise 1
  10. Software and Development Environment Overview
    1. Development Environment Group (DEG)
    2. TOSS Operating System
    3. Software Lists
    4. Dotkit, Modules
    5. Confluence Wiki, STASH, JIRA
  11. Compilers
  12. Debuggers
  13. Performance Analysis Tools
  14. Graphics Software and Resources
  15. Other Software and Tools
  16. Running Jobs
    1. Where to Run?
    2. Batch Versus Interactive
    3. Starting Jobs - srun
    4. Interacting With Jobs
    5. Other Topics of Interest
  17. Batch Systems
  18. Miscellaneous Topics
    1. Big Data at LC
    2. Green Data Oasis
    3. Security Reminders
  19. Where to Get Information & Help
  20. Exercise 2




Abstract


This is the second tutorial in the "Livermore Computing Getting Started" workshop. It provides an overview of Livermore Computing's (LC) supercomputing resources and how to use them effectively. As such, it is intended as a "getting started" document for new users, or for those who want to know "in a nutshell" what supercomputing at LC is all about from a practical user's perspective. It also provides essential, practical information for those planning to attend the other tutorials in this workshop.

A wide variety of topics are covered in what is, hopefully, a logical progression, starting with a description of the LC organization, a summary of the available supercomputing hardware resources, how to obtain an account, and how to access LC systems. Important aspects of the user environment are then addressed, such as the user's home directory, various files and file systems, how to transfer/share files, quotas, archival storage, and getting system status/configuration information. A brief description of the software development environment (compilers, debuggers, and performance tools), a summary of video and graphics services, and the basics of how to run jobs follow. Several miscellaneous topics are discussed. Finally, this tutorial concludes with a discussion of where to obtain more information and help. Note: This tutorial only provides an overview of using LC's Moab/SLURM batch systems - these topics are covered in the EC4045 "Moab and SLURM" tutorial.

Level/Prerequisites: This tutorial is geared to new users of LC systems and might actually be considered a prerequisite for using LC systems and attending other tutorials that describe parallel programming on LC systems in more detail.



Organization


What Is Livermore Computing?



Terminology


"The acronyms can be a bit overwhelming"
- Excerpt from a workshop attendee evaluation form

A more complete list of acronyms can be found HERE. A very concise subset relevant to this tutorial appears below.

DISCLAIMER: All information presented today is subject to change! This information was current as of June 2016.


Hardware

Systems Summary

Mix of Resources:

Primary Systems:

Peak Comparisons:



Hardware

IBM Blue Gene/Q Systems

 BG/Q users should consult the "Additional Information" references below. This tutorial does not cover much of the unique BG/Q architecture and environment.
Overview:
  • Comprise LC's largest systems: Sequoia and Vulcan

  • Unique BG/Q architecture - some key features:
    • 64-bit, 16 PowerPC A2 cores @1.6 GHz per node
    • 4 hardware threads per core
    • 5-Dimensional Torus network
    • Extremely power efficient
    • Water cooling
    • Transactional "rollback" memory in hardware

  • seq:
    • Sequoia is a 20 Pflop, classified BG/Q machine with 98,304 compute nodes and 1,572,864 cores.
    • Sequoia was ranked as the world's most powerful computer from June-November, 2012. The NNSA press release is available HERE.
    • Shared between Tri-lab users
    • Accounts provided through the Capability Computing Campaign (CCC) proposal process

  • vulcan:
    • Vulcan is a 5 Pflop BG/Q system in the unclassified Collaboration Zone (CZ)
    • Identical architecture to seq - just smaller
    • Mostly LLNL ASC/M&IC/HPCIC and PSAAP university users

  • rzuseq:
    • rzuseq is a 512 node system in the unclassified Restricted Zone (RZ)
    • Not a production machine
    • Identical architecture to seq and vulcan - just smaller

Additional Information:

 Sequoia BG/Q Tutorial: computing.llnl.gov/tutorials/bgq
 Highly recommended for all BG/Q users, due to BG/Q's unique architecture and environment.




Sequoia BG/Q System



Vulcan BG/Q System

System Details:



Hardware

Intel Xeon Systems

Overview:
  • The majority of LC's systems are Intel Xeon based Linux clusters, and include the following processor architectures:
    • Intel Xeon 8-core E5-2670 (Sandy Bridge - TLCC2), with/without NVIDIA GPUs
    • Intel Xeon 12-core E5-2695 v2 (Ivy Bridge)
    • Intel Xeon 6-core X5660 (Westmere), with/without NVIDIA GPUs
    • Intel Xeon 4-core E5530 (Nehalem)

  • Mix of resources:
    • 4, 6, 8 and 12 core processors
    • OCF and SCF
    • ASC, M&IC, VIZ
    • Capacity, Grand Challenge, visualization, testbed
    • Several GPU enabled clusters

  • 64-bit architecture

  • TOSS operating system

  • Infiniband interconnect

  • Hyper-threading enabled (2 threads/core)

  • Vector/SIMD operations

  • For detailed hardware information, please see the "Additional Information" references below.

Additional Information:




Hyperion Intel Cluster



Zin Intel Cluster

System Details:



Hardware

AMD Opteron Systems

 LC's remaining AMD Opteron systems will be retired soon.

Overview:
  • Only two AMD Opteron systems remain in operation at LC:
    • 16 or 32 cores per node
    • Opteron processors 8356, 6128 or 8354 operating at 2.0-2.3 GHz
    • Infiniband interconnect

  • 64-bit architecture

  • TOSS operating system

  • For detailed hardware information, please see the "Additional Information" references below.

Additional Information:


Juno AMD Cluster


System Details:



Hardware

Future Systems

Advanced Technology Systems (ATS):

  • Supercomputers dedicated to the largest and most complex calculations critical to stockpile stewardship; "capability computing".

  • Typically include leading-edge/novel architecture components, custom engineering

  • Shared across the Tri-labs; accounts granted to projects via a formal proposal process

  • TRINITY:
    • Sited at LANL
    • Phase 1: Intel Xeon E5-2698V3 (Haswell), 9436 nodes, 32 cores/node, 128 GB/node, ~11 PF
    • Phase 2: Intel Xeon Phi (Knights Landing), over 9,500 nodes, 72 cores/node, 96 + 12 GB/node, ~31 PF
    • Cray Aries network; burst buffers

  • SIERRA:
    • Will be sited at LLNL
    • IBM Power9 with NVIDIA Volta GPUs. 120-150 PF
    • NVLINK GPU interconnect; Mellanox IB network
    • Other architecture and configuration details TBA

Commodity Technology Systems (CTS):

  • Robust, cost-effective systems to meet the day-to-day simulation workload needs of the ASC program; "work-horse, capacity computing"

  • Common Tri-Lab procurement with platforms delivered to all three labs; accounts handled independently by each lab.

  • CTS-1:
    • Intel Broadwell E5-2695 v4 processor @ 2.1 GHz
    • Dual-socket; 18 cores/socket; 36 cores/node
    • 128 GB memory/node, 100 Gb/s IB network
    • Sizes of systems will vary, depending upon the number of scalable units (SU) used to build them
    • TOSS-3 software stack will be similar to past TLCC systems



Trinity



Sierra



Hardware

Typical LC Linux Cluster

Basic Components:

Nodes:

Frames / Racks:

Scalable Unit:



Hardware

Infiniband Interconnect

Primary components:

Topology:

Performance:

Hardware

Facilities, Machine Room Tours, Photos

Facilities:
  • Most of LC's computing resources are located in the Livermore Computing Complex (LCC) building 453, and adjacent building 451. The LCC was formerly known as the Terascale Simulation Facility (TSF).

  • Map available HERE

  • LCC highlights:
    • Four-story office tower with 121,600 square feet for 285 offices, a visualization theater, a 150-seat auditorium, and several conference rooms on each floor.
    • Machine room with 48,000 square feet of unobstructed computer room floor
    • 30 megawatts machine power capacity
    • Mechanical cooling system with cooling towers boasting total capacity of 12,600 gallons per minute, a chiller plant with total capacity of 7,200 tons, and air handlers with a total capacity of 2,720,000 cubic feet per minute
    • 3,600-gallon-per-minute, closed-loop, liquid-cooling system for Sequoia that can cool up to 9.6 megawatts.

  • LC's building 654, currently under construction, will comprise 6,000 sq ft of computer floor space and be scalable up to 7.5 MW. B654 schematic drawing

  • Additional reading/viewing:

Machine Room Tours:

  • LLNL hosts can request tours of the B453 machine room for visitors and groups. Hosts are responsible for providing Administrative Escorts (AE) and ensuring AE policies/rules are followed.

  • Tour participants must be US citizens

  • For Livermore Computing Complex Building 453 tour information, please contact Lori McDowell (mcdowell6).

Machine Photos:

  • Photo collections of a number of LC systems, present and past, are available below. Note that both require authentication for viewing.


Accounts


OCF (unclassified) Computer Account Requests:
  • LLNL Employees:
    • Log on to the LC Identity Management System https://lc-idm.llnl.gov/ to "Add OCF Computing Resource Account"
    • First-time account request: you will need to "Request a Special Purpose LC Username" before adding a compute resource account.

  • Collaborators:
    • Your LC Sponsor will need to use the LC Identity Management System https://lc-idm.llnl.gov/ to "Add OCF Computing Resource Account" for you.
    • First time account request: your sponsor will need to "Request a Special Purpose LC Username" before adding a compute resource account.

  • ASC Alliances, LANL, Sandia and other DOE sites:
    • Go to the SARAPE website http://sarape.sandia.gov/ and select the appropriate login option.
    • For new accounts, select "Obtain New Cyber Access/Accounts" and complete the online form.

SCF (classified) Computer Account Requests:

  • LLNL Employees and on-site Collaborators:
    • Must have a Q clearance
    • Collaborators must be at LLNL. Offsite Collaborator accounts are not provided.
    • Log on to the LC Identity Management System https://lc-idm.llnl.gov/ to "Add SCF Computing Resource Account"
    • First-time account request: you will need to "Request a Special Purpose LC Username" before adding a compute resource account.

  • ASC Alliances, LANL, Sandia and other DOE sites:
    • Must have a Q clearance
    • Go to the SARAPE website http://sarape.sandia.gov/ and select the appropriate login option.
    • For new accounts, select "Obtain New Cyber Access/Accounts" and complete the online form.

What Happens Next?

  • You will receive email instructions from the LC Hotline and LLNL's institutional EZid Identity Management System (separate from LC)

  • Necessary authorizations are obtained

  • Foreign Nationals requests require additional processing and take longer to complete

  • OTP Tokens / CRYPTOCards:
    • For OCF accounts, you will receive, via US mail, an RSA One-time Password (OTP) token. Instructions on how to activate and use this token are included with your account notification email.
    • For OCF RZ accounts, you will also receive a CRYPTOCard.
    • For SCF accounts, you will be asked to visit the LC Hotline to obtain your OTP token and set up your PIN.

  • Required training: All account requests require completion of online training before they are activated.

  • Annual Renewal: Accounts are subject to annual revalidations and completion of online training.

  • Virtual Private Network (VPN) account: may also be required for remote access. Discussed later under Remote Access Services.

Additional Information:

  • Additional information and instructions are available on the LC Home Page at computing.llnl.gov/accounts.

  • Users can also manage their accounts through LC's Identity Management System and/or SARAPE (as relevant).

  • Questions? Contact the LC Hotline: (925) 422-4533 lc-support@llnl.gov


Accessing LC Systems

Passwords and Authentication

One-time Passwords (OTP):

OCF Collaboration Zone (CZ) or Restricted Zone (RZ)?



Accessing LC Systems

Access Methods

SSH Required:
OCF Access - Collaboration Zone (CZ):
  • Simply use SSH to the cluster login - for example:

    ssh sierra.llnl.gov

  • Authenticate with your LC username and PIN + OTP RSA token

  • Works the same from inside or outside the LLNL network

  • LANL / Sandia:
    • Begin on a LANL/Sandia iHPC login node. For example:
      ihpc-login.sandia.gov
      ihpc-gate1.lanl.gov
    • Then use the ssh -l LCusername command to login, where LCusername is your LC username. No password required. For example:

      ssh -l joeuser cab.llnl.gov



OCF Access - Restricted Zone (RZ):
  • From inside LLNL:
    • You must be inside the RZ or LLNL institutional network. Access from the CZ is not permitted.
    • SSH to the gateway machine rzgw.llnl.gov
    • Authenticate with your LC username and cryptocard PIN + password
    • Then SSH to the desired RZ cluster
    • Authenticate with your LC username and PIN + OTP RSA token

  • From outside LLNL:
    • Must have a Remote Access Service account (discussed later) already set up - usually VPN. A combined login example appears after this list.
    • First, start up and authenticate to your Remote Access Service account. If you are using LLNL's VPN, use your LLNL OUN (Official User Name) and your PIN + OTP RSA token
    • SSH to the gateway machine rzgw.llnl.gov
    • Authenticate with your LC username and cryptocard PIN + password
    • Then SSH to the cluster login as usual
    • Authenticate with your LC username and PIN + OTP RSA token

  • LANL / Sandia:
    • Begin on a LANL/Sandia iHPC login node:
      • Sandia - start from ihpc.sandia.gov
      • LANL - start from ihpc-gate1.lanl.gov
    • Then use the ssh -l LCusername command to login to the RZ gateway node, where LCusername is your LC username. For example:

      ssh -l joeuser rzgw.llnl.gov

    • Authenticate with your LC cryptocard PIN + password
    • On rzgw: kinit sandia-username@dce.sandia.gov or kinit lanl-username@lanl.gov
    • Enter Sandia/LANL kerberos password
    • Then ssh to desired RZ machine. No password required.
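
  A combined example of the "from outside LLNL" sequence (a sketch only - "joeuser" and the cluster name "rzcluster" are hypothetical placeholders; substitute your own LC username and an actual RZ host):

      # 1. Start and authenticate to your Remote Access Service (e.g., LLNL VPN)
      # 2. SSH to the RZ gateway; authenticate with your LC username and CRYPTOCard PIN + password
      ssh -l joeuser rzgw.llnl.gov
      # 3. From rzgw, SSH to the desired RZ cluster; authenticate with your PIN + OTP RSA token
      ssh rzcluster.llnl.gov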




SCF Access:
  • Within the SCF at LLNL:
    • Simply ssh to the cluster login and authenticate with your PIN + OTP RSA token

  • LANL / Sandia:
    • Authenticate on a designated LANL/Sandia machine locally using the kinit -f command.
      Be sure to specify the -f option for a forwardable credential.
    • For LANL only: connect to the LANL gateway machine: ssh red-wtrw
    • Then use the ssh -l LCusername command to login, where LCusername is your LC username. No password required. For example:

      ssh -l joeuser seq.llnl.gov

  • From other classified DOE sites over SecureNet:
    • Use SSH with your PIN + OTP RSA token
    • RSA/DSA key authentication is disabled


SSH Examples:

CZ and RZ Access Methods:

Web Page Access:



Accessing LC Systems

Where to Login

Login Nodes

Cluster Login

Logging Into Compute Nodes:



Accessing LC Systems

Remote Access Services

Services Available:

For Help, Software Downloads and More Information:



Accessing LC Systems

A Few More Words About SSH

OpenSSH:

RSA/DSA Authentication (SSH Keys):

SSH Timeouts:

SSH and X11:

  • If you are logged into an LC cluster from your desktop, and are running applications that generate graphical displays, you will need to have X11 set up on your desktop.

  • Linux: automatic - nothing special needs to be done in most cases

  • Macs: you'll need X server software installed. XQuartz is commonly used (http://www.xquartz.org/).

  • Windows: you'll need X server software installed. LLNL provides X-Win32, which can be downloaded/installed from your desktop's LANDesk Management software. Xming is a popular, free X server available for non-LLNL systems.

  • Helpful Hints:

    • X-Win32 setup instructions for LLNL: https://computing.llnl.gov/?set=access&page=xwin32_ssh_setup

    • It's usually not necessary to define your DISPLAY variable in an SSH session between LC hosts. It should be picked up automatically.

    • Make sure your X server is set up to allow tunneling/forwarding of X11 connections BEFORE you connect to the LC host.

    • Often, you need to supply the -X or -Y flag to your ssh command to enable X11 forwarding.

    • May also try setting the two parameters below in your .ssh/config file:

      ForwardX11=yes
      ForwardX11Trusted=yes

    • Use the verbose option to troubleshoot problems:

      ssh -v [other options] [host]
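
    • A quick end-to-end test (a sketch only - "joeuser" is a hypothetical username; cab is one of the LC clusters mentioned in this tutorial; xclock is a standard X11 client):

      ssh -Y joeuser@cab.llnl.gov
      xclock &     # a small clock window should appear on your local display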



Need SSH?

More Information:



Accessing LC Systems

SecureNet





File Systems

Home Directories and Login Files

Home Directories:

LC's Login Files:

Master Dot Files:

Architecture Specific Dot Files:

Operating System Specific Dot Files:

A Few Hints:

Need a New Copy?



File Systems

/usr/workspace File Systems



File Systems

Temporary File Systems

Useful Commands:



File Systems

Parallel File Systems

Overview:

Linux Parallel File Systems - Lustre:

LC Parallel File Systems Summary:



File Systems

Archival HPSS Storage

Access Methods and Usage:

Additional Information:



File Systems

/usr/gapps, /usr/gdata File Systems

Overview:



File Systems

Quotas

Home Directories:

Exceeding quota:
  • A warning appears in login messages if usage exceeds 90% of quota
  • Heed quota warnings - risk of data loss if quota is exceeded!
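
  A quick way to check current usage against your limits (a minimal sketch; assumes the standard Linux quota command is available on LC login nodes):

    quota -v     # report usage and limits on quota-enforced file systems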

Other File Systems:



File Systems

Purge Policies

Temporary files - don't forget:


File Systems

Backups

Online .snapshot directories

Livermore Computing System Backups

Archival HPSS Storage



File Systems

File Transfer and Sharing

File Transfer Tools:

  • There are a number of ways to transfer files - depending upon what you want to do.

  • hopper - A powerful, interactive, cross-platform tool that allows users to transfer and manipulate files and directories by means of a graphical user interface. Users can connect to and manage resources using most of the major file transfer protocols, including FTP, SFTP, SSH, NFT, and HTAR. See the hopper web pages (computing.llnl.gov/resources/hopper), the hopper man page, or use the hopper -readme command for more information.

  • ftp - Is available for file transfer between LC machines. The ftp client at LC is an optimized parallel ftp implementation. It can be used to transfer files with machines outside LLNL if the command originates from an LLNL machine and the foreign host will permit it. FTP to LC machines from outside LLNL is not permitted unless the user is connected via an appropriate Remote Access service such as OTS or VPN. Documentation is available via the ftp man page or the FTP Usage Guide (computing.llnl.gov/LCdocs/ftp)

  • scp - (secure copy) is available on all LC machines. Example:

    scp thisfile user@host2:thatfile

  • sftp - Performs ftp-like operations over encrypted ssh.

  • MyLC - Livermore Computing's user portal provides a mechanism for transferring files to/from your desktop machine and your home directory on an LC machine. See the "utilities" tab. Available at mylc.llnl.gov

  • nft - (Network File Transfer) is LC's utility for persistent file transfer with job tracking. This is a command line utility that assumes transfers with storage and has a specific syntax. Documentation is available via its man page or the NFT Reference Manual (computing.llnl.gov/LCdocs/nft).

  • htar - Is highly optimized for creating archive files directly in HPSS, without the intermediate step of first creating the archive file on local disk and then copying it to HPSS via some other process such as ftp. The program uses multiple threads and a sophisticated buffering scheme to package member files into in-memory buffers, while making use of the high-speed network striping capabilities of HPSS. Syntax resembles that of the UNIX tar command. Documentation is available via its man page or the HTAR Reference Manual (computing.llnl.gov/LCdocs/htar). Basic examples appear after this list.

  • hsi - Hierarchical Storage Interface. HSI is a utility that communicates with HPSS via a user-friendly interface that makes it easy to transfer files and manipulate files and directories using familiar UNIX-style commands. HSI supports recursion for most commands as well as CSH-style support for wildcard patterns and interactive command line and history mechanisms. Documentation is available via its man page or the HSI website (http://www.mgleicher.us/).

  • Tri-lab high bandwidth file transfers over SecureNet:
    • All three Labs support wrapper scripts for enhanced data transfer between sites - classified side only.
    • Three different protocols can be used: hsi, htar and pftp.
    • Transfers can be from host to storage or host to host
    • Commands have self-explanatory names - see the list below and the accompanying Tri-lab SCF File Transfers image.

       2smss connects to SNL's HPSS
       2lynx connects to SNL's global file systems
       2calynx connects to SNL-CA calynx-s
       2lanl connects to LANL's HPSS
       2fta connects to LANL's rftas
       2llnl connects to LLNL's HPSS
       2slic connects to LLNL's cslic
      Note: SNL CA users store at SNL NM
      File Transfer Hosts: SNL = Lynx; LLNL = cslic; LANL = rfta
      

    • At LLNL, these scripts are located in /usr/local/bin
    • For additional information please see https://aces.sandia.gov/hpss_info (requires Sandia authentication)
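
  Basic htar and hsi usage (a minimal sketch - file and directory names are placeholders; see the man pages and reference manuals above for the full syntax):

    # Bundle a local directory directly into an archive in HPSS storage
    htar -cvf project_results.tar ./results

    # Later, extract the archive from HPSS back to local disk
    htar -xvf project_results.tar

    # Store and retrieve individual files with hsi
    hsi put bigfile.dat
    hsi get bigfile.dat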

File Sharing Rules:

  • User directories are required to be accessible to the user only. No group or world sharing is permitted without approval.
    • Applies to all home and all tmp directories
    • World (other) permissions will be removed automatically

  • Exceptions must be approved by your Associate Director

Hopper


MyLC


Tri-lab SCF File Transfers

Give and Take Utilities:

Anonymous FTP server:



File Systems

File Interchange System (FIS)

Usage:

Caveats:



System Status and Configuration Information


System Configuration Information:

 The best place to go for static machine configuration information for LC's systems is:

computing.llnl.gov ==> Computing Resources ==> Hardware

Machine Status:

File System Status:

Examples:



Exercise 1

Logging In, Basic Configuration and File Systems Information

Overview:
  • Login to an LC cluster with X11 forwarding enabled
  • Test X11
  • Identify and SSH to other login nodes
  • Familiarize yourself with the cluster's configuration
  • Try the mxterm utility to access compute nodes
  • Learn where/how to obtain hardware, OS and other configuration information for LC clusters
  • View system status information

GO TO THE EXERCISE HERE



Software and Development Environment Overview


Development Environment Group (DEG):

TOSS Operating System:

Software Lists, Documentation and Downloads:

Dotkit:

Modules:

Some LC software applications use Modules instead of, or in addition to, Dotkit. A brief example of each appears below.
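
A minimal sketch of how packages are typically listed and loaded with each tool (the package names below are placeholders - use the listing commands to see what is actually installed):

    # Dotkit
    use -l                  # list available packages
    use somepackage         # load a package into your environment

    # Modules
    module avail            # list available modules
    module load somemodule  # load a module
    module list             # show currently loaded modules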

Confluence Wiki, STASH, JIRA:



Compilers

General Information

Available Compilers and Invocation Commands:

Compiler Versions and Defaults:

Compiler Options:

Compiler Documentation and Man Pages:

Optimizations:

Floating-point Exceptions:

Precision, Performance and IEEE 754 Compliance:

Mixing C and Fortran:

Large Static Data:



Debuggers

Debuggers

 This section only touches on selected highlights. For more information users will definitely need to consult the relevant documentation mentioned below. Also, please consult the "Supported Software and Computing Tools" web page located at computing.llnl.gov/code/content/software_tools.php.

TotalView:
  • TotalView is probably the most widely used debugger for parallel programs. It can be used with C/C++ and Fortran programs and supports all common forms of parallelism, including pthreads, OpenMP, MPI, accelerators and GPUs.

  • Starting TotalView for serial codes: simply issue the command:

      totalview myprog

  • Starting TotalView for interactive parallel jobs:

    • Some special command line options are required to run a parallel job through TotalView under SLURM. You need to run srun under TotalView, and then specify the -a flag followed by 1) srun options, 2) your program, and 3) your program flags (in that order). The general syntax is:

      totalview srun -a -n #processes -p pdebug myprog [prog args]

    • To debug an already running interactive parallel job, simply issue the totalview command and then attach to the srun process that started the job.

    • Debugging batch jobs is covered in LC's TotalView tutorial and in the "Debugging in Batch" section below.
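
    • For example, to debug a 16-task MPI job in the pdebug partition (the task count, partition and program name are illustrative only):

      totalview srun -a -n 16 -p pdebug myprog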

  • Documentation:
Small TotalView screen shot
DDT:
  • DDT stands for "Distributed Debugging Tool", a product of Allinea Software Ltd.

  • DDT is a comprehensive graphical debugger designed specifically for debugging complex parallel codes. It is supported on a variety of platforms for C/C++ and Fortran. It can be used to debug multi-process MPI programs and multi-threaded programs, including OpenMP.

  • Currently, LC has a limited number of fixed and floating licenses for OCF and SCF Linux machines.

  • Usage information: see LC's DDT Quick Start information located at: https://computing.llnl.gov/?set=code&page=ddt

  • Documentation:
Small ddt screen shot

STAT - Stack Trace Analysis Tool:

Debugging in Batch: mxterm:

Other Debuggers:

A Few Additional Useful Debugging Hints:



Performance Analysis Tools


We Need a Book!

Memory Correctness Tools:

Profiling, Tracing and Performance Analysis:

Beyond LC:



Graphics Software and Resources


Graphics Software:

Consulting:

Video Production:

Visualization Machine Resources:

PowerWalls:

Contacts & More Information:



Other Software and Tools


Available Through LC/LLNL Sources:

User Supported Software:



Running Jobs

Where to Run?

  This section only provides a general overview of running jobs on LC systems. Details associated with running jobs are covered in depth in other LC tutorials at computing.llnl.gov/tutorials (Moab, MPI, BG/Q, OpenMP, Pthreads, etc.)

Determining Your Job's Requirements:

Getting Machine Configuration Information:

Job Limits:

Accounts and Banks:

Serial vs Parallel:

Dedicated Application Time (DAT) / Expedited Priority Runs:



Running Jobs

Batch Versus Interactive

Interactive Jobs (pdebug):

Batch Jobs (pbatch):

 This section only provides a quick summary of batch usage on LC's clusters. For details, see the Moab and SLURM Tutorial.
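
A minimal sketch of the two modes (the pdebug/pbatch partition names come from this tutorial; node/task counts, time limits, and the script name are placeholders - see the Moab and SLURM Tutorial for real options):

    # Interactive: launch directly into the pdebug partition from a login node
    srun -N 2 -n 32 -p pdebug ./myapp

    # Batch: submit a Moab script to the pbatch pool
    msub myjob.msub

    where myjob.msub might contain:

    #!/bin/sh
    ##### Moab directives: 2 nodes, 30 minute time limit, pbatch pool
    #MSUB -l nodes=2
    #MSUB -l walltime=00:30:00
    #MSUB -q pbatch
    srun -n 32 ./myapp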



Running Jobs

Starting Jobs - srun

The srun command:

srun options:



Running Jobs

Interacting With Jobs

 This section only provides a quick summary of commands used to interact with jobs. For additional information, see the Moab and SLURM Tutorial.

Displaying Job Information:

Holding / Releasing Jobs:

Modifying Jobs:

Terminating / Canceling Jobs:
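
A minimal sketch of SLURM commands commonly used for the tasks above (the job ID is a placeholder; LC also provides Moab commands such as showq and checkjob - see the Moab and SLURM Tutorial):

    squeue -u $USER                  # display your running and queued jobs
    scontrol hold 123456             # place a hold on a pending job
    scontrol release 123456          # release the hold
    scontrol update JobId=123456 TimeLimit=01:00:00    # modify a queued job
    scancel 123456                   # terminate / cancel a job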



Running Jobs

Other Topics of Interest

Optimizing Core Usage:

Diskless Nodes:

Process/Thread Binding to Cores:

Vectorization:

Hyper-threading:

Clusters Without an Interconnect (serial and single-node jobs):



Batch Systems


Moab and SLURM Schedulers



Miscellaneous Topics

Big Data at LC




Green Data Oasis (GDO)



Security Reminders

Just a Few Reminders... For the Full Story:

Where to Get Information & Help


LC Hotline:

LC Users Home Page: computing.llnl.gov

  • computing.llnl.gov: LC maintains extensive web documentation for all systems and also for computing in the LC environment:

  • A few highlights:
    • Important Notices and News appear on the opening page, as does access to Technical Bulletins.
    • Machine Status shows current OCF machines status with links to detailed information such as MOTD, currently running jobs, configuration, announcements, etc.
    • Getting Started - quick start for new users
    • Accounts - how to request an account; forms
    • Access Information - how to access and login to LC systems
    • Code Development - including compilers, tools, debuggers
    • Computing Resources - complete list with details for all LC systems
    • Running Jobs - a range of topics
    • Documentation - manuals for a wide range of topics
    • Training - online tutorials and workshops
    • Search - search on a keyword or phrase; different scopes
    • Site Index - covers all topics

  • Some web pages are password protected. If prompted to enter a userid/password, use your OTP login.

  • Some web pages may only be accessed from LLNL machines or by using one of the LC Remote Access Services covered previously.

Lorenz User Dashboard: mylc.llnl.gov

  • Provides a wealth of real-time information in a user-friendly dashboard

  • Simply enter "mylc" into your browser's address bar. The actual URL is: https://lc.llnl.gov/lorenz/mylc/mylc.cgi


News Items:

  • News postings on each LC System:
    • Unread news items appear with login messages
    • news -l - list all news items
    • news -a - display content of all news messages
    • news -n - lists unread messages
    • news -s - shows number of unread items
    • news item - shows specified news item
    • You can also list/read the files in /var/news on any system. This is useful when you're searching for a topic you've already read and can't remember the news item name. You can also "grep" on these files.

  • Also accessible from computing.llnl.gov and Lorenz.

Machine Email Lists:

  • Machine status email lists exist for all LC machines

  • Provide important, timely information not necessarily announced elsewhere

  • ocf-status@llnl.gov and scf-status@llnl.gov are general lists for all users

  • Plus each machine has its own list, for example: zin-status@llnl.gov.

  • The LC Hotline initially populates a list with subscribers, but you can subscribe/unsubscribe yourself anytime using the listserv.llnl.gov website.

Login Banner:

  • Login banner / MOTD may be very important!
    • News topics for LC, for the login system
    • Some configuration information
    • Useful references and contact information
    • System status information
    • Quota and password expiration warnings also appear when you login

Miscellaneous Documentation:

  • /usr/local/doc - archive of files covering a wide range of topics. Note that some files may be out of date.

  • /gadmin/docs - another archive of files covering a wide range of topics; postscript versions of LC manuals

LC User Meeting:

  • When held, it is usually scheduled for the first Tuesday of the month at 9:30 am

  • Building 132 Auditorium (or as otherwise announced)

  • Agenda and viewgraphs are on the LC Home Page (computing.llnl.gov). See "Documentation" and look for "User Meeting Viewgraphs". Note that these are LLNL internal web pages.




Exercise 2

Compiling, Running, Job and System Status Information

Overview:
  • Get information about running and queued jobs
  • Get compiler information
  • Compile and run serial programs
  • Compile and run parallel MPI and OpenMP programs, both interactively and in batch
  • Check hyper-threading
  • Get online system status information (and more)

GO TO THE EXERCISE HERE




This completes the tutorial.

      Please complete the online evaluation form.

Where would you like to go now?





Author: Blaise Barney, Livermore Computing. Always interested in comments/corrections!