Blaise Barney, Lawrence Livermore National Laboratory
Introduction to Parallel Computing (EC3500)
This is the first tutorial in the "Livermore Computing Getting Started" workshop. It is intended to provide only a very quick overview of the extensive and broad topic of Parallel Computing, as a lead-in for the tutorials that follow it. As such, it covers just the very basics of parallel computing, and is intended for someone who is just becoming acquainted with the subject and who is planning to attend one or more of the other tutorials in this workshop. It is not intended to cover Parallel Programming in depth, as this would require significantly more time. The tutorial begins with a discussion on parallel computing - what it is and how it's used, followed by a discussion on concepts and terminology associated with parallel computing. The topics of parallel memory architectures and programming models are then explored. These topics are followed by a series of practical discussions on a number of the complex issues related to designing and running parallel programs. The tutorial concludes with several examples of how to parallelize simple serial programs.
Livermore Computing Resources and Environment (EC3501)
This is the second tutorial in the "Livermore Computing Getting Started" workshop. It provides an overview of Livermore Computing's (LC) supercomputing resources and how to effectively use them. As such, it is definitely intended as a "getting started" document for new users or for those who want to know "in a nutshell" what supercomputing at LC is all about from a practical user's perspective. It is also intended to provide essential, practical information for attendees planning to attend the other tutorials in this workshop.
A wide variety of topics are covered in what is, hopefully, a logical progression, starting with a description of the LC organization, a summary of the available supercomputing hardware resources, how to obtain an account and how to access LC systems. Important aspects concerning the user environment are then addressed, such as the user's home directory, various files and file systems, how to transfer/share files, quotas, archival storage and getting system status/configuration information. A brief description of the software development environment (compilers, debuggers, and performance tools), a summary of video and graphics services, and the basics of how to run jobs follow. Several miscellaneous topics are discussed. Finally, this tutorial concludes with a discussion on where to obtain more information and help. Note: This tutorial only provides an overview of using LC's Moab/SLURM batch systems - these topics are covered in the EC4045 "Moab and SLURM" tutorial.
Level/Prerequisites: This tutorial is geared to new users of LC systems and might actually be considered a prerequisite for using LC systems and attending other tutorials that describe parallel programming on LC systems in more detail.
Linux Clusters Overview (EC3516)
This tutorial is intended to be an introduction to using LC's Linux
clusters. It begins by providing a brief historical background of Linux
clusters at LC, noting their success and adoption as a production, high
performance computing platform. The primary hardware components of an
LC Linux cluster are then presented, including the various types of
nodes, processors and switch interconnects. The detailed hardware
configuration for each of LC's production Linux clusters completes the
hardware related information.
After covering the hardware related topics, software topics are
discussed, including the LC development environment, compilers, and how
to run both batch and interactive parallel jobs. Important issues in
each of these areas are noted. Available debuggers and performance
related tools/topics are briefly discussed; however, detailed usage is
beyond the scope of this tutorial. A lab exercise using one of LC's
Linux clusters follows the presentation.
Level/Prerequisites: A basic understanding of parallel programming in C or Fortran is required. The material covered by the following tutorials would also be helpful:
EC3501: Livermore Computing Resources and Environment
EC4045: Moab and SLURM
Slurm and Moab (EC4045)
Slurm and Moab are two workload manager systems used to schedule and manage user jobs run on Livermore Computing (LC) clusters. This tutorial presents the essentials for using Slurm and Moab on LC platforms. It begins with an overview of workload managers, followed by a discussion on some basic concepts for workload managers, such as the definition of a job, queues and queue limits, banks and fair-share job scheduling. Basic workload manager functions are covered next, including how to build batch scripts, submit, monitor, change, hold/release, and cancel jobs. Dependent jobs, bank usage information, output files, determining when a job will expire, and running in standby round out the basic workload manager functions. Other topics covered include displaying configuration and accounting information, a discussion on parallel jobs and the srun command, and running on serial clusters. This tutorial includes both C and Fortran example codes and lab exercises.
Level/Prerequisites: The material covered in EC3501: Livermore Computing Resources and Environment would be helpful.
Message Passing Interface (MPI) (EC3505)
The Message Passing Interface Standard (MPI) is a message passing library
standard based on the consensus of the MPI Forum, which has over 40
participating organizations, including vendors, researchers, software library
developers, and users. The goal of the Message Passing Interface is to
establish a portable, efficient, and flexible standard for message passing
that will be widely used for writing message passing programs. As such, MPI
is the first standardized, vendor independent, message passing library. The
advantages of developing message passing software using MPI closely match the
design goals of portability, efficiency, and flexibility.
This tutorial will provide a means for those interested in exploring these
advantages to become familiar with MPI and also to learn the basics of
developing MPI programs. The primary topics that are presented focus on those
which are the most useful for beginning MPI programmers. The tutorial begins
with an introduction, background, and basic information for getting started
with MPI. This is followed by a detailed look at the MPI routines that are
most useful for new MPI programmers, including MPI Environment Management,
Point to Point Communications, and Collective Communications routines.
Numerous examples in both C and Fortran are provided, as well as a lab
exercise.
Level/Prerequisites: This tutorial is ideal for those who are new to parallel programming with MPI. A basic understanding of parallel programming in C or Fortran is required. For those who are unfamiliar with Parallel Programming in general, the material covered in EC3500: Introduction to Parallel Computing would be helpful.
POSIX Threads Programming (EC3506)
In shared memory multiprocessor architectures, such as SMPs, threads can be
used to implement parallelism. Historically, hardware vendors have
implemented their own proprietary versions of threads, making portability a
concern for software developers. For UNIX systems, a standardized C
language threads programming interface has been specified by the IEEE
POSIX 1003.1c standard. Implementations that adhere to this standard are
referred to as POSIX threads, or Pthreads.
The tutorial begins with an introduction to concepts, motivations, and design
considerations for using Pthreads. Each of the three major classes of
routines in the Pthreads API is then covered: Thread Management, Mutex
Variables, and Condition Variables. Example codes are used throughout to
demonstrate how to use most of the Pthreads routines needed by a new Pthreads
programmer. The tutorial concludes with a discussion and examples of how to
develop hybrid MPI/Pthreads programs in an IBM SMP environment. A lab
exercise, with numerous example codes (C Language) is also included.
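As a rough illustration of how the three classes of routines named above fit together, the following is a minimal sketch and is not taken from the tutorial itself. The names `shared_t`, `worker`, and `run_counter_demo`, and the 64-thread cap, are hypothetical choices made for this example. Worker threads increment a shared counter under a mutex, and the calling thread waits on a condition variable until the counter reaches its target.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state for this sketch: a counter protected by a
   mutex, plus a condition variable signalled when the target is reached. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  reached;
    long            count;
    long            target;
    long            per_thread;
} shared_t;

/* Thread start routine: each worker adds per_thread increments. */
static void *worker(void *arg)
{
    shared_t *s = arg;
    for (long i = 0; i < s->per_thread; i++) {
        pthread_mutex_lock(&s->lock);      /* mutex: protect the counter */
        s->count++;
        if (s->count == s->target)
            pthread_cond_signal(&s->reached); /* wake the waiting thread */
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final count, or -1 on bad
   arguments or if a thread could not be created. */
long run_counter_demo(int nthreads, long per_thread)
{
    enum { MAX_THREADS = 64 };             /* arbitrary cap for the sketch */
    if (nthreads < 1 || nthreads > MAX_THREADS || per_thread < 1)
        return -1;

    pthread_t tids[MAX_THREADS];
    shared_t s;
    s.count = 0;
    s.per_thread = per_thread;
    s.target = nthreads * per_thread;
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.reached, NULL);

    /* Thread management: create the workers. */
    int started = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &s) == 0)
        started++;

    /* Condition variable: wait, with the mutex held, until the workers
       that actually started have finished counting. */
    pthread_mutex_lock(&s.lock);
    s.target = started * per_thread;       /* shrink target if create failed */
    while (s.count < s.target)
        pthread_cond_wait(&s.reached, &s.lock);
    long result = s.count;
    pthread_mutex_unlock(&s.lock);

    /* Thread management: join the workers and release resources. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&s.reached);
    pthread_mutex_destroy(&s.lock);

    return started == nthreads ? result : -1;
}
```

Under these assumptions, calling `run_counter_demo(4, 1000)` from a `main` function should return 4000. The wait sits inside a `while` loop because POSIX allows `pthread_cond_wait` to return spuriously, so the predicate has to be rechecked after each wakeup.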
Level/Prerequisites: This tutorial is ideal for those who are new to parallel programming with Pthreads. A basic understanding of parallel programming in C is required. For those who are unfamiliar with Parallel Programming in general, the material covered in EC3500: Introduction to Parallel Computing would be helpful.
OpenMP
OpenMP is an Application Program Interface (API), jointly defined by a group
of major computer hardware and software vendors. OpenMP provides a portable,
scalable model for developers of shared memory parallel applications. The API
supports C/C++ and Fortran on a wide variety of architectures.
This tutorial covers most of the major features of OpenMP, including its
various constructs and directives for specifying parallel regions, work
sharing, synchronization and data environment. Runtime library functions
and environment variables are also covered. This tutorial includes both C
and Fortran example codes and a lab exercise.
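To give a flavor of the directives described above, here is a minimal sketch, not drawn from the tutorial, of a parallel region combined with a work-sharing loop and a reduction. The function name `sum_of_squares` is a hypothetical choice for this example. It deliberately avoids the OpenMP runtime library functions so that it still compiles, and runs serially with the same result, when OpenMP support is not enabled.

```c
#include <stddef.h>

/* Sum of i*i for i in [0, n). The directive asks OpenMP to split the loop
   iterations among threads (work sharing) and to combine each thread's
   private partial sum into total at the end (reduction). */
long long sum_of_squares(long n)
{
    long long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; i++)
        total += (long long)i * i;

    return total;
}
```

With GCC, OpenMP is typically enabled with the `-fopenmp` flag, and the number of threads can be set through the `OMP_NUM_THREADS` environment variable. Without the `reduction` clause, concurrent updates to `total` would be a data race, which is why the clause appears in the sketch.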
Level/Prerequisites: This tutorial is ideal for those who are new to parallel programming with OpenMP. A basic understanding of parallel programming in C or Fortran is required. For those who are unfamiliar with Parallel Programming in general, the material covered in EC3500: Introduction to Parallel Computing would be helpful.
TotalView Debugger (EC3508)
TotalView is a sophisticated and powerful tool used for debugging and analyzing both serial and parallel programs. TotalView provides source level debugging for serial, parallel, multi-process, multi-threaded, accelerator/GPU and hybrid applications written in C/C++ and Fortran. Most HPC platforms and systems are supported. Both a graphical user interface and command line interface are provided. Advanced, dynamic memory debugging tools and the ability to perform "replay" debugging are two additional features. TotalView has been selected as the DOE ASC Program's debugger of choice for its HPC platforms.
This tutorial has three parts, each of which includes a lab exercise. Part 1 begins with an overview of TotalView and then provides detailed instructions on how to set up and use its basic functions. Part 2 continues by introducing a number of new functions and also providing a more in-depth look at some of the basic functions. Part 3 covers parallel debugging, including threads, MPI, OpenMP and hybrid programs. Part 3 concludes with a discussion on debugging in batch mode.
Level/Prerequisites: This tutorial is intended for those who are new to TotalView. A basic understanding of parallel programming in C or Fortran is required. The material covered in the following tutorials would also be beneficial for those who are unfamiliar with parallel programming in MPI, OpenMP and/or POSIX threads:
EC3506: POSIX Threads
Using the Sequoia and Vulcan BG/Q Systems
This tutorial is intended for users of Livermore Computing's Sequoia BlueGene/Q systems. It begins with a brief history leading up to the BG/Q architecture. Configuration information for LC's BG/Q systems is presented, followed by detailed information on the BG/Q hardware architecture, including the PowerPC A2 processor, quad FPU, compute, I/O, login and service nodes, midplanes, racks and the 5D Torus network. Topics relating to the software development environment are covered, followed by detailed usage information for BG/Q compilers, MPI, OpenMP and Pthreads. Math libraries, environment variables, transactional memory, speculative execution, system configuration information, and specifics on running both batch and interactive jobs are presented. The tutorial concludes with a discussion on BG/Q debugging and performance analysis tools.
Level/Prerequisites: Intended for those who are new to developing parallel programs in the IBM BG/Q environment. A basic understanding of parallel programming in C or Fortran is required. Familiarity with MPI and OpenMP is desirable. The material covered by EC3501 - Livermore Computing Resources and Environment would also be useful.