LLNL has a fully functioning GPU supercomputing environment featuring
some of the latest technology from NVIDIA. NVIDIA's Compute Unified
Device Architecture (CUDA) is a parallel computing platform and
programming model for its General Purpose Graphics Processing Units
(GPGPUs). GPGPUs are often called "GPUs" for historical reasons and
for brevity.
GPUs are Single Instruction, Multiple Thread (SIMT) devices, as
distinct from Single Instruction, Multiple Data (SIMD) devices.
GPU computing is fundamentally different from CPU computing. GPU
computing is sometimes called "stream processing" because it focuses on
performing very high numbers of floating-point computations per memory
access. Not every program's core algorithm is amenable to stream
computing, but those that are sometimes see speedups of 10x, 20x, or
even 100x compared to a modern multi-core CPU. Compared with a
well-tuned, threaded CPU code, good GPU code can still outperform the
CPU handily, though typically by no more than a factor of five.
Fortunately, optimizing for GPUs often results in a CPU performance
jump as well.
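The SIMT model described above can be sketched with a minimal CUDA C
vector-add kernel, in which each GPU thread handles one array element.
This is a hypothetical stand-alone example, not LC-specific code; it
must be built with nvcc and run on a GPU-equipped batch node:

```cuda
// Minimal SIMT illustration: one GPU thread per array element.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the tail block
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes),
          *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %g\n", h_c[0]);  // 1 + 2 = 3

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Every thread runs the same instruction stream on its own element, which
is what SIMT means in practice: the parallelism is expressed per thread
rather than per vector lane.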
Machines and Versions: See the LC software page for software versions,
as well as the OCF Computing Resources page for hardware information.
There is a GPU-enabled cluster on each of the CZ, RZ, and SCF networks:
- CZ: the surface cluster has two Tesla K40m cards per node; each K40m
  has 2880 CUDA cores and 12 GB of memory.
- RZ: the rzhasgpu cluster has two Tesla K80 cards per node; each K80
  has 2496 CUDA cores and 12 GB of memory.
- SCF: max.llnl.gov has Tesla K20Xm (Kepler) cards; each K20Xm has
  2688 CUDA cores and 6 GB of memory.
To get complete GPU specifications for a particular batch node on a
given cluster, run the following command on the node of interest.
Remember that login nodes do not have GPUs installed on them!
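The command itself does not appear in this copy of the page. As a
sketch of the kind of information it reports, the same GPU properties
can be queried programmatically through the CUDA runtime API (the
format strings here are illustrative):

```cuda
// Print basic properties of every GPU visible on the current node.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);  // reports 0 GPUs on a login node
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s, compute capability %d.%d, "
               "%lu MB global memory, %d multiprocessors\n",
               d, prop.name, prop.major, prop.minor,
               (unsigned long)(prop.totalGlobalMem >> 20),
               prop.multiProcessorCount);
    }
    return 0;
}
```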
Location: The CUDA toolkits are in /opt/cudatoolkit* on CUDA-enabled
nodes.
Settings: None required beyond loading the cudatoolkit module.
The following command will load the CUDA 5.0 Toolkit into your
environment:
module load cudatoolkit/5.0
Note for ICC users: You must use ICC version 12.1.133 with CUDA
5.5; any other version will fail with the error "unsupported ICC
configuration! Only ICC 12.1 on Linux x86_64 is supported!". In fact,
all 12.1 compilers are usable with CUDA 5.5, so a workaround is to
supply your own version of the host_config.h header. Contact Rich Cook
at email@example.com for more information.
Note that compute architectures vary between clusters, so be sure to
compile for the right architecture using the -arch flag to nvcc. For
example, rzhasgpu has compute capability 3.7, while max has compute
capability 3.5.
GPU technology is available through two different mechanisms:
- Some off-the-shelf, vendor-supplied or open source programs such as
Mathematica, Matlab, or visualization applications such as VisIt or
EnSight have been enhanced to use GPUs to perform some calculations
faster. See the documentation for your favorite software package for
details.
- You, the user, can write software targeting the GPU using
CUDA C, PyCUDA, the Thrust
libraries, or various other CUDA-based programming APIs.
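As a sketch of the second mechanism, here is a SAXPY (y = a*x + y)
written against the Thrust library that ships with the CUDA toolkit.
The file name and -arch value are illustrative; match -arch to the
cluster's compute capability as noted above (sm_37 for rzhasgpu, sm_35
for max):

```cuda
// Build on a GPU cluster, e.g.:  nvcc -arch=sm_37 -o saxpy saxpy.cu
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Functor applied elementwise on the GPU: returns a*x + y.
struct saxpy_functor {
    const float a;
    saxpy_functor(float a_) : a(a_) {}
    __host__ __device__ float operator()(const float &x, const float &y) const {
        return a * x + y;
    }
};

int main(void)
{
    const int n = 1 << 20;
    thrust::device_vector<float> x(n, 1.0f);  // data lives in GPU memory
    thrust::device_vector<float> y(n, 2.0f);

    // y <- 2*x + y, computed entirely on the device
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(),
                      saxpy_functor(2.0f));
    return 0;
}
```

Thrust hides the kernel launches and host-device memory transfers
behind STL-style containers and algorithms, which makes it a gentle
first step into CUDA programming.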
In either case, the resulting programs must be run on a node that has
GPUs available in order to take advantage of them. Login nodes on LC
clusters do not have GPUs; you must reserve batch nodes to get GPUs.
For information about reserving batch nodes, contact the LC Hotline.
Livermore Computing has some in-house expertise in CUDA programming and
GPU usage and may be able to assist you in porting your scientific
codes and/or using GPU-enabled software in the LC environment. As a
first step, please contact firstname.lastname@example.org.