Overview
LLNL has a fully functioning GPU supercomputing environment featuring some of the latest technology from NVIDIA. NVIDIA's Compute Unified Device Architecture (CUDA) is NVIDIA's parallel computing platform and programming model for its General Purpose Graphics Processing Units (GPGPUs). GPGPUs are often called "GPUs" for historical reasons and for brevity.
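GPUs are Single Instruction, Multiple Thread (SIMT) devices, as distinct from Single Instruction, Multiple Data (SIMD) architectures. In the SIMT model you write scalar code for a single thread, and the hardware runs many threads at once in groups of 32 called warps. Unlike the lanes of a SIMD vector instruction, each thread may take its own branch, although divergent branches within a warp are executed one after another.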
GPU computing is fundamentally different from CPU computing. GPU computing is sometimes called "stream processing," and it favors workloads that perform many floating-point operations for each memory access. Not every program's core algorithm is amenable to stream processing, but those that are sometimes see speedups of 10x, 20x, or even 100x over a modern multi-core CPU. Against threaded, well-tuned CPU code, good GPU code can still outperform the CPU handily, but typically by no more than a factor of five. Fortunately, optimizing code for GPUs often improves its CPU performance as well.
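One common way to quantify "many floating-point operations per memory access" is arithmetic intensity: floating-point operations divided by bytes moved to and from memory. The worked figures below are illustrative only; they assume single-precision (4-byte) values and ignore caching, and they are not measurements on LC hardware.

```latex
I = \frac{\text{floating-point operations}}{\text{bytes read} + \text{bytes written}}

% SAXPY, y_i <- a x_i + y_i for i = 1..n: 2 flops per element,
% reads x_i and y_i (8 bytes) and writes y_i (4 bytes)
I_{\mathrm{SAXPY}} = \frac{2n}{(8 + 4)\,n} = \frac{1}{6}\ \text{flop/byte}

% C = AB for n-by-n matrices: n^3 multiply-adds = 2n^3 flops,
% assuming A and B are each read once and C is written once (3n^2 values)
I_{\mathrm{GEMM}} = \frac{2n^3}{4 \cdot 3n^2} = \frac{n}{6}\ \text{flop/byte}
```

SAXPY's intensity is fixed and low, so it is limited by memory bandwidth on either kind of processor, while the intensity of matrix multiplication grows with n. Kernels of the second kind are the ones most likely to benefit from a GPU.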
Environment
Machines and Versions:
See the graphics software page for software versions and the OCF Computing Resources page for hardware information. There is a GPU-enabled cluster on each of the CZ, RZ, and SCF networks. On the CZ, the surface cluster has two Tesla K40M cards per node; each K40M card has 2880 CUDA cores and 12 GB of memory. On the RZ, the rzhasgpu cluster has two Tesla K80 cards per node; each K80 card carries two GPUs, each with 2496 CUDA cores and 12 GB of memory. On the SCF, max.llnl.gov has Tesla K20Xm Kepler cards, each with 2688 CUDA cores and 6 GB of memory.
To get complete GPU specifications for a batch node on a given cluster, run the following command on that node. Remember that login nodes do not have GPUs installed!
/usr/local/bin/deviceQuery
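Because the nvcc -arch flag has to match each card's compute capability, deviceQuery's output can also be turned into that value. This is a minimal sketch, assuming the output format of NVIDIA's deviceQuery sample, which prints a line such as "CUDA Capability Major/Minor version number:    3.5" for each device; the function name arch_from_devicequery is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read deviceQuery output on stdin and print one
# nvcc -arch value (e.g. "sm_35") per device found. Assumes deviceQuery
# prints a line like
#   CUDA Capability Major/Minor version number:    3.5
# for each device, as NVIDIA's deviceQuery sample does.
arch_from_devicequery() {
  awk -F': *' '/CUDA Capability Major\/Minor version number/ {
    split($2, v, "[.]")              # "3.5" -> v[1] = 3, v[2] = 5
    printf "sm_%d%d\n", v[1], v[2]
  }'
}
```

On a GPU batch node, for example:
/usr/local/bin/deviceQuery | arch_from_devicequery | sort -u
The sort -u step collapses the identical lines produced when a node has several GPUs of the same kind.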
Location: The CUDA toolkits are in /opt/cudatoolkit* on CUDA-enabled machines.
Settings: None required beyond loading the cudatoolkit module. The following command will load the CUDA 5.0 Toolkit into your environment.
module load cudatoolkit/5.0
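A minimal sanity check after loading the module is to confirm that nvcc is on your PATH and print its version. The function name check_nvcc is hypothetical; nvcc --version is a standard nvcc option.

```shell
#!/bin/sh
# Hypothetical sanity check: report the nvcc version if nvcc is on PATH,
# otherwise print a hint to stderr and return non-zero.
check_nvcc() {
  if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
  else
    echo "nvcc not found on PATH; is the cudatoolkit module loaded?" >&2
    return 1
  fi
}
```

For example:
module load cudatoolkit/5.0 && check_nvcc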
Note for ICC users: CUDA 5.5 accepts only ICC version 12.1.133; any other version fails with the error "unsupported ICC configuration! Only ICC 12.1 on Linux x86_64 is supported!". In practice, all ICC 12.1 compilers work with CUDA 5.5, so one workaround is to use your own copy of the host_config.h header. Contact Rich Cook at lc-graphics@llnl.gov for more information.
Usage
Note that compute architectures vary between clusters, so be sure to compile for the right one using nvcc's -arch flag. For example, rzhasgpu has compute capability 3.7 (-arch=sm_37), while max has compute capability 3.5 (-arch=sm_35).
GPU technology is available through two different mechanisms:
  1. Some off-the-shelf programs, both vendor-supplied and open source, have been enhanced to perform some calculations faster on GPUs. Examples include Mathematica, MATLAB, and visualization applications such as VisIt and EnSight. See the documentation for your favorite software package for more information.
  2. You, the user, can write software targeting the GPU using OpenCL, OpenACC, CUDA C, PyCUDA, the Thrust library, or other GPU programming APIs.
In either case, the resulting programs must be run on a node that has available GPUs to take advantage of them. Login nodes on LC clusters do not have GPUs available; you must reserve batch nodes to get GPUs. For information about reserving batch nodes, contact the LC Hotline at lc-hotline@llnl.gov, (925) 422-4531.
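Two small checks follow from the points above: whether the current node actually has GPUs, and which -arch value to pass to nvcc when compiling on a login node, where deviceQuery has no GPU to report on. This sketch assumes that the NVIDIA driver exposes GPUs as /dev/nvidia0, /dev/nvidia1, and so on (on some systems these files appear only after a CUDA program first runs, so a negative result is a hint rather than proof), and that LC node hostnames are the cluster name followed by a node number. The compute capabilities for rzhasgpu and max are the ones stated in the -arch note; 3.5 for surface is the published value for the Tesla K40. The function names has_gpu and gpu_arch_for_host are hypothetical, not LC-provided tools.

```shell
#!/bin/sh
# Hypothetical helpers; neither is an LC-provided tool.

# Succeed if NVIDIA per-GPU device files (nvidia0, nvidia1, ...) exist.
# The directory defaults to /dev; the argument exists only for testing.
has_gpu() {
  devdir=${1:-/dev}
  for f in "$devdir"/nvidia[0-9]*; do
    # With no match the pattern stays literal, so -e fails.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Print the nvcc -arch value for a short hostname such as "rzhasgpu12",
# assuming hostnames are the cluster name followed by a node number.
gpu_arch_for_host() {
  # Strip the trailing node number to get the cluster name.
  cluster=$(printf '%s' "$1" | sed 's/[0-9]*$//')
  case "$cluster" in
    rzhasgpu) echo sm_37 ;;  # Tesla K80, compute capability 3.7
    max)      echo sm_35 ;;  # Tesla K20Xm, compute capability 3.5
    surface)  echo sm_35 ;;  # Tesla K40M, compute capability 3.5
    *)        return 1 ;;    # unknown cluster: make no guess
  esac
}
```

For example, on a login node you might compile with the following (myprog.cu is a placeholder name):
arch=$(gpu_arch_for_host "$(hostname -s)") && nvcc -arch="$arch" -o myprog myprog.cu
In a batch job, has_gpu || echo "No GPUs visible on this node" can flag a job that landed on a node without GPUs. The CUDA 5.0 toolkit predates the K80 and may not accept sm_37, so a newer cudatoolkit module may be needed on rzhasgpu.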

Help
Livermore Computing has some in-house expertise in CUDA programming and GPU usage and may be able to help you port your scientific codes or use GPU-enabled software in the LC environment. As a first step, contact the LC Hotline at lc-hotline@llnl.gov, (925) 422-4531.

Download
There is a wealth of information at the NVIDIA GPU Computing Developer Home Page.

Links