Innovative HPC Architectures

To meet the ever-rising demand for compute cycles, high performance computing (HPC) centers have long counted on Moore’s Law, which roughly states that the number of transistors in an integrated circuit doubles about every two years. We have now entered a post-Moore’s-Law age, especially where compute cycles per dollar are concerned. However, “Livermore Computing (LC) continues to have high demand for compute cycles to meet the Lab’s mission-critical simulation needs,” according to LC Chief Technology Officer Bronis R. de Supinski.

With the heterogeneous central and graphics processing units (CPU–GPU) node architecture of Sierra, LC began to address this issue of how to accomplish more cost-effective science. Still, “Sierra’s architecture was mostly general purpose while artificial intelligence (AI) accelerators are specialized and therefore very efficient,” says de Supinski. “El Capitan has moved us in one direction for increased efficiency by providing equal access to the same physical memory. However, other directions are available and we are actively exploring them.”

Toward Fully Heterogeneous Computing

LC’s Advanced Technology Office, which de Supinski leads, has sited multiple AI accelerators at LLNL since 2020. The first was the Cerebras wafer-scale AI engine, originally attached to Lassen, Sierra’s smaller, unclassified companion system. Soon after, the team integrated an AI accelerator from SambaNova Systems into the Corona supercomputing cluster. Since then, LLNL has invested further by installing 3 SambaNova systems and an upgraded Cerebras system. LLNL researchers use these solutions to expand the combination of HPC and AI.

“Our strategy is demonstrating that this approach can provide more cost-efficient solutions for the workloads of the future,” says de Supinski.

AI is computationally intensive, making it well suited to fully heterogeneous systems. “We can offload certain computations to AI or other accelerators while the regular HPC computations can carry on at the same time,” adds LLNL AI expert Brian Van Essen.

Additionally, AI accelerators like Cerebras and SambaNova are more efficient and quicker at handling the types of computations necessary for LLNL’s complex scientific problems, such as mesh management and inertial confinement fusion. The integrations will accelerate LC workflows, making them faster and more accurate while improving user productivity.

Samba Nova systems in B453 — Livermore Computing is home to multiple classified, restricted zone, and unclassified SambaNova AI accelerators.

Solving the Stranded Resource Problem

On traditional supercomputers, different parts of a single run of a program—called a job—have different computational needs. This process can leave resources on the system “stranded,” or underutilized, especially during long or complex jobs.

LLNL ATO member Todd Gamblin explains, “Different LLNL workflows have different ratios of needs, and to best utilize the hardware, we need to match jobs with the nodes that best fit their workload. Accelerators like Cerebras and SambaNova are particularly good at making AI jobs fast. If we can match jobs in a workflow with the resources they’re best suited for, we can improve resource utilization.”

This work is a large step toward the end goal of production usage: On a system with an AI accelerator, a developer submits a complex HPC job to resource management software such as Flux. The job contains a set of parameters that designate a portion of the work as suitable for AI acceleration. The HPC portion of the job runs in the traditional way on the CPUs and GPUs while the AI element is sent to the accelerator for specialized treatment. Results or even improved parameters are sent from the AI accelerator back to the main job. This information aids in creating more optimal parameters for future runs, allowing each piece of the heterogeneous system to run separately, but in concert, for a more efficient job.

As de Supinski puts it, “Instead of computing every step along the way, you are jumping along the timeline. A single run might take longer, but the total cost is decreased because we get better results from the approximations that these technologies allow us to make. The total time to solution is decreased by working smarter, not harder.”

Beating Moore’s Law

Initial results on our various AI accelerators show potential wins in terms of reduced latency for inference as well as performance improvement per transistor over CPUs and GPUs alone for scientific AI applications.

El Capitan has added heterogeneity relative to Sierra. HPE’s near-node storage solution, which we call Rabbits, has been deployed throughout the system with one Rabbit for every compute chassis, was a key factor in the vendor selection. Rabbits enable more efficient defensive input–output and reduce system interference, which is especially important for the complex workflows that will continue to advance LLNL computational science.

The Cutting Edge

“The closest anyone else gets to the level of full heterogeneity we have in Livermore Computing is cloud providers,” concludes de Supinski, “but their systems do not have the low latency required for our workloads. So, we are pushing out and they are pushing in, and we’ll meet in the middle. The cutting-edge technology that LLNL and our sponsors are investing in is shaping the future. Someday, technologies such as AI accelerators and near-node per-chassis storage will be standard, but for now, making these innovations a reality is the focus of our biggest systems.”