This article is part of a series about Livermore Computing’s efforts to stand up the NNSA’s first exascale supercomputer. El Capitan will come online in 2024 with the processing power of more than 2 exaflops, or 2 quintillion (10¹⁸) calculations per second. The system will be used for predictive modeling and simulation in support of the stockpile stewardship program.
Previous: A framework for complex workflows | Next: Packaging for everyone
Referring to El Capitan as an Advanced Technology System (ATS) almost sounds like an understatement. Livermore’s ATS roster evolves as high performance computing (HPC) architectures and capabilities evolve. The current lineup includes the petascale Sierra and Lassen machines along with El Capitan, its unclassified counterparts Tuolumne and RZAdams, and the early access systems already installed at LLNL. ATS machines are designed to take advantage of the industry’s most cutting-edge components and attributes: processors, operating system, storage and memory, energy efficiency, and so on.
Perhaps one of the least publicized advancements in the exascale era is compiler technology. Compilers manage the incredibly tricky translation of human-readable source code into machine-readable code (think 1s and 0s), and they optimize that machine code so programs run more quickly. As programming languages evolve, so too must compilers.
Progress is a double-edged sword, however. Computer scientist John Gyllenhaal, who leads Livermore’s ATS compiler team, explains, “Programming languages aren’t static. New standards, such as C++20 or OpenMP 5.2, add significant new features that make languages more powerful, expressive, and maintainable. But these standards also create a huge amount of work for compiler writers and support staff. There are a lot of opportunities for things to go wrong in a compiler.” Gyllenhaal's team works closely with the vendors who develop and optimize the compilers for various ATS architectures, including El Capitan.
Most HPC vendors’ compilers are based on LLVM, an open-source compiler infrastructure written in C++ and created two decades ago at the University of Illinois. LLVM’s widespread adoption promotes sustainable, distributed development across the organizations that use it, including Livermore. This community effort frees vendors to develop proprietary optimizations on top of LLVM’s baseline code.
According to Gyllenhaal, this consolidation has tradeoffs. He states, “On all of our architectures, our historical aim was to accommodate at least two robust vendor-supported compilers, allowing code teams to switch between them if they had to wait for a bug fix on one. With most compilers now being LLVM-based, this approach has been slightly less effective. We’ve had to account for the LLVM baseline while addressing issues from vendor-specific optimizations.”
An additional complexity facing the ATS compiler team is that, as Gyllenhaal states, “Large-scale codes often break compilers.” Complex multiphysics codes, like those that will run on El Capitan in support of stockpile stewardship, rely on compilers to perform well across different HPC environments. Compiler-related bugs must be carefully managed on classified systems, especially when hardware vendors are involved in the solutions. Furthermore, ATS compilers must also work for other user groups and their codes, such as those of academic and NNSA partners.
As code teams begin to test-drive Livermore’s exascale systems, Gyllenhaal points to the coordinated compiler and runtime hardening efforts of his team, HPC vendor partners, and the Tri-Lab Center of Excellence (see the article “Collaboration is key” in this series). “The primary vendor compilers provided for El Capitan’s early access systems were new to us. Together we’ve found and fixed a number of issues,” he says. “If our codes compile on the early access systems, we expect they’ll compile on El Capitan.”
—Holly Auten & Meg Epperly