The Department of Energy (DOE) has a long history of deploying leading-edge computing capability for science and national security. Going forward, DOE’s compelling science, energy assurance, and national security needs will require a thousand-fold increase in usable computing power, delivered as quickly and energy efficiently as possible. This will force fundamental changes in all computer components. Among those with extreme-scale computing needs, the collaborative and concurrent development of hardware, software, numerical methods, algorithms, and applications is widely considered to be a necessary step for achieving a usable exascale-class system.
Co-design is about where the state of computing is going, rather than just focusing on creating one specific machine.
—Rob Neely, Computing Associate Division Leader
The organizing principle for this type of coordinated development is co-design. Co-design draws on the combined expertise of vendors, hardware architects, system software developers, domain scientists, computer scientists, and applied mathematicians working together to make informed decisions about hardware and software components. To ensure that future architectures are well-suited for key DOE applications and that DOE scientific problems can take advantage of the emerging computer architectures, major DOE centers of computational science, including LLNL, are formally engaged in the co-design process.
Since the earliest days of supercomputing, LLNL has been known for fielding first-of-a-kind machines, most of which were rated among the fastest (or often the fastest) in the world at the time. Those machines were developed through a process very similar to the co-design processes proposed for the exascale era. LLNL is actively pursuing a strategy both to leverage our co-design experience and to update it to meet the realities of today’s dynamic HPC environment, in which lead times between concept and realization are long.
The LLNL co-design strategy is strongly tied to the overall strategy of the National Nuclear Security Administration (NNSA) and DOE, and we are committed to establishing deep working relationships with the vendor community to help inform our own large application efforts and provide input into their design process. We are actively adapting our existing large applications using incremental improvements such as fine-grained threading, use of accelerators, and scaling to millions of nodes using the Message Passing Interface (MPI), with the 20-petaflop Sequoia BlueGene/Q machine providing a living laboratory for these explorations. We are also looking to the next generation of programming models, researching new algorithms, and evaluating the need to rewrite our major multiphysics applications from scratch, to address software architecture complexities and better manage ever-increasing layers of hardware complexity.
Co-design efforts at LLNL include the following:
The Advanced Simulation and Computing (ASC) program develops and maintains engineering and physics integrated codes (EPICs) in support of stockpile stewardship. To meet the key needs of the EPICs, ASC has established the National Security Applications (NSApp) Co-Design Center. NSApp will focus on these established applications as the drivers, and participate in co-design and vendor interactions largely through proxy applications.
The PathForward initiative is intended to speed up and influence the development of technologies that companies are pursuing for commercialization, to ensure that these products include the features that DOE and NNSA laboratories require for research. PathForward funds innovative new and/or accelerated R&D of technologies targeted for productization in the 5–10 year timeframe.