High performance computing (HPC) hardware is fundamentally shifting, moving away from homogeneous distributed-memory platforms toward heterogeneous graphics-processing-unit (GPU) and many-core systems. Many mission-critical applications used across the National Nuclear Security Administration (NNSA) will require substantial modification to make effective use of these new architectures. It is no longer practical for each code team to independently stay abreast of the latest developments in architecture, programming models, and kernel optimization needed to run efficiently on each new generation of hardware.
Instead, the Advanced Simulation and Computing (ASC) program at LLNL formed the Advanced Architecture and Portability Specialists (AAPS) team to help LLNL code teams identify and implement optimal porting strategies, and to provide general guidance on best practices for exposing parallelism and managing data movement across memory layers.
Led by LLNL computational scientist Erik Draeger, the AAPS team is staffed with a mix of computational and computer scientists with a proven track record of success in scaling scientific applications on new, cutting-edge hardware. Members of the team have been involved with nearly every Gordon Bell Award submission from LLNL in the past decade, having collectively worked on seven projects selected as Gordon Bell finalists since 2005, three of which were award winners. Team members also possess deep expertise in key areas such as general-purpose GPU (GPGPU) and many-core programming, heterogeneous programming models, and kernel optimization.
The team’s initial engagement strategy is to work closely with a handful of codes at a time and provide hands-on support, with engagements tailored so that the skills and experience provided by the AAPS team meet the unique needs of each application. “We are lucky to have some exceptionally talented people on the team,” said Draeger, “but the complexity of some of the Lab’s big multiphysics codes cannot be overstated. We have to rely heavily on the developers of those codes to help us understand their performance characteristics and identify optimal porting strategies. Effective collaboration is crucial.”
In addition to providing hands-on assistance to help code teams modify their applications to efficiently use new architectures, the AAPS team will work to share successful strategies for common challenges across codes. “Right now, we have a lot of very talented people working in parallel to solve some of the same problems,” said Draeger. “Because of the differences between codes, there isn’t likely to be a single one-size-fits-all solution for all of them; but the hope is that by sharing the innovations of their colleagues, we can accelerate progress for everyone.” To facilitate this, the expertise gained from interactions with LLNL code teams will be collected into a knowledge repository and made available to the broader community, thereby allowing all applications to benefit from the best practices and lessons learned with each new engagement.
One of the challenges facing many applications is performance portability, or the ability of codes to run efficiently on different architectures with minimal modification. This is especially critical for the large ASC multiphysics packages—with millions of lines of code, extensive modification to make use of hardware-specific features is simply not practical. Instead, new programming models are being explored to control how computation is carried out on a given architecture at a high level, thereby allowing codes to achieve optimized performance on different platforms without extensive modification.
One of the more promising new models is RAJA, currently being developed by LLNL computer scientists Rich Hornung and Jeff Keasler. The AAPS team is working closely with application developers to assess whether the RAJA model is a viable strategy and to identify some initial best practices for effectively using the model. By broadly disseminating these initial findings, the team will provide developers with valuable information needed to develop an optimal porting strategy.
Another challenge facing applications is the need to manage data movement in heterogeneous architectures while still maintaining performance portability. AAPS team members Holger Jones and David Poliakoff are working closely with the ALE3D team to develop a new resource manager that uses smart pointers to automatically move, track, and allocate data as needed within an application at run time, thereby making it significantly easier for application developers to write efficient, general-purpose code that will reliably run on a variety of machines.
As supercomputing hardware continues to increase in complexity, the performance gulf between modern and legacy codes will continue to grow. Collaboration and sharing of expertise are becoming increasingly important aspects of HPC code development. An agile team of application-savvy computer and computational scientists, such as the AAPS team, can have a significant impact by helping to drive interactions between code teams and the broader research community, helping to ensure mission-critical applications are able to remain at the forefront of capability-scale computation.