Centers of Excellence Meeting Brings Together Diverse Set of Stakeholders
Technical experts from across the nation met April 19–21 in Glendale, AZ, to share ideas, progress, and challenges toward the goal of performance portability across the large advanced architecture supercomputers that the Department of Energy (DOE) plans to procure. Application teams across DOE need their codes to run effectively on advanced architecture systems from multiple vendors (as well as on standard “cluster” technology), and this portability is a specified goal of DOE’s exascale plans for risk mitigation.
“The interest and attendance at this meeting highlighted the desire in the high performance computing [HPC] community to work towards common interests in the pursuit of exascale computing,” said Rob Neely, steering committee chair. “Each Center of Excellence [COE] is a lab–vendor partnership aimed at preparing important applications for deployment on the chosen platform at each of the five major DOE HPC facilities. Yet most of these application teams must run and support users on multiple systems either between the National Nuclear Security Administration [NNSA] labs or between the Leadership Computing Facilities of the Office of Science. This meeting was designed to take the excellent work being done in these COEs and bring the discussion up a level, to learn how we, as a community, can achieve the best of both worlds—performance and portability.”
Along with scientists from NNSA’s Advanced Simulation and Computing (ASC) Program (Lawrence Livermore, Los Alamos, and Sandia national laboratories), attendees included technical staff from other labs who are preparing their codes, vendors chosen to provide the next-generation platforms, and solution providers (DOE or third party) who are developing software tools to help application teams approach the challenges of performance portability.
“The introduction of GPU [graphics processing unit] accelerators into the production ASC platform environment at LLNL, starting with the delivery of Sierra in 2017, will be disruptive to our applications,” admits Neely. “So LLNL chose the GPU accelerator path only after feeling convinced that first, performance-portable solutions would be available in that timeframe, and second, that the use of GPU accelerators was also likely to be prominent in the realization of exascale systems.” Neely added that by building on what LLNL has learned from extreme scaling on Blue Gene/Q and adding the ability to effectively use GPUs, LLNL applications will be in a strong position to adapt to whatever exascale brings.
The performance portability meeting built upon past individual COE meetings and workshops by providing a forum for sharing best practices and ideas, with a focus on achieving high performance on these emerging platforms without greatly sacrificing the portability and maintainability of applications.
Recognizing the immense challenges of porting large applications to, and optimizing them for, the advanced architecture systems planned for deployment at various national labs between 2016 and 2019, the Department of Energy established a Center of Excellence (COE) at each laboratory siting one of these systems. These COEs provide direct vendor expertise to the application teams and, in turn, give the vendors deeper insight into how applications are run on those systems. Each of the five current COEs has a mission to optimize a set of applications for its specific platform. The preferred path to programming these systems is to use open standards, libraries, and software abstractions that minimize code disruption without limiting performance potential, but doing so remains a large, as-yet-unsolved challenge.