The number of processors in a typical supercomputer is growing rapidly. Application performance gains on supercomputers, now and for the foreseeable future, stem largely from code modifications that let a system complete more computational tasks in parallel, making more efficient use of its many processors.
For the Livermore computational experts working to optimize several existing scientific simulation codes, a crucial step is ensuring that the codes employ a scalable method for solving time-dependent deterministic neutron transport equations, which have a six-dimensional phase space. LLNL researchers are testing and enhancing a neutral particle transport code, and the algorithm on which it relies, to ensure that both scale successfully to petaflop/s and exaflop/s computing systems.
When solving time-dependent deterministic neutron transport equations, LLNL scientific codes currently rely on an efficient iterative method called sweeping. Because the phase space is six-dimensional, even reasonably resolved simulations require very large numbers of unknowns. One significant scaling advantage of sweep algorithms is that the number of iterations needed to solve the system is set by the physics of the problem itself (for example, by how strongly particles scatter) and thus does not increase as the number of unknowns grows.
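To make the idea of a sweep concrete, here is a minimal sketch using NumPy, assuming a one-dimensional slab, a single energy group, isotropic scattering, vacuum boundaries, and diamond differencing. None of this is taken from Ardra; the functions `sweep` and `source_iteration` and all parameter values are hypothetical. Each sweep marches cell by cell along every discrete direction, and the outer source iteration repeats sweeps until the scalar flux stops changing. Running it on a coarse and a fine mesh of the same physical slab illustrates the point above: the fine mesh has eight times as many unknowns, yet the iteration count stays close to the coarse one, because convergence here is governed mainly by the scattering ratio `c`.

```python
import numpy as np


def sweep(q, sigma_t, dx, mu, w):
    """One transport sweep in a 1D slab with vacuum boundaries (illustrative).

    For each discrete direction mu[m], march cell by cell from the inflow
    boundary, so each cell uses the outgoing flux of its upstream neighbor.
    Diamond differencing assumes psi_center = (psi_in + psi_out) / 2.
    Returns the scalar flux phi = sum_m w[m] * psi_center[m].
    """
    n = len(q)
    phi = np.zeros(n)
    for m, wm in zip(mu, w):
        a = abs(m) / dx
        psi_in = 0.0  # vacuum boundary: nothing enters the slab
        cells = range(n) if m > 0 else range(n - 1, -1, -1)
        for i in cells:
            psi_c = (q[i] + 2.0 * a * psi_in) / (sigma_t + 2.0 * a)
            phi[i] += wm * psi_c
            psi_in = 2.0 * psi_c - psi_in  # outgoing flux feeds the next cell
    return phi


def source_iteration(n_cells, width=10.0, sigma_t=1.0, c=0.9,
                     source=1.0, n_angles=8, tol=1e-8, max_iter=10_000):
    """Repeat sweeps until the scalar flux converges; return (phi, iterations)."""
    dx = width / n_cells
    mu, w = np.polynomial.legendre.leggauss(n_angles)  # weights sum to 2
    phi = np.zeros(n_cells)
    for it in range(1, max_iter + 1):
        # Isotropic angular source: (scattering + fixed source) / 2.
        q = 0.5 * (c * sigma_t * phi + source)
        phi_new = sweep(q, sigma_t, dx, mu, w)
        if np.max(np.abs(phi_new - phi)) <= tol * np.max(np.abs(phi_new)):
            return phi_new, it
        phi = phi_new
    raise RuntimeError("source iteration did not converge")


phi_coarse, it_coarse = source_iteration(40)
phi_fine, it_fine = source_iteration(320)
print(it_coarse, it_fine)  # iteration counts stay close despite 8x more cells
```

Real production codes typically accelerate this basic iteration, but the sketch keeps only the sweep and the outer loop so that the iteration-count behavior is easy to see.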
However, because sweeps contain inherently sequential steps (along each direction of particle travel, a cell cannot be updated until its upstream neighbors have been), many computer scientists doubted that they would ever scale successfully to machines with hundreds of thousands or even millions of processors.
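The sequential dependency can be illustrated with a toy sketch, assuming a structured 2D grid and a single sweep direction pointing toward +x and +y. The function name `wavefront_stages` is hypothetical and the code is not taken from Ardra; it only shows the dependency pattern. Cells on the same diagonal are independent and can be solved concurrently, but each diagonal must wait for the previous one, so the number of sequential stages grows with the grid's extent.

```python
from collections import defaultdict


def wavefront_stages(nx, ny):
    """Group the cells of an nx-by-ny grid into wavefront stages for a sweep
    whose direction points toward +x and +y.

    Cell (i, j) depends on its upstream neighbors (i-1, j) and (i, j-1),
    so every cell on the diagonal i + j == k can be solved concurrently,
    but diagonal k cannot start until diagonal k-1 is finished.
    """
    stages = defaultdict(list)
    for i in range(nx):
        for j in range(ny):
            stages[i + j].append((i, j))
    return [stages[k] for k in range(nx + ny - 1)]


stages = wavefront_stages(4, 3)  # 12 cells, but 4 + 3 - 1 = 6 sequential stages
for k, cells in enumerate(stages):
    print(k, cells)
```

Sweeps along other directions start from other corners of the grid, which gives a parallel schedule room to overlap work from several directions at once.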
As part of a Laboratory Directed Research and Development Program project led by Robert Falgout, LLNL researchers identified the key factors that control sweep algorithm efficiency and developed parallel performance models for optimized versions of the algorithms. Contrary to prevailing opinion, these models predicted that the optimized sweeps would perform well on petascale systems.
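To give a flavor of what a sweep performance model captures, here is a simplified, textbook-style pipeline model. It is not the model developed in the LDRD project; it assumes a 3D block decomposition over `px * py * pz` processors, equal time per stage, no communication cost, and a hypothetical parameter `m` for the number of independent work chunks (for example, groups of angles or energy groups) each processor pipelines through the sweep. The last processor must wait for the wavefront to cross the processor grid before doing useful work, and those idle "pipeline fill" stages are what limit efficiency.

```python
def sweep_stages(px, py, pz, m):
    """Number of pipeline stages for one sweep in a simple KBA-style model.

    px, py, pz: processors along each axis of a 3D block decomposition.
    m: number of independent work chunks each processor pipelines.
    The farthest processor waits px + py + pz - 3 stages for the
    wavefront to arrive, then processes its m chunks.
    """
    return (px - 1) + (py - 1) + (pz - 1) + m


def modeled_efficiency(px, py, pz, m):
    """Modeled efficiency: useful stages divided by total stages."""
    return m / sweep_stages(px, py, pz, m)


# Hypothetical weak-scaling study with a cubic processor grid.
for p in (1, 4, 16, 64):
    print(p ** 3, "processors:", round(modeled_efficiency(p, p, p, m=1000), 3))
```

Under a cubic decomposition the idle fill stages grow roughly as three times the cube root of the processor count, so in this model efficiency stays high as long as each processor has enough independent work to hide the fill.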
To verify Falgout’s modeling results, Peter Brown, Teresa Bailey, Adam Kunen, and Britton Chang implemented and tested the optimized sweep algorithms in Ardra, a 3D neutron and gamma particle transport code that incorporates two types of sweeps. Over the past year and a half, Ardra itself has undergone significant enhancements to improve its scalability, including a refactoring into C++, but sweep performance remains a crucial factor. Once the optimized sweep algorithms were in place, the team tested the code on all 1.5 million processors of the Sequoia supercomputer. The problem included 37.5 trillion unknowns, making it the largest discrete ordinates transport calculation to date.
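Using the rounded figures quoted above, the average load per processor works out as follows. This is an average only; how Ardra actually distributes unknowns across processors is not described here.

```latex
\frac{37.5 \times 10^{12}\ \text{unknowns}}{1.5 \times 10^{6}\ \text{processors}} = 2.5 \times 10^{7}\ \text{unknowns per processor}
```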
Ardra achieved over 71% parallel efficiency on Sequoia, demonstrating excellent scaling and validating the model predictions. Although the Ardra researchers are still considering further code and algorithm enhancements, the Sequoia results strongly indicate that sweep algorithms will scale to exascale-sized problems and systems.
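The text does not say whether the 71% figure is a weak-scaling or strong-scaling efficiency, so the sketch below shows both standard definitions. The function names are illustrative, and the timings in the usage lines are made-up placeholders, not Ardra measurements.

```python
def weak_scaling_efficiency(t_ref, t_p):
    """Work per processor held fixed: ideal runtime stays constant,
    so efficiency = T_ref / T_P."""
    return t_ref / t_p


def strong_scaling_efficiency(t_ref, p_ref, t_p, p):
    """Total problem size held fixed: ideal runtime shrinks as 1/P,
    so efficiency = (T_ref * P_ref) / (T_P * P)."""
    return (t_ref * p_ref) / (t_p * p)


# Hypothetical timings (seconds), for illustration only.
print(weak_scaling_efficiency(8.0, 10.0))
print(strong_scaling_efficiency(100.0, 1_000, 2.0, 64_000))
```

In both cases an efficiency of 1.0 means the extra processors were used perfectly; values below 1.0 measure the share of ideal speedup lost to overheads such as the sequential stages of a sweep.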