Computation is an ever-growing, ever-advancing field. Systems developed in the next decade will be as much as 1,000 times faster than the petascale systems of the past decade. System architectures are also growing more diverse and difficult to predict, so the strategies used for writing code and running simulations on these machines need to diversify as well. Leveraging increased processing power from future computers will require the flexibility to run applications on diverse hardware architectures composed of graphics processing units (GPUs), multi-core central processing units (CPUs), or a combination of both.
This is the imperative being addressed by Livermore computational researchers Cyrus Harrison and Ming Jiang and their collaborators at Lawrence Berkeley National Laboratory and the University of Texas at Austin. They have created a flexible Python- and OpenCL-based framework that runs on many-core architectures using adaptable building blocks. This flexibility supports a variety of execution strategies for a key analytical component of many simulations, helping ensure the codes can survive and thrive in the ever-evolving high performance computing (HPC) landscape.
The effort began in 2011, when the team set out to rethink the most efficient way to implement derived fields. Derived field generation, a critical aspect of many visualization and analysis systems, refers to the creation of new fields from existing fields in simulation data.
Harrison explains, “Basically, users of simulation codes only store so many things when they run the codes. But when they want to analyze their results, they will invariably want to calculate additional things for their analysis. So a ‘derived field generation’ framework allows them to transform what they have into the new values, or fields, they need for their analysis.”
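As a simple illustration of the idea, consider a simulation that saved velocity components but not the velocity magnitude an analyst later needs. The sketch below is hypothetical (the field names and sizes are illustrative, and it is not the team's framework):

```python
import numpy as np

# Hypothetical fields a simulation chose to store: velocity components.
n = 1_000_000
vx, vy, vz = (np.random.rand(n) for _ in range(3))

# Derived field: velocity magnitude, generated after the fact from
# the fields the code actually saved.
vmag = np.sqrt(vx**2 + vy**2 + vz**2)
```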
Frequently, frameworks implement this capability by providing users with a language for creating new fields and then translating their “programs” into a pipeline of filters combined in sequential fashion. This approach, while practical, is cache inefficient because it iterates over large arrays many times to dynamically generate new fields. It also lacks the flexibility to exploit emerging parallel architectures.
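The cache inefficiency is easy to see in a sketch. Assuming a hypothetical two-filter pipeline that derives kinetic energy density from stored density and velocity fields (names are illustrative, not from the team's framework), each filter sweeps the full arrays and materializes a temporary result:

```python
import numpy as np

# Hypothetical stored fields; names are illustrative.
n = 10_000_000
rho = np.random.rand(n)       # density
vx = np.random.rand(n)        # velocity components
vy = np.random.rand(n)

# Pipeline-of-filters style: each filter is a separate sweep over the
# full arrays and materializes a full-size temporary result.
v2 = vx * vx + vy * vy        # filter 1: speed squared
ke = 0.5 * rho * v2           # filter 2: kinetic energy density

# A fused strategy would instead generate a single kernel evaluating
# 0.5 * rho * (vx*vx + vy*vy) per element, reading each input once and
# writing the result once, with no large temporaries.
```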
The team is working to establish an improved framework for designing and testing derived field generation execution strategies. While the researchers’ main goal was to create an extremely flexible framework, they were also mindful of data motion costs, in both energy usage and time to solution, for different methods. Many other efforts are underway, at LLNL and at other institutions, to employ on-node parallelism for visualization and analysis across architectures, but this team’s work is distinguished by its focus on derived field generation.
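To see why data motion matters, a back-of-envelope count of bytes moved (illustrative numbers, not measurements from the study) compares the two-filter pipeline sketched above with a fused single pass:

```python
# Back-of-envelope data-motion estimate; numbers are illustrative.
n = 10_000_000
array_bytes = n * 8                      # one float64 array, ~80 MB

# Pipeline: filter 1 reads vx and vy, writes v2;
#           filter 2 reads rho and v2, writes ke.
pipeline_traffic = (2 + 1) * array_bytes + (2 + 1) * array_bytes

# Fused kernel: reads vx, vy, and rho once each, writes ke once.
fused_traffic = (3 + 1) * array_bytes

print(pipeline_traffic / fused_traffic)  # -> 1.5, i.e., 50% more traffic
```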
To evaluate their framework, the team used Livermore’s Edge GPU cluster, a 216-node Linux cluster. Edge was integral to the study because it supported both CPU and GPU programming schemes and could therefore provide a comprehensive evaluation. VisIt, an interactive parallel visualization and graphical analysis tool for viewing scientific simulation data, served as the host application for the framework.
To complete the evaluation, the team conducted three studies: runtime performance, memory usage, and framework integration. They also designed and tested three execution strategies, called round trip, staged, and fusion, comparing their runtime performance and memory constraints.
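The names suggest how each strategy composes work: roughly, a round trip would run each filter as its own kernel and return intermediate results to the host between steps, a staged strategy would chain kernels while keeping intermediates resident on the device, and fusion would compile an entire expression into a single kernel. These readings, and the sketch below, are illustrative; the team's exact designs are not reproduced here. A minimal PyOpenCL sketch of the fusion idea:

```python
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

# Minimal PyOpenCL sketch of a fused derived-field kernel; the kernel
# source and field names are illustrative, not the team's generated code.
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

prg = cl.Program(ctx, """
__kernel void ke_fused(__global const float *rho,
                       __global const float *vx,
                       __global const float *vy,
                       __global float *ke)
{
    int i = get_global_id(0);
    // Whole expression in one pass: no intermediate arrays.
    ke[i] = 0.5f * rho[i] * (vx[i] * vx[i] + vy[i] * vy[i]);
}
""").build()

n = 1_000_000
rho = cl_array.to_device(queue, np.random.rand(n).astype(np.float32))
vx = cl_array.to_device(queue, np.random.rand(n).astype(np.float32))
vy = cl_array.to_device(queue, np.random.rand(n).astype(np.float32))
ke = cl_array.empty_like(rho)

prg.ke_fused(queue, (n,), None, rho.data, vx.data, vy.data, ke.data)
result = ke.get()   # copy the derived field back to the host
```

The value of a flexible framework is precisely that variants like these can be composed and compared, rather than hard-coding one strategy for one machine.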
The team found that while the GPU certainly decreased the computation’s runtime, not all test cases could be completed on the GPU. Conversely, all test cases completed on the CPU, but with longer runtimes than on the GPU. This evaluation confirmed the usefulness and necessity of a flexible framework to aid computer scientists in their programming decisions.
“Writing software in a way that yields portable performance is a big challenge,” notes Harrison. “Our research shows that supporting multiple, automatic ways to dynamically compose operations is a promising approach to programming diverse hardware in a portable way.”