Tidy data sets, in which each measurement is a column and each observation a row, are relatively straightforward to model and visualize. High-dimensional data sets, which contain a large number of measurements per observation, or few measurements but many observations, are much more challenging to work with but quite common in science and engineering research. While various analysis and visualization tools exist to assist researchers in understanding high-dimensional data, they tend to be either too simplistic or too opaque.
Livermore computer scientist Peer-Timo Bremer observes, “What researchers really need are automated analysis methods that are advanced but easy to understand and visualization methods that deliver interactive and unbiased visualizations of the output.” Bremer and colleagues have established an alternative approach based on the shape of the data. This advanced, intuitive method for analyzing and visualizing complex data sets can be adapted and extended to a wide variety of applications.
Advanced by Livermore and Utah researchers, this solution is based on Morse theory, which was developed decades ago but adapted only recently to modern computing requirements. The method involves extracting key points from a data set—minima, maxima, and saddle points—and creating a graph that connects these points to their neighbors through surface paths. “The result,” says Bremer, “is a Morse-Smale structure, a very sparse and abstract but complete description of the problem. We know where the ridges and valleys are and what the general shape of things is, and we can use this information to explore various data relationships.”
“Other techniques reduce the data first and then analyze it, but this means that you lose information before the analysis happens,” he continues. “We do the opposite—we analyze and then visualize it as a lower dimensional structure. This way, we can get significantly more insights into different data ‘neighborhoods’. We’re not just looking at the highest mountain peak, but many peaks, each of which may have its own features that behave differently.”
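As a rough illustration of the idea, and not the Livermore and Utah team's actual software, the Python sketch below labels every sample in a small synthetic data set with the peak it climbs to and the valley it descends to. It rests on several assumptions that are not in the article: a k-nearest-neighbor graph stands in for the data's neighborhood structure, each point simply steps to its highest-valued (or lowest-valued) neighbor instead of following a true steepest path, and the two-peak test function and all function names are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_indices(points, k):
    """Brute-force k nearest neighbors: returns an (n, k) array of indices."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def follow_to_extremum(values, neighbors, sign):
    """Label each point with the local extremum it flows to.

    sign=+1 climbs toward maxima, sign=-1 descends toward minima.
    Simplification: each step goes to the best-valued neighbor, not
    the neighbor with the steepest slope.
    """
    n = len(values)
    step = np.arange(n)  # by default a point maps to itself (an extremum)
    for i in range(n):
        candidates = neighbors[i]
        best = candidates[np.argmax(sign * values[candidates])]
        if sign * values[best] > sign * values[i]:
            step[i] = best
    # Compose the one-step map with itself until every point
    # lands on a fixed point, i.e. a local extremum.
    dest = step.copy()
    while True:
        nxt = dest[dest]
        if np.array_equal(nxt, dest):
            return dest
        dest = nxt


# Hypothetical test data: a surface with two peaks, sampled on a grid.
xs, ys = np.meshgrid(np.linspace(-3, 3, 41), np.linspace(-2, 2, 21))
points = np.column_stack([xs.ravel(), ys.ravel()])
values = (np.exp(-2 * ((points[:, 0] - 1.5) ** 2 + points[:, 1] ** 2))
          + np.exp(-2 * ((points[:, 0] + 1.5) ** 2 + points[:, 1] ** 2)))

neighbors = knn_indices(points, k=8)
max_label = follow_to_extremum(values, neighbors, sign=+1)
min_label = follow_to_extremum(values, neighbors, sign=-1)

# Each (minimum, maximum) pair identifies one cell, or "neighborhood",
# of points that share the same valley and the same peak.
cells = Counter(zip(min_label.tolist(), max_label.tolist()))

print("peaks found at:", points[np.unique(max_label)])
print("number of cells:", len(cells))
```

Grouping points by the pair of extrema they flow to is what lets each neighborhood be examined on its own, for example by fitting a separate simple model inside each cell, without first projecting the data to fewer dimensions.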
This advantage was readily apparent when Bremer and colleagues applied topological methods to computer simulations of laser experiments at LLNL’s National Ignition Facility. The goal was a better understanding of how to improve neutron yield, which is essential to achieving a fusion reaction rate sufficient to generate fusion ignition. Topological analysis of five factors thought to influence yield and 13 adjustable experimental parameters within a 1000-member engineering simulation ensemble revealed that rather than the expected single solution, there were two possible but very different paths.
The team has also successfully incorporated topological techniques into an assessment of the reliability of modern climate models, funded by the Laboratory Directed Research and Development Program. The project involved analysis of 35 million CPU hours of climate simulation that varied as many as 37 parameters. Topological methods revealed some interesting connections that several other standard analysis approaches missed. For instance, at a regional level, they found two distinct modes that seem to be responsible for cloud formation.
“The Morse approach lets us visualize the data on multiple scales. We’re finding that sometimes we need to do a local, not global, analysis to see the importance of these values,” says Bremer.
Bremer and colleagues are continuing to explore both fundamental and applied aspects of this project, with promising results. Recent projects in conjunction with other national laboratories, for instance, have involved nuclear reactor accident simulations and combustion optimization, while Utah collaborators are particularly focused on building web-based visualization tools that will allow more researchers to learn about and use these techniques.
“These concepts are very new,” Bremer observes. “Not only are they complex to explain, results can be challenging to interpret. So we’re trying to figure out what the user community needs and to support it, and also to get the word out about the tool.”