The Sapphire project is developing scalable algorithms for the interactive exploration of large, complex, multi-dimensional scientific data. We are applying and extending ideas from data mining, image and video processing, statistics, and pattern recognition to improve how scientists extract useful information from data. Our work addresses data analysis problems that arise in observational, simulation, and experimental data. We focus on research in algorithms, the incorporation of this research into software, and the application of that software to real-world problems at LLNL and elsewhere. The needs of these applications, in turn, drive our research.

Research Areas

To address the challenges that arise when data analysis techniques are applied to massive and complex data sets, we are focusing on the following research areas:

  • Image processing techniques for denoising, object identification, and feature extraction
  • Dimension reduction techniques to handle multi-dimensional data
  • Scalable algorithms for classification and clustering
  • Parallel implementations for interactive exploration of data
  • Applied statistics to ensure that the conclusions drawn from the data are statistically sound

Applications

  • Detection and tracking of moving objects in video
  • Comparison of simulations to experiments: Richtmyer-Meshkov instability
  • Analysis of high-fidelity fluid-mix simulations
  • Analysis of puncture plots
  • Blob tracking in plasma
  • Classification of bent-double galaxies
  • Separation of climate signals
  • Similarity-based object retrieval
  • Detection of human settlements in satellite imagery
  • Dimensionality reduction for plasma physics

Algorithms

  • Background subtraction for detection of moving objects
  • Using salient regions for tracking
  • Denoising using wavelets and filters
  • Nonlinear diffusion techniques for denoising
  • Level sets for image segmentation
  • Statistical shape analysis
  • Texture features for information retrieval
  • Statistical techniques for dimension reduction
  • Feature subset selection
  • Analysis of streaming data
  • Evolutionary algorithms and oblique decision trees
  • Evolutionary algorithms and neural networks
  • Creating ensembles of decision trees
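Of the algorithms listed above, background subtraction for detecting moving objects is the easiest to illustrate in a few lines. The sketch below is not Sapphire's implementation. It is a minimal, hypothetical example that assumes grayscale frames stored as nested Python lists and an exponential running-average background model. The function name `subtract_background` and its parameters `alpha` (the background update rate) and `threshold` (the intensity difference that marks a pixel as moving) are illustrative choices, not part of any Sapphire API.

```python
from typing import List

Frame = List[List[float]]  # hypothetical: one grayscale frame as rows of pixel intensities
Mask = List[List[bool]]    # True where a pixel is classified as moving (foreground)


def subtract_background(frames: List[Frame], alpha: float = 0.05,
                        threshold: float = 25.0) -> List[Mask]:
    """Return one foreground mask for every frame after the first.

    Illustrative sketch only: the background is an exponential running
    average of past frames, and a pixel is flagged as foreground when it
    differs from the current background estimate by more than `threshold`.
    """
    if not frames:
        return []
    # Initialise the background estimate with a copy of the first frame.
    background = [row[:] for row in frames[0]]
    masks: List[Mask] = []
    for frame in frames[1:]:
        mask: Mask = []
        for r, row in enumerate(frame):
            mask_row = []
            for c, pixel in enumerate(row):
                bg = background[r][c]
                # Compare against the background *before* updating it.
                mask_row.append(abs(pixel - bg) > threshold)
                # Blend the new pixel into the running-average background.
                background[r][c] = (1.0 - alpha) * bg + alpha * pixel
            mask.append(mask_row)
        masks.append(mask)
    return masks


if __name__ == "__main__":
    static = [[10.0, 10.0], [10.0, 10.0]]
    moving = [[10.0, 200.0], [10.0, 10.0]]  # one bright pixel appears
    print(subtract_background([static, static, moving])[-1])
```

Running the script prints `[[False, True], [False, False]]`: only the pixel that changed is flagged. Blending every pixel into the background, including those flagged as foreground, keeps the sketch short, but it lets a stationary object fade into the background on a time scale of roughly 1/alpha frames. Practical systems often update only background pixels or model each pixel statistically.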