StarSapphire is a collection of projects in scientific data mining, focusing on the analysis of data from scientific simulations, observations, and experiments. StarSapphire is a follow-on to the Sapphire scientific data mining project, in which we conducted research in algorithms, incorporated this research into software, and applied the software to real-world problems, which in turn motivated further research. Our experiences showed that we could use techniques from data mining, machine learning, image and video processing, statistics, and pattern recognition to improve the way in which scientists extract useful information from data.

In the StarSapphire projects, we are leveraging these earlier experiences to address recent challenges in data-driven modeling and analysis. These challenges arise from newer types of data (such as data streams), larger volumes of data (such as those resulting from three-dimensional simulations of complex phenomena), and new constraints on the analysis (such as the need for in-situ analysis on exascale systems or real-time analysis for anomaly detection).

Despite these new challenges, our approach to analysis remains the same as the one we developed and used in Sapphire (shown in Figure 1). This approach worked very well in the analysis of data from a variety of problems in many different domains. We found that it was important to consider scientific data mining as an iterative and interactive process, involving data pre-processing, the search for patterns, knowledge evaluation, and possible refinement of the process based on input from domain experts or feedback from one of the steps.

Because pre-processing the data is a time-consuming but critical first step in data mining, we include it as an integral part of the process. Pre-processing is often domain- and problem-dependent; however, several techniques developed in the context of one problem or domain can be applied to other problems and domains as well. The pattern recognition step, in contrast, is usually independent of the domain or problem.

Figure 1. Our end-to-end approach to data analysis.

Projects

As part of StarSapphire, we are involved in the following projects:

  • Poincaré Plots: Classification and characterization of orbits
  • Blob Tracking: Analysis of coherent structures in NSTX images
  • GSEP Analysis: Analysis of fluid and particle data from GSEP simulations
  • WindSENSE: Managing the integration of wind energy on the power grid
  • SensorStreams: Real-time analysis of streaming data from sensors
  • MINDES: Data mining for inverse design
  • Exa-DM: Enabling scientific discovery in exascale simulations
  • IDEALS: Improving data exploration and analysis at large scale
  • Additive Manufacturing: Investigating data mining for additively manufactured parts

Team

Former Members

  • Ya Ju Fan (postdoc and staff)
  • Mandoye Ndoye (postdoc)

Students and Faculty

  • Richelle Coppin, UC Davis, summer 2020
  • Ravi Ponmalai, UC Irvine, summer 2019
  • Juliette Franzman, UC Berkeley, summer 2019
  • Renee Swischuk, Texas A&M, summer 2017
  • Jeremy Thompson, Faculty, US Air Force Academy, summer 2014
  • Aaron Sisto, CSGF Fellow, Stanford, summer 2013
  • Prof George Karypis, University of Minnesota (co-PI on Exa-DM, 2011–2014)
  • Jeremy Iverson, U Minnesota, summer 2011 and 2012
  • Jeremy Kun, U Illinois, Chicago, summer 2012
  • Xisen Tian, US Naval Academy, summer 2012
  • Seth Kirk, U Kentucky, summer 2011

Why StarSapphire?

A star sapphire is a type of sapphire that exhibits a star-like phenomenon called asterism, due to the presence of titanium dioxide impurities. The star effect results when light reflects from the needle-like inclusions of the impurities, which are aligned perpendicular to the rays of the star. There were several options for naming the follow-on project to Sapphire. A programming viewpoint might have called it "Sapphire++", a statistical viewpoint would have resulted in "Sapphire-PLUS", and an image understanding approach would have led to "Sapphire Junior". However, given the data mining focus, it seemed appropriate to name the collection of projects after another gemstone; hence the choice of StarSapphire.