Alkemi: Improving Simulation Workflows with Machine Learning and Big Data
al·ke·mi │ \ˈal-kə-mē\ noun: a seemingly magical process of transformation, creation, or combination.
Simulation workflows for Arbitrary Lagrangian–Eulerian (ALE) methods are complex and typically require manual tuning. Developing an ALE workflow is a trial-and-error process that can be disruptive and time-consuming: a few hours of simulation can require many days of manual tuning. There is an urgent need to semi-automate this process to reduce user burden and improve productivity. To address this need, we are developing novel predictive analytics for simulations and an in situ infrastructure for integrating those analytics. Our ongoing goals are to predict simulation failures ahead of time and to proactively avoid them as much as possible.
Predictive analytics for simulations:
We are developing machine learning algorithms to predict the conditions that lead to simulation failures. We are also investigating supervised learning to build classifiers that predict failures using the simulation state as learning features.
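As a rough illustration of the supervised-learning idea, the sketch below trains a small logistic-regression classifier on synthetic data. Everything in it is an assumption made for illustration: the two features (a mesh-distortion measure and a normalized velocity), the labeling rule, and the choice of logistic regression are hypothetical stand-ins, not the features or models used in the project.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_failure(w, b, x, threshold=0.5):
    """Return True if the predicted failure probability reaches the threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


def make_synthetic(n, seed=0):
    """Generate hypothetical (features, label) pairs; label 1 means 'failed'."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        distortion = rng.random()  # hypothetical mesh-distortion feature in [0, 1]
        velocity = rng.random()    # hypothetical normalized velocity in [0, 1]
        X.append([distortion, velocity])
        # Assumed labeling rule, chosen only so the toy data is learnable.
        y.append(1 if distortion + 0.5 * velocity > 0.9 else 0)
    return X, y


X, y = make_synthetic(200)
w, b = train_logistic(X, y)
accuracy = sum(predict_failure(w, b, xi) == bool(yi) for xi, yi in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```

In a real setting the labels would come from recorded simulation runs rather than a synthetic rule, and the classifier would be evaluated on held-out runs rather than on its own training data.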
In situ infrastructure for integration:
We are developing an in situ infrastructure that integrates the predictive analytics with the running simulation so that the workflow can be adjusted dynamically. This infrastructure includes components for data collection and a user interface.
The above figure provides a high-level overview of the infrastructure for integrating machine learning into high performance computing simulations. At the top level is the user interface, where Workflow Management interacts with the Simulation Run and Visual Analytics interacts with Machine Learning. At the middle level is data collection, where the key component is the Feature Aggregator, which aggregates massive simulation data into learning features for training data. At the bottom level is predictive analytics, where Machine Learning generates Statistical Models that are then used by the Inference Algorithm.
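The role of the Feature Aggregator can be sketched as a reduction from large per-zone arrays to a short, fixed-length feature row per time step. The example below is a minimal sketch under stated assumptions: the field names (`volume`, `aspect_ratio`), the choice of min/max/mean as summary statistics, and the function names are all hypothetical and are not taken from the project's implementation.

```python
from statistics import mean


def aggregate_features(zone_values):
    """Reduce per-zone values for one time step to a flat feature dict.

    zone_values maps a (hypothetical) field name to a list of per-zone values.
    """
    features = {}
    for name, values in zone_values.items():
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
        features[f"{name}_mean"] = mean(values)
    return features


def build_training_rows(steps, failed):
    """Turn a run's time steps into labeled training rows.

    Every row from the run carries the same label: 1 if the run failed, else 0.
    This labeling scheme is an assumption made for illustration.
    """
    rows = []
    for step in steps:
        row = aggregate_features(step)
        row["failed"] = int(failed)
        rows.append(row)
    return rows


# Two hypothetical time steps of a tiny three-zone mesh.
steps = [
    {"volume": [1.0, 0.9, 1.1], "aspect_ratio": [1.0, 1.2, 1.1]},
    {"volume": [1.0, 0.4, 1.3], "aspect_ratio": [1.0, 3.0, 1.4]},
]
rows = build_training_rows(steps, failed=True)
for row in rows:
    print(row)
```

Summary statistics keep the feature vector the same length regardless of mesh size, which is one simple way to make massive simulation state usable as training data; other reductions (histograms, per-region statistics) would fit the same interface.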
- M. Jiang, B. Gallagher, J. Kallman, and D. Laney, “A Supervised Learning Framework for Arbitrary Lagrangian-Eulerian Simulations”, IEEE International Conference on Machine Learning and Applications, 2016.
- R. da Silva, R. Filgueira, I. Pietri, M. Jiang, R. Sakellariou, and E. Deelman, “A Characterization of Workflow Management Systems for Extreme-Scale Applications”, Elsevier Future Generation Computer Systems, Volume 75, 2017.