Artificial intelligence (AI) and machine learning (ML) techniques have become essential to supporting the National Nuclear Security Administration’s (NNSA’s) mission of assessing and certifying the safety, security, and effectiveness of the nation’s stockpile without underground testing. A team of Livermore computer scientists and physicists is developing tailored ML tools from open-source ML packages to improve efficiency in the design process, support stockpile decision making informed by data-driven physics models, and make better use of high performance computing (HPC) resources.
The Vidya project is a portfolio of research efforts directed at developing key capabilities: advancing research in physics-informed ML, improving the use of ML with sparse data, investing in validated and explainable ML, exploring learning hardware in HPC systems, creating an advanced ML-tailored data environment, improving simulation workflows, and building ML expertise across the complex. The project was formed in 2019 to unify several ML explorations funded under the NNSA’s Advanced Simulation and Computing (ASC) Program as part of the Advanced Machine Learning Initiative. By providing common tools and infrastructure, the project can focus on delivering solutions as well as on exploration.
A substantial part of Livermore’s Weapons Complex and Integration strategic planning focuses on a virtual design assistant (ViDA). The ViDA vision uses AI to automate and streamline the design process, increasing efficiency and allowing a single subject matter expert to carry out more of the complete workflow. By 2030, the plan is to increase automation using AI and other ML tools on Livermore’s HPC systems.
“We have new technologies, and we want to understand how they can be used to help solve challenges we consistently have in ASC,” says Katie Lewis, who leads the Vidya project. Lewis explains that the quintessential dilemma for designers is that workflows are often complicated and user intensive and require a great deal of experience with specific tools. “We see machine learning being able to solve some specific problems better and faster than a human by going through that same trial and error process and getting smarter along the way in a machine learning sense. If more workflows were automated, it would make designers’ lives much easier and the end solution more objective.”
One common workflow challenge in simulation problems is mesh tangling. Many simulations use an underlying grid (or mesh) to describe the geometry of the problem. Mesh tangling occurs when the material moves in a way that causes mesh elements to fold over or invert, such as in an area with high vorticity, making it difficult to determine the true state of the material’s composition.
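As a rough illustration (not code from the Vidya project), a tangled quadrilateral zone can be detected by checking whether any corner of the element has folded over, that is, whether the signed area at any corner becomes negative:

```python
# Minimal sketch of detecting inverted (tangled) elements in a 2-D
# quadrilateral mesh; illustrative only, not the project's implementation.
import numpy as np

def corner_signed_areas(quad):
    """quad: (4, 2) array of node coordinates listed counterclockwise."""
    areas = []
    for i in range(4):
        p0, p1, p2 = quad[i - 1], quad[i], quad[(i + 1) % 4]
        e1, e2 = p1 - p0, p2 - p1
        # 2-D cross product; a negative value means this corner has folded over.
        areas.append(e1[0] * e2[1] - e1[1] * e2[0])
    return np.array(areas)

def is_tangled(quad, tol=0.0):
    return bool(np.any(corner_signed_areas(quad) <= tol))

# A well-shaped zone passes the check; swapping two nodes folds the element.
good = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
bad = good[[0, 2, 1, 3]]
print(is_tangled(good), is_tangled(bad))  # False True
```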
To address this issue, arbitrary Lagrangian–Eulerian (ALE) simulation codes use relaxers that allow the mesh to move independently of the motion dictated by the physics. Tuning ALE settings is complicated and often requires multiple simulation restarts. The Vidya team has developed an ML-driven ALE relaxer that builds on a previous Laboratory Directed Research and Development–funded effort. The machine-learned solution, developed primarily by Alister Maguire, uses convolutional neural networks to recognize correlations between the mesh’s space–time evolution and the tangling that results. The algorithm can then apply just enough relaxation to avoid tangling without the loss of physics fidelity that comes from applying too much. This method helps automate simulations by preventing mesh tangling.
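The sketch below suggests one way such a network could be structured, assuming a PyTorch-style model; the architecture, input features, and the name RelaxerCNN are illustrative assumptions, not details published by the project. The network takes a stacked space–time history of per-zone mesh-quality fields and predicts a per-zone relaxation weight:

```python
# A minimal sketch, assuming a PyTorch-style convolutional model. The
# architecture, input features, and names here (RelaxerCNN, etc.) are
# illustrative assumptions, not the Vidya team's published design.
import torch
import torch.nn as nn

class RelaxerCNN(nn.Module):
    def __init__(self, time_steps=4, features_per_step=2):
        super().__init__()
        in_channels = time_steps * features_per_step  # stacked space-time input
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),
            nn.Sigmoid(),  # per-zone relaxation weight in [0, 1]
        )

    def forward(self, x):
        # x: (batch, time_steps * features_per_step, ny, nx)
        return self.net(x)

# Example: a 64x64-zone mesh with 4 recent time steps and 2 quality metrics
# per zone (e.g., corner skew and aspect ratio -- illustrative choices only).
model = RelaxerCNN()
history = torch.randn(1, 8, 64, 64)
weights = model(history)  # shape (1, 1, 64, 64): where and how much to relax
print(weights.shape)
```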
The ALE model runs through a set of trial-and-error cycles as a designer would. After each iteration of the physics simulation, the model improves and eventually learns when and where relaxation is appropriate to allow the simulation to complete successfully.
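Continuing the sketch above (with the same caveat that the training target and loss are assumptions made for illustration), each trial-and-error cycle could update the network toward relaxing the regions that tangled on the previous attempt:

```python
# Toy version of the iterative improvement loop, reusing RelaxerCNN from the
# sketch above. Random tensors stand in for real simulation restarts; the
# labeling and loss choices are illustrative assumptions.
import torch
import torch.nn as nn

model = RelaxerCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for cycle in range(5):
    # In practice, each cycle would rerun the physics simulation with the
    # current model steering relaxation, then record where tangling (or
    # near-tangling) actually occurred.
    mesh_history = torch.randn(1, 8, 64, 64)               # stand-in features
    tangle_map = (torch.rand(1, 1, 64, 64) > 0.9).float()  # stand-in labels

    optimizer.zero_grad()
    loss = loss_fn(model(mesh_history), tangle_map)
    loss.backward()
    optimizer.step()
    print(f"cycle {cycle}: loss {loss.item():.4f}")
```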
Another effort associated with the Vidya project is the creation of a Splash application that will illustrate the hardware capabilities of Livermore’s future supercomputer, El Capitan, set to come online in 2023. The application will showcase how the exascale machine can be used to optimize inertial confinement fusion (ICF) designs. “We’re bringing together a lot of different work that has been done in machine learning across different areas to do design optimization for ICF,” says Lewis. “The end goal is to create an optimized design that will be used to perform an experiment on NIF.”
The amount of data required to inform ML techniques can be daunting, but standard LLNL tools—such as the Livermore Big Artificial Neural Network (LBANN) toolkit for parallel training, Conduit for in-situ data interfaces, and Sina and Kosh for data management—have enabled these solutions on HPC systems.
The team hopes to build strategic partnerships with industry, academia, and other institutions to develop its ML tools. Team members work closely with Livermore Computing, which manages the vendor collaborations focused on designing the new hardware required for ML, and with the Weapon Simulation and Computing workflow team to ensure smooth integration with existing systems.