Scientific computing is becoming increasingly complex, both in the difficulty of the problems to solve and in the hardware used to solve them. State-of-the-art systems have heterogeneous architectures, combining different types of processing units, such as GPUs and AI-based chips, to accelerate operations and reduce time to solution. With every new generation of machine, and especially with the emergence of AI, software developers must adapt and port critical applications to realize performance gains and achieve desired results. However, such modifications can introduce errors that adversely affect simulation outcomes.
Livermore researchers are engaged in efforts to apply correctness—verifying that a system behaves according to its mathematical specification—and formal methods (tools for proving correctness) to improve the reliability, reproducibility, and accuracy of the Laboratory’s high-performance computing (HPC) codes. In 2025, two Livermore projects were awarded funding through the Correctness for Scientific Computing (CS2) initiative, a partnership between the Department of Energy’s (DOE’s) Office of Advanced Scientific Computing Research (ASCR) and the National Science Foundation, to pursue research avenues in this space.
Although formal methods have been successfully implemented outside the Laboratory for proving correctness in non-scientific computing applications, the scale, parallelism, and complexity of scientific computing (SC), including high-precision floating-point arithmetic, have limited their adoption at Livermore. Computer scientist Ignacio Laguna, who has spent more than 10 years developing customized, practical tools for identifying errors in the Laboratory’s SC codes, leads a CS2 project to build a formal method for proving correctness in linear solver libraries, including hypre and PETSc. He says, “Traditional testing methods for detecting errors do not cover all possible inputs. Formal methods ensure the applications produce consistent numerical results based on the mathematical specifications.”
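Laguna’s point about test coverage can be illustrated with a small, hypothetical example (not drawn from the projects themselves): a naive Euclidean-norm routine passes the spot checks a developer might write, yet silently overflows on large but perfectly valid inputs that the tests never exercise. A formal proof ranges over all inputs, so it is exactly the kind of gap such methods are designed to expose.

```python
import math

def naive_norm(x, y):
    # Textbook formula; x * x overflows to inf when x is large,
    # even though the true result is well within double range.
    return math.sqrt(x * x + y * y)

# Spot checks like these all pass...
assert naive_norm(3.0, 4.0) == 5.0
assert naive_norm(0.0, 0.0) == 0.0

# ...but a valid input outside the test suite silently overflows.
big = 1e200
print(naive_norm(big, big))    # inf, because big * big overflows
print(math.hypot(big, big))    # about 1.414e200, the correct value
```

The library routine `math.hypot` avoids the intermediate overflow by rescaling internally, which is precisely the kind of implementation detail a correctness proof must account for.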
Over the CS2 project’s four-year horizon, Laguna and colleagues at Livermore, the University of Utah (Professor Ganesh Gopalakrishnan), and the University of Rochester (Professor Sreepathi Pai) will develop methods for predicting when errors can occur in SC linear solvers by first identifying places where calculations are most at risk of generating numerical exceptions. The work will involve incorporating assertions to determine whether specific conditions of a program are met during execution, or whether mathematical assumptions are violated. Additionally, the team will build models that can simulate AI-based hardware calculations—which use lower-precision, nonstandard formats—to determine when exceptions, such as division-by-zero or overflow errors, may occur when porting the Laboratory’s codes to AI-based systems.
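A minimal sketch of the low-precision modeling idea, using NumPy’s IEEE half-precision type as a stand-in for the nonstandard AI-hardware formats the team will actually target: the same dot product that is exact in double precision overflows to infinity when every operation is forced through 16 bits. The function and data here are illustrative, not from the project.

```python
import numpy as np

def dot_in_precision(x, y, dtype):
    # Accumulate the dot product entirely in the given dtype,
    # mimicking hardware that computes in a reduced format.
    acc = dtype(0)
    with np.errstate(over='ignore'):   # inspect overflow rather than warn
        for a, b in zip(x, y):
            acc = dtype(acc + dtype(a) * dtype(b))
    return acc

vals = [10.0] * 2000                    # true dot product: 200,000
r64 = dot_in_precision(vals, vals, np.float64)
r16 = dot_in_precision(vals, vals, np.float16)
print(r64)                              # 200000.0, exact in double precision
print(r16)                              # inf: the sum exceeds float16 max (~65504)
```

Running a model like this ahead of a port flags which accumulations need rescaling, a wider accumulator, or a guarding assertion before the code ever touches the new hardware.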
The formal verification tools will first be tested on individual software packages and then implemented for full-scale software systems. Laguna says, “Ideally, these tools will allow us to predict the occurrence of an exception before the code is ever run on a machine, so we can make modifications in advance to reduce downtime and improve results.” Laguna is also serving as a subject matter expert for the second CS2 project, led by research scientist Mohit Tekriwal, which focuses on developing formal methods for use with compilers, using LLVM as the test case. Tekriwal says, “Specifically, we aim to rigorously prove properties like numerical stability and floating-point correctness in key scientific libraries.”
The goal of the project is to build a mechanized, end-to-end verification framework for automating the error-checking process in SC codes regardless of which programming language was used to develop them. Traditionally, programmers must manually rewrite code written in different programming languages into a form that a theorem prover can execute, a process that requires specialized expertise and is intractable for large-scale SC applications. He explains, “Instead of doing the verification directly on the source language, we are fixing a verification target to LLVM, which is source-agnostic, and that opens us up to using some other tools in the formal methods community, which are based on LLVM, such as Vellvm to mechanize the LLVM semantics.”
The team will use MLIR compiler technology to translate different programming languages into a common format and then convert that code into a mathematical model in which correctness can be proven with high confidence. The theorem prover will then be extended to handle mixed-precision error propagation. “Essentially, the whole end-to-end process can be broken down into a proof of mathematical correctness that can prove the statement of convergence and stability hold for finite precision calculations, and a refinement proof that connects the concrete implementation to mathematical correctness,” says Tekriwal. In the final stages of the project, the team will apply the formal methods to the hypre and GenDiL libraries, focusing on matrix-free finite element algorithms, specifically those used for solving problems related to inertial confinement fusion and plasma physics.
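The finite-precision reasoning Tekriwal describes typically rests on the standard model of floating-point arithmetic, in which each elementary operation is exact up to a small relative rounding error (a textbook formulation, not a detail specific to the project):

```latex
\[
  \mathrm{fl}(a \circ b) = (a \circ b)(1 + \delta), \qquad
  |\delta| \le u, \qquad \circ \in \{+,\, -,\, \times,\, \div\},
\]
```

where \(u\) is the unit roundoff: \(2^{-53}\) for IEEE double precision but only \(2^{-11}\) for half precision. Bounds proved at this mathematical level therefore grow sharply in lower precision, which is why a refinement proof must reconnect the concrete mixed-precision implementation to the mathematical statement of convergence and stability.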
“In the HPC community, we strive for performance, but in the process of achieving high performance on new systems, there are trade-offs, including lower precision arithmetic. The goal of CS2 projects is to bring correctness to the same level as performance to raise confidence in the future reliability and accuracy of our codes when porting to mixed-precision systems,” says Tekriwal. As AI becomes more prevalent and integrated into future systems, scientific software must undergo its own transformation to ensure critical applications continue providing accurate results. Applying proven formal methods to the Laboratory’s SC codes will ensure correctness no matter what generation of software or hardware comes next.
