Warding Off Non-Deterministic Software Bugs at Scale

Non-deterministic software bugs are among the most time-consuming and expensive problems to solve in software development. Tedious to find and often detected late in development, these errors occur only sporadically, even when the program is run with the same input on the same hardware. They are remarkably hard to catch, in large part because they are difficult to reproduce. Some do not reproduce at all under a debugger, since the act of debugging can perturb the execution enough to mask the bug. Debugging non-deterministic errors in parallel applications running on large supercomputers, such as those at Lawrence Livermore, presents even greater challenges. This video introduces PRUNERS, a new software toolset to fight non-deterministic bugs on supercomputers.