Application-Level Resilience
Application-level resilience is emerging as an alternative to traditional fault tolerance approaches because it provides fault tolerance at a lower cost.
AutomaDeD
This tool automatically diagnoses performance and correctness faults in MPI applications. It identifies abnormal MPI tasks and code regions and finds the least-progressed task.
GREMLINs
These techniques emulate the behavior of anticipated future architectures on current machines to improve performance modeling and evaluation.
Sandia, Intel seek novel memory tech to support stockpile mission
ASC’s Advanced Memory Technology research projects are developing technologies that will impact future computer system architectures for complex modeling and simulation workloads.
Best paper winner improves scientific workflow performance
Combining specialized software tools with heterogeneous HPC hardware requires an intelligent workflow performance optimization strategy.
LLNL staff returns to Texas-sized Supercomputing Conference
The 2022 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22) returned to Dallas, where a large contingent of LLNL staff participated in sessions, panels, paper presentations, and workshops centered on HPC.