Combining specialized software tools with heterogeneous HPC hardware requires an intelligent workflow performance optimization strategy.
Topic: Resource and Workflow Management
LLNL is participating in the 34th annual Supercomputing Conference (SC22), which will be held both virtually and in Dallas on November 13–18, 2022.
The latest issue of Science & Technology Review highlights the R&D 100 award–winning Flux software framework.
This 2021 R&D 100 award-winning software solves data center bottlenecks by enabling resource types, schedulers, and framework services to be deployed as data centers evolve.
The Advanced Technology Development and Mitigation program within the Exascale Computing Project shows that the best way to support the mission is through open collaboration and a sustainable software infrastructure.
LLNL participates in the International Parallel and Distributed Processing Symposium (IPDPS) on May 30 through June 3.
LLNL participates in the ISC High Performance Conference (ISC22) on May 29 through June 2.
The Livermore Computing–developed Flux project addresses challenges posed by complex scientific research supercomputing workflows.
The renowned worldwide competition announced the winners of the 2021 R&D 100 Awards, among them LLNL's Flux workload management software framework in the Software/Services category.
The renowned worldwide competition announced the finalists for the 2021 R&D 100 Awards, among them LLNL's Flux workload management software framework in the Software/Services category.
This video describes Flux, an open-source software framework that manages and schedules computing workflows to maximize available resources to run applications faster and more efficiently.
LLNL, IBM, and Red Hat will develop best practices for interfacing HPC schedulers and cloud orchestrators in preparation for supercomputers that use cloud technologies.
CTO Bronis de Supinski discusses the integrated storage strategy of the future El Capitan exascale supercomputing system, which will have in excess of 2 exaflops of raw computing power spread across nodes.
A near-node local storage innovation called Rabbit factored heavily into LLNL’s decision to select Cray’s proposal for its CORAL-2 machine, the lab’s first exascale-class supercomputer, El Capitan.
Highlights include response to the COVID-19 pandemic, high-order matrix-free algorithms, and managing memory spaces.
The Maestro Workflow Conductor is a lightweight, open-source Python tool that can launch multi-step software simulation workflows in a clear, concise, consistent, and repeatable manner.
Highlights include CASC director Jeff Hittinger's vision for the center as well as recent work with PruneJuice, DataRaceBench, Caliper, and SUNDIALS.
Highlights include complex simulation codes, uncertainty quantification, discrete event simulation, and the Unify file system.
Highlights include recent LDRD projects, Livermore Tomography Tools, our work with the open-source software community, fault recovery, and CEED.
Cram lets you easily run many small MPI jobs within a single, large MPI job by splitting MPI_COMM_WORLD up into many small communicators to run each job in the cram file independently.
Caliper enables users to build customized performance measurement and analysis solutions by connecting independent context annotations, measurement services, and data processing services.
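To make the Cram description above more concrete, here is a minimal sketch of the bookkeeping behind splitting one large rank space into many small jobs. It is plain Python and does not call MPI or Cram. The function names `assign_ranks` and `group_by_job` and the example job sizes are hypothetical, and the (color, key) pairs only illustrate the kind of values one might pass to `MPI_Comm_split`. This is not Cram's actual implementation.

```python
def assign_ranks(job_sizes):
    """Map each world rank to a (color, key) pair.

    color = index of the small job the rank belongs to
    key   = the rank's position inside that job
    (Hypothetical helper; illustrates MPI_Comm_split-style arguments.)
    """
    assignment = {}
    world_rank = 0
    for job_id, size in enumerate(job_sizes):
        for local_rank in range(size):
            assignment[world_rank] = (job_id, local_rank)
            world_rank += 1
    return assignment


def group_by_job(assignment):
    """Collect the world ranks that would share one small communicator."""
    groups = {}
    for world_rank, (color, _key) in sorted(assignment.items()):
        groups.setdefault(color, []).append(world_rank)
    return groups


# Assumed example: three small jobs needing 2, 3, and 1 processes,
# packed into a single 6-rank allocation.
job_sizes = [2, 3, 1]
assignment = assign_ranks(job_sizes)
groups = group_by_job(assignment)

for job_id, ranks in groups.items():
    print(f"job {job_id}: world ranks {ranks}")
```

In a real MPI program, each process would compute only its own color and key and then split the world communicator accordingly, so that every small job sees its own communicator in place of the global one.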