Commodity technology systems (CTS) are smaller, less expensive commercial-grade systems that run parallel problems with more modest computational requirements. They allow the National Nuclear Security Administration’s (NNSA’s) more powerful supercomputers, or advanced technology systems (ATS), to be dedicated to the larger, more complex calculations critical to stockpile stewardship. Reducing the total cost of ownership for robust and scalable HPC clusters is a significant benefit to LLNL and its programs.
TOSS: The right operating system to run commodity computing hardware
The Tri-Lab Operating System Stack (TOSS) was created to run on commodity systems. The goal of the TOSS project has been to improve both the utility and the cost efficiency of a common computing environment for the Advanced Simulation and Computing (ASC) Tri-Lab community. The project delivers a fully functional cluster operating system, based on Red Hat Enterprise Linux, capable of running MPI jobs at scale on hardware across the Tri-Lab complex.
TOSS provides a complete product with full lifecycle support. Well-defined processes for release management, packaging, quality assurance testing, configuration management, and bug tracking ensure that a production-quality software environment can be deployed consistently and manageably across the Tri-Labs.
TOSS has enabled Linux-based computing on four generations of CTS clusters, from 2007 to the present.
Center-wide TOSS deployment improves user and staff experience
With the procurement of the El Capitan systems, Livermore Computing made the unprecedented decision to use TOSS as the operating system for its largest and most performant machines, including the 2.8-exaflops El Capitan. TOSS is now installed across the entire computer center, giving users a more consistent environment and making the systems easier for administrators to maintain.
For details on TOSS and whether it might be right for your center, see our HPC.llnl.gov TOSS documentation.
For a higher-level view of TOSS, its history, and its impact, see "Supercomputing in Sync" in LLNL's Science & Technology Review.
To reference TOSS, please cite the following paper: Edgar A. León, Trent D’Hooge, Nathan Hanford, Ian Karlin, Ramesh Pankajakshan, Jim Foraker, Chris Chambreau, and Matthew L. Leininger. TOSS-2020: A Commodity Software Stack for HPC. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC’20. IEEE Computer Society, November 2020.
