
UCRL-WEB-200040

CHAOS: Linux from Livermore


CHAOS Goals

CHAOS (Clustered High Availability Operating System) is Livermore Computing's (LC's) locally customized version of Linux. Local software developers maintain CHAOS to meet the special needs of LC users and their system administrators.

WHAT.
The commercial Red Hat "boxed set" distribution of Linux forms the core of CHAOS. LC staff members have:

  • modified some features to better support the scientific-computing priorities typical at Livermore, which differ from those of most business-oriented Linux installations,
  • added extra support for LC's very large clusters of nodes intended for parallel computations, and
  • focused on just the hardware and software found in LC production systems, to maximize the relevant return on local programming investment.

WHY.
After experimenting with several approaches to using Linux on LC machines, the staff of the Livermore Linux project decided that the best way to provide a good Linux environment for both production work and program development was to focus on issues usually neglected when Linux is sold, installed, or used commercially. Central to this "Livermore model" of computing is an emphasis on:

  • High-performance computing (HPC) techniques, including the use of large clusters, big data files, and very long-running jobs often seen at LC.
  • LC computational needs, especially systems that enable rich simulations of a narrow set of numerical problems, backed by extensive and carefully tailored technical support.
  • Local system administration style (not superficial or turnkey but "deep," with strong, ongoing feedback between system administrators and system-software developers).

In many ways, this approach to Linux revives a style of operating-system design and management common at Livermore during the LTSS/CTSS era a decade ago, when another locally developed system embodied the same three dominant characteristics listed here.

HOW.
CHAOS, with the local modifications and threefold emphasis described here, evolved from several years of LC collaborations with current or former vendors on experimental high-performance computing systems. Frustration with the pace or outcome of several of those efforts led LC to pursue a more independent Linux path in 2001. The first large-scale deployments of CHAOS appeared in 2002: initially on the secure-network clusters Adelie and Emperor (formerly called the Production Capacity Resource), then on MCR, the 1152-node, open-network Multiprogrammatic Capability Resource cluster.

LC's primary administrative strategies for building and refining CHAOS involve:

  • a few carefully chosen company partnerships, rather than reliance on broad, amorphous, sometimes divisive Linux collaborative work groups or committees, and
  • open-source sharing of nonproprietary new features (under the DOE-approved GNU General Public License), rather than reliance on the usual, highly constrained UC/LLNL approach to intellectual-property management.

TECHNICAL BACKGROUND.
As the CHAOS staff releases additional technical information on CHAOS and related hardware trends (Intel-chip Linux clusters) to the computer-science community, project developers post it on the OCF web site:

     http://www.llnl.gov/linux

Details on the CHAOS resource manager (called SLURM) appear in the separate SLURM Reference Manual (a section below summarizes SLURM's innovative user-support features). The "exec-shield" security feature was added to CHAOS with version 3.0, which was gradually deployed on LC machines in late 2005.

