CHAOS: Linux from Livermore
CHAOS (Clustered High Availability Operating System) is Livermore
Computing's localized version of Linux.
CHAOS is maintained by local software developers to meet the special
needs of local users (and their system administrators).
The commercial Red Hat (brand) "boxed set" distribution of Linux
forms the core of CHAOS. LC staff members have:
- modified some features to better support the scientific computing priorities
typical at Livermore, which differ from those of most business-oriented
installations of Linux,
- added extra support for LC's very large clusters of nodes intended for
parallel computations, and
- focused on just the hardware and software found in LC production systems,
to maximize the relevant return on local programming investment.
After experimenting with several approaches to Linux use on LC machines,
the staff of the Livermore Linux project decided that the best way
to provide a good production and program-development environment with
Linux was to focus on issues usually neglected when Linux is sold,
installed, or applied commercially.
Central to this "Livermore model" of computing is an emphasis on:
- High-performance computing (HPC) techniques,
including the use of large clusters, big data files, and very long-running
jobs often seen at LC.
- LC computational needs,
especially for systems that enable rich simulations of a narrow
set of numerical problems assisted by extensive and carefully tailored
- Local system administration style
(not superficial or turnkey but "deep," with strong, on-going feedback
between system administrators and system-software developers).
In many ways this approach to Linux establishes once again
a style of operating-system
design and management common at Livermore during the LTSS/CTSS era
a decade ago (when another locally developed system embodied commitment
to the same three dominant characteristics listed here).
CHAOS, with the local modifications and threefold emphasis described
here, evolved from several years of LC collaborations with current
or former vendors on experimental high-performance computing systems.
Frustration with the pace or outcome of several of those efforts led
LC to refocus on a more independent Linux path during 2001.
The first large-scale deployments of CHAOS appeared in 2002,
first on what was formerly called the secure-network
Production Capacity Resource (clusters Adelie and Emperor),
then on the
Multiprogrammatic Capability Resource cluster MCR.
LC's primary administrative strategies to build and refine CHAOS are:
- a few carefully chosen company partnerships,
rather than reliance on broad, amorphous, sometimes divisive Linux
collaborative work groups or committees, and
- open source sharing of nonproprietary new features
(under the DOE-approved GNU General Public License),
rather than reliance on the usual UC/LLNL highly constrained approach
to intellectual property management.
Additional technical information on CHAOS and related hardware
(Intel-chip Linux cluster) trends is collected and posted by project
developers on the OCF web site
as the CHAOS staff releases it to the computer-science community.
Details on the CHAOS resource manager (called SLURM)
appear in the separate
SLURM Reference Manual
(a section below
summarizes SLURM's innovative user-support features).
The "exec-shield" security feature
was added to CHAOS as version 3.0 was gradually deployed on LC machines
in late 2005.