
UCRL-WEB-200040

CHAOS: Linux from Livermore


Problems Addressed

The distinctive extra features of CHAOS address four problems:

  • A high-performance interconnect (internal network or "switch") for message passing among Linux-based compute nodes.
    SOLUTION: device drivers for Quadrics QsNet.
  • A scalable parallel file system with both hardware and software support for fast parallel I/O.
    SOLUTION: Lustre Lite (software) and Blue Arc servers (hardware).
    See the Lustre section of LC's I/O Guide for details.
  • A portable (vendor-independent) resource manager for batch jobs, optimized for LC's job-control needs and able to invoke any of several different job schedulers (FIFO, backfill, or others).
    SOLUTION: SLURM, deployed in fall 2003.
  • Reliable and efficient administrative tools for very large clusters of Linux-based compute nodes.
    SOLUTION: a family of CHAOS management tools, some for staff and some for users.

The CHAOS operating system kernel is based on commercial Red Hat kernel releases rather than generic "stable Linux kernel releases." This gives LC a more reliable, predictable, and focused way both to fix errors and to report newly found problems. LC has modified the Red Hat kernel to support the special computing needs of LLNL users with (a) added or updated device drivers, (b) increased resource limits (for example, CHAOS allows 8192 instead of just 1024 file descriptors per process), (c) crash dump support to manage trouble during big production runs, (d) serial console logging, and (e) changes to enable parallel debugging tools like TotalView.
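As a rough illustration of the file-descriptor limit mentioned above, the following sketch prints the limits that apply to the current shell. It assumes a bash-compatible shell where the ulimit builtin accepts -n and -H. The variable names are illustrative only, and the values reported on any given machine depend on local configuration, so they may not match the 8192 figure.

```shell
# Soft limit on open file descriptors for this shell and its children.
fd_soft=$(ulimit -n)

# Hard limit: the ceiling up to which an unprivileged user may raise the soft limit.
fd_hard=$(ulimit -Hn)

echo "soft file-descriptor limit: $fd_soft"
echo "hard file-descriptor limit: $fd_hard"
```

If the soft limit is lower than the hard limit, running ulimit -n with a larger value (no higher than the hard limit) raises it for the current shell session.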

To determine which version of CHAOS an LC machine is currently running, execute

uname -a

and look in the third field (from the left) in the output line returned. If that field contains
  • [Red Hat] 2.4.21, then the machine uses CHAOS 2.0;
  • [Red Hat] 2.6.9, then the machine uses CHAOS 3.0.
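The lookup above can be sketched as a small shell function. This is only an illustration of the two mappings listed here: the function name chaos_version_for_kernel is hypothetical, and any kernel release not listed above is reported as "unknown" rather than guessed. It uses uname -r, which prints only the kernel release (the same value that appears as the third field of uname -a).

```shell
# Hypothetical helper: map a kernel release string to the CHAOS version
# listed in this section. Releases not listed here yield "unknown".
chaos_version_for_kernel() {
    case "$1" in
        2.4.21|2.4.21[-.]*) echo "CHAOS 2.0" ;;
        2.6.9|2.6.9[-.]*)   echo "CHAOS 3.0" ;;
        *)                  echo "unknown" ;;
    esac
}

# Report the result for the machine this runs on.
chaos_version_for_kernel "$(uname -r)"
```

Matching on the leading version number, followed by an optional "-" or "." suffix, allows for vendor build tags appended to the release string without confusing, say, 2.6.9 with 2.6.90.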


