
UCRL-WEB-200040

CHAOS: Linux from Livermore


High-Performance Interconnect

LC's local support for a high-performance interconnect (internal network) among CHAOS cluster nodes was developed in several stages.

QUADRICS/ELAN SUPPORT:
First, the Linux Project ported the Quadrics QsNet device drivers and related software from Compaq Alpha chips running Tru64 (Compaq's proprietary version of UNIX) to the same chips running Red Hat Linux. As a result, QsNet under Linux not only executed reliably but also slightly outperformed the original Tru64 version (with a maximum bandwidth of 210 Mbyte/s).

Second, the Linux Project staff shifted focus to Intel chips and locally modified Red Hat Linux for QsNet support. Quadrics, for their part, released most of their software under an open source license and concentrated their business on Linux platforms. Meanwhile, the LC collaborators modified the system kernel used locally (now called CHAOS) to once again support QsNet in three ways:

  • LC added (and improved) device drivers for Quadrics Elan3 and Elan4.
  • LC included the Quadrics software environment to run parallel jobs across a cluster (such as libelan, a low-level library of message-passing functions).
  • LC packaged the Quadrics MPING ping-pong test with CHAOS as part of a basic MPI test suite.
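A ping-pong test such as MPING bounces a message back and forth between two nodes and times the exchange. MPING's own options and output format are not described here, so the sketch below only illustrates the standard arithmetic behind any ping-pong measurement. It assumes you already have the message size, the number of round trips, and the total elapsed time; the function name `pingpong_stats` is hypothetical, and "Mbyte" is taken to mean 10^6 bytes.

```python
from typing import Tuple


def pingpong_stats(message_bytes: int, iterations: int,
                   elapsed_seconds: float) -> Tuple[float, float]:
    """Hypothetical helper: return (one-way latency in microseconds,
    bandwidth in Mbyte/s, with 1 Mbyte = 10**6 bytes).

    Each ping-pong iteration sends the message from node A to node B and
    B sends it straight back, so one iteration covers two one-way trips.
    """
    if message_bytes < 0 or iterations <= 0 or elapsed_seconds <= 0:
        raise ValueError(
            "need message_bytes >= 0, iterations > 0, elapsed_seconds > 0")
    # Time for a single one-way transfer of the message.
    one_way_seconds = elapsed_seconds / (2 * iterations)
    latency_us = one_way_seconds * 1e6
    # Bytes moved in one direction divided by the one-way time.
    bandwidth_mbytes = message_bytes / one_way_seconds / 1e6
    return latency_us, bandwidth_mbytes


if __name__ == "__main__":
    # Illustrative numbers only, not measured QsNet results.
    lat, bw = pingpong_stats(1_000_000, 1000, 10.0)
    print(f"one-way time: {lat:.1f} us, bandwidth: {bw:.1f} Mbyte/s")
```

For example, 1000 round trips of a 1,000,000-byte message taking 10 seconds in total give a one-way time of 5 ms (5000 microseconds) and a bandwidth of about 200 Mbyte/s. Halving the round-trip time is the key step: reporting the full round trip as "latency" would overstate it by a factor of two.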

The QsNet interconnect is now available on some LC CHAOS-based Linux clusters (such as Thunder, Lilac, and ALC). Besides its direct benefits, it enables other, higher-level system features, such as a scalable parallel file system (next section).

The Elan Communication Library (libelan, mentioned above) helps optimize MPI behavior on Linux clusters with the Quadrics switch. Twenty-one environment variables (most begin with the characteristic string LIBELAN_) let you adjust this library's behavior in order to:

  • work around application hangs caused by communication problems,
  • improve code performance under Linux (CHAOS),
  • handle large amounts of message-passing memory, and
  • enable Elan library support for MPI debugging.
For a current list of these environment variables and their specific roles, see this (open-network only) web site:

http://www.llnl.gov/computing/mpi/elan.html
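When a job behaves unexpectedly, it can help to confirm which libelan settings are actually present in its environment. The sketch below simply lists every environment variable whose name begins with LIBELAN_. It does not know the names or meanings of the individual variables (see the page above for those), and because only most of the twenty-one variables use that prefix, it will not find the others. The function name `libelan_settings` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def libelan_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: return the variables whose names start with
    LIBELAN_, sorted by name.

    By default this inspects the current process environment (os.environ);
    pass another mapping to inspect a saved or simulated environment.
    """
    env = os.environ if environ is None else environ
    return {name: value
            for name, value in sorted(env.items())
            if name.startswith("LIBELAN_")}


if __name__ == "__main__":
    settings = libelan_settings()
    if not settings:
        print("No LIBELAN_ variables are set in this environment.")
    for name, value in settings.items():
        print(f"{name}={value}")
```

Running this inside a batch script, just before launching the parallel job, records the settings alongside the job's output so that runs with different tunings can be compared later.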


INFINIBAND (OPENIB) SUPPORT:
In 2003, an ASC PathForward project began to promote commercial support for a much faster interconnect called InfiniBand. The national laboratories worked with industrial partners and open-source software efforts in a collaboration (partly funded by ASC) called OpenIB (see www.openib.org for background). The first high-performance computing (HPC) release of OpenIB became available in 2005. By 2006, LC began installing clusters that featured an InfiniBand internal network. These came from the Peloton procurement and included Atlas and Zeus (OCF) and Rhea and Minos (SCF).

CHAOS evolved to support this new interconnect. By May 2007, CHAOS version 3.2, with OpenIB support included, had been developed specifically for these clusters and was deployed exclusively on them.

