Large Linux data centers require flexible system management. At Livermore Computing (LC), we are committed to supporting our Linux ecosystem at the high end of commodity computing. Administrators of Linux clusters will find an array of robust tools developed at LLNL for platform management, authentication, and I/O analysis. We have also made available an overview of our commodity cluster machine catalog, information for users about our software, and LLNL's software portal. Selected open-source tools are described below.
System hardware health, including temperature, voltage, fans, power supplies, bus errors, and physical security, requires constant monitoring, and platform management also covers recovery capabilities, event logging, and inventory information. Intelligent Platform Management Interface (IPMI) software facilitates this platform management by operating independently of the host system and by allowing remote access. FreeIPMI provides in-band and out-of-band IPMI software based on the IPMI v1.5 and v2.0 specifications, which define a set of interfaces for platform management and are implemented by a number of vendors. As IPMI standards evolve for monitoring autonomous computer platforms, multiple types of IPMI drivers have been developed for Linux systems.
Munge is an authentication service for creating and validating credentials. It is designed to be highly scalable for use in high-performance computing (HPC) cluster environments. It allows a process to authenticate the UID and GID of another local or remote process within a group of hosts that share common users and groups. These hosts form a security realm defined by a shared cryptographic key. Clients within this security realm can create and validate credentials without root privileges, reserved ports, or platform-specific methods.
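The idea of a security realm defined by a shared key can be illustrated with a minimal Python sketch. This is not Munge's credential format, protocol, or API. The function names (encode_credential, decode_credential), the JSON layout, the HMAC-SHA256 construction, and the 300-second lifetime are all assumptions chosen for illustration. The sketch also passes the key and the UID/GID in directly, whereas a real deployment would rely on a trusted service to hold the key and determine the caller's identity.

```python
import base64
import hashlib
import hmac
import json
import time


def encode_credential(key, uid, gid, payload=b"", ttl=300, now=None):
    """Create a signed credential (hypothetical format, not Munge's)."""
    issued = int(time.time() if now is None else now)
    body = json.dumps(
        {
            "uid": uid,
            "gid": gid,
            "issued": issued,
            "ttl": ttl,  # lifetime in seconds; illustrative value
            "payload": base64.b64encode(payload).decode("ascii"),
        },
        sort_keys=True,
    ).encode("utf-8")
    # The tag binds the body to the realm's shared key.
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return ".".join(base64.b64encode(p).decode("ascii") for p in (body, tag))


def decode_credential(key, cred, now=None):
    """Validate a credential and return (uid, gid, payload).

    Raises ValueError if the credential is malformed, was signed with a
    different key, or has outlived its time-to-live.
    """
    current = time.time() if now is None else now
    try:
        body_b64, tag_b64 = cred.split(".")
        body = base64.b64decode(body_b64, validate=True)
        tag = base64.b64decode(tag_b64, validate=True)
    except ValueError:
        raise ValueError("malformed credential") from None
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("credential was not issued within this realm")
    fields = json.loads(body)
    if current > fields["issued"] + fields["ttl"]:
        raise ValueError("credential has expired")
    return fields["uid"], fields["gid"], base64.b64decode(fields["payload"])
```

In this sketch, any host holding the same key can validate a credential created on another host, while a host with a different key rejects it. That mirrors the realm boundary described above, though only in a simplified form.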