High Performance Computing: Open Source and Released Software

The table below briefly describes software that Livermore Computing (LC) sponsors, develops, or collaborates on. The project names are links to additional information and/or software available for download. Additional software applications from the CASC organization are listed on the CASC Software Distribution Web page.

Software Description
Blockbuster Blockbuster is an image/movie viewer/player especially designed for very large images and movies (high resolution and many frames) on cluster displays running the DMX X server.

CHAOS The Clustered High Availability Operating System (CHAOS) is the operating system software environment used on LC Linux clusters. LC clusters, which provide the Livermore Model programming environment to Laboratory scientists, range in size from small, single-rack capacity systems to large capability systems.

Additional software developed for CHAOS includes, for example, tools for cluster configuration management (Genders), power management (PowerMan), console management (ConMan), and credential validation (MUNGE). See the Linux Software Downloads Web page for more information.

Chromium Chromium is a system for interactive rendering on clusters of graphics workstations. It is useful both as a research tool for dynamically changing the rendering of otherwise unmodified (and potentially closed-source) 3D applications and as a production environment for distributing 3D OpenGL rendering among multiple hosts and GPUs.
DMX Distributed Multi-headed X (DMX) provides a distributed X11 server implementation that aggregates a number of X11 servers into a single large X11 server. DMX thus enables users to run unmodified X11 applications on PowerWalls that are driven by Linux clusters.
Dotkit Dotkit is a set of shell scripts, small "package" files (dotkits), and an organizing plan to help you set up, modify, maintain, and understand a working UNIX environment, for one person or for an entire site.
Hopper Hopper is a powerful interactive tool that allows users to graphically move, copy, find, delete, and otherwise operate on files. Users can connect to and manipulate resources using FTP, SSH, SFTP, HTAR, Endeavor (NFT), and other protocols.
IOR The IOR software is used for benchmarking parallel file systems using POSIX, MPI-IO, or HDF5 interfaces. It is available under the GNU General Public License (GPL).
LaunchMON LaunchMON is a software infrastructure that enables HPC run-time tools to co-locate tool daemons with a parallel job. Its API allows a tool to identify all the remote processes of a parallel job and to scalably launch daemons into the relevant nodes.
Lustre Lustre is a novel storage and file system architecture and implementation project. Its goal is to develop a next-generation cluster file system that can serve clusters with tens of thousands of nodes, manage petabytes of storage, and move hundreds of GB/s, with state-of-the-art security and management infrastructure.
mpiP mpiP is a lightweight MPI profiling library that reports the time spent in MPI functions by call site and stack trace. New run-time functionality, available through the srun-mpip and poe-mpip scripts on Linux and AIX systems, can generate mpiP data without relinking the application.
PerfTrack PerfTrack is a data store and interface for managing performance data from large-scale parallel applications. Data collected in different locations and formats can be compared and viewed in a single performance analysis session. PerfTrack includes interfaces to the data store and scripts for automatically collecting data describing each experiment, such as build and platform details.
PyMPI Python, an interpreted language, provides a good basis for building scripts and control frameworks. While Python has a (co-routining) thread model, its basic design is not particularly appropriate for parallel programming. The pyMPI extension set is designed to provide parallel operations for Python on distributed, parallel machines using MPI.
Open|SpeedShop Open|SpeedShop is an open source multiplatform performance tool. It supports the analysis of both single node and large scale parallel applications on a variety of platforms, including Linux clusters, Blue Gene/P installations, and Cray XT/XE systems. Open|SpeedShop's base functionality includes metrics like exclusive and inclusive user time, MPI call tracing, and CPU hardware performance counter experiments.
SLURM SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.
STAT STAT, the Stack Trace Analysis Tool, is a highly scalable, lightweight tool that gathers and merges stack traces from a parallel application. STAT can identify equivalence classes, which are groups of processes exhibiting similar behavior. A representative of each equivalence class can then be fed into a full-featured debugger for root cause analysis at significantly reduced scale.
Telepath Telepath provides visualization scheduling, session, and resource control. It automates the configuration of all of the resources required for showing high-resolution animations on PowerWalls or other large displays.
Tool Gear Tool Gear is a software infrastructure for developing performance analysis and debugging tools for large scale parallel programs. Many types of tools have similar needs for data collection features and user interfaces. For example, many tools display source code and either annotate it with data or allow the user to click on parts of the source display to initiate an action or display more detailed information. Tool Gear handles much of the source code and data display, allowing tool developers to focus on the unique aspects of their codes. Tool Gear can work with the Dynamic Probe Class Library (DPCL) to collect data from instrumentation installed at run time in programs, or it can receive data sent to it from other sources.
Valgrind Valgrind is a GPL-licensed system for debugging and profiling x86-Linux programs. Current LC support provides a script, memcheck_all, for running the Valgrind memory debug tool with a parallel application.
VisIt VisIt is an interactive graphical analysis tool for visualizing and analyzing data on two- and three-dimensional (2D, 3D) meshes. It is a general-purpose tool that handles many different mesh types and provides different ways of viewing data. It is virtually hardware/vendor independent, while still providing graphics at the speed of the native graphics hardware.
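
The STAT entry above describes grouping processes into equivalence classes by stack trace and then debugging one representative per class. The idea can be illustrated with a minimal Python sketch. This is not STAT's implementation or API: it groups only on exactly matching traces, whereas the real tool merges traces across the whole job. The function names and the sample data are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def group_by_stack_trace(traces):
    """Group ranks whose stack traces are identical.

    `traces` maps a process rank (int) to a sequence of frame names,
    outermost frame first. Returns a dict from trace tuple to the sorted
    list of ranks that share it. (Hypothetical helper, not part of STAT.)
    """
    classes = defaultdict(list)
    for rank in sorted(traces):
        classes[tuple(traces[rank])].append(rank)
    return dict(classes)


def pick_representatives(classes):
    """Choose the lowest rank in each class as the process to debug."""
    return sorted(ranks[0] for ranks in classes.values())


# Made-up traces from a six-process job, for illustration only.
sample_traces = {
    0: ("main", "solve", "MPI_Waitall"),
    1: ("main", "solve", "MPI_Waitall"),
    2: ("main", "solve", "compute_flux"),
    3: ("main", "solve", "MPI_Waitall"),
    4: ("main", "io_write", "MPI_Barrier"),
    5: ("main", "solve", "compute_flux"),
}

classes = group_by_stack_trace(sample_traces)
representatives = pick_representatives(classes)

for frames, ranks in classes.items():
    print(" > ".join(frames), "ranks:", ranks)
print("attach debugger to ranks:", representatives)
```

In this sample, six processes collapse into three classes, so a debugger would need to attach to only three processes instead of six. The reduction matters far more at the scale of thousands of processes.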