1X An InfiniBand interface width. 1X defines an interface with two differential pairs, one transmit, one receive. Provides 2.5 Gbit/s full-duplex connections.
4X An InfiniBand interface width. 4X defines an interface with eight differential pairs (four per direction), four transmit, four receive. Provides 10 Gbit/s full-duplex connections.
4X DDR Double data rate InfiniBand interface. 4X defines an interface with eight differential pairs (four per direction). Provides 5.0 Gbit/s per differential pair, for 20 Gbit/s full-duplex connections.
12X An InfiniBand interface width. 12X defines an interface with 24 differential pairs (12 per direction), 12 transmit, 12 receive. Provides 30 Gbit/s full-duplex connections.
12X DDR Double data rate InfiniBand interface. 12X defines an interface with 24 differential pairs (12 per direction). Provides 5.0 Gbit/s per differential pair, for 60 Gbit/s full-duplex connections.
AMPI Adaptive MPI. An MPI implementation developed at the University of Illinois that can exploit virtual processor techniques.
API Application programmer's interface. Syntax and semantics for invoking services from within an executing application. All APIs shall be available to both Fortran and C programs, although implementation issues (such as whether the Fortran routines are simply wrappers for calling C routines) are up to the supplier.
ARMCI Aggregate remote memory copy interface. A one-sided communication library that provides an extensive set of RMA operations. See
BLAB Aggregate bidirectional link bandwidth. BLAB is defined as the minimum of the aggregate memory bandwidth, aggregate bus bandwidth, or the sum of bidirectional link peak user payload data bandwidth.
blocking operation An operation that does not complete until the operation either succeeds or fails. For example, a blocking receive will not return until a message is received or until the channel is closed and no further messages can be received.
BlueGene/L A jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the U.S. Department of Energy ASC Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new architecture that exploits system-on-a-chip technology to deliver target peak processing power of 360 teraFLOPS (trillion floating-point operations per second). The machine is scheduled to be operational in the 2004-2005 time frame at price/performance and power consumption/performance targets unobtainable with conventional architectures. See
broadcast operation A communication operation in which one processor sends (or broadcasts) a message to all other processors.
buffer A portion of storage used to hold input or output data temporarily.
Charm++ A parallel C++ library developed at the University of Illinois.
Clos A network topology named after its inventor Charles Clos.
cluster A set of SMPs connected via a scalable network technology.  The network shall support high bandwidth, low latency message passing.  It shall also support remote memory referencing.
collective communication A communication operation that involves more than two processes or tasks. Broadcasts, reductions, and the MPI_Allreduce subroutine are all examples of collective communication operations. All tasks in a communicator must participate.
communicator An MPI object that describes the communication context and an associated group of processes.
critical path The serial chain of dependencies that most limits forward progress.
DDR Double data rate
device driver Software that operates the host channel adapter devices on a node.
DHCP Dynamic host configuration protocol. DHCP enables individual computers on an IP network to extract their configurations from a server (the 'DHCP server') or servers, in particular, servers that have no exact information about the individual computers until they request the information. DHCP is often used to reduce the work necessary to administer a large network (e.g., managing IP addresses).
DPCL Dynamic Probe Class Library
fairness A policy in which tasks, threads, or processes must be allowed eventual access to a resource for which they are competing. For example, if multiple threads are simultaneously seeking a lock, no set of circumstances can cause any thread to wait indefinitely for access to the lock.
FT Fault tolerance or fault tolerant
GiB Gibibyte. A gibibyte is a billion bytes in base 2. This is typically used in terms of random access memory and is 2^30 (or 1,073,741,824) bytes.
GB Gigabyte. A gigabyte is a billion bytes in base 10. This is typically used in every context except for random access memory size and is 10^9 (or 1,000,000,000) bytes.
GPL General Public License. A legal software license arrangement developed by GNU to promote open software. The licenses for most software are designed to prevent  users from sharing or changing it. By contrast, the GNU General Public License is intended to guarantee the freedom to share and change free software to ensure the software is free for all its users. The GPL is designed to make sure that anyone can distribute copies of free software (and charge for this service if they wish); that they receive source code or can get it if they want; that they can change the software or use pieces of it in new free programs; and that they know they can do these things. The GPL forbids anyone to deny others these rights or to ask them to surrender the rights. These restrictions translate to certain responsibilities for those who distribute copies of the software or modify it.
GUI Graphical user interface. A type of computer interface consisting of a visual metaphor of a real-world scene, often of a desktop. Within that scene are icons, which represent actual objects, that the user can access and manipulate with a pointing device.
HCA Host channel adapter.  IBA expansion card that interfaces the IBA interconnect to the cluster node I/O subsystem.
HEC High-end computing
hot spot A memory location or synchronization resource for which multiple processors compete excessively. This competition can cause a disproportionately large performance degradation when one processor that seeks the resource blocks, preventing many other processors from having it, thereby forcing them to become idle.
HPC ULPs High performance computing upper layer protocols. HPC ULPs include MPI, IPoIB, SDP, and Sandia Portals.
HT HyperTransport is an I/O link. With clock speeds of up to 1.4 GHz and DDR signaling, HyperTransport technology provides an effective throughput of 2.8 gigatransfers per second per pin-pair on a 32-bit link. This results in a maximum aggregate throughput of 22.4 gigabytes per second, per link. (See
IBA InfiniBand architecture
IBTA InfiniBand Trade Association (See
InfiniBand access layer Includes the user-mode components for management services, SM query, connection management, and work request processing, and the kernel mode components for InfiniBand PnP, management services, resource management, connection  management, work request processing, and user-level proxy agent.
IPoIB Internet protocol over InfiniBand. IP specifies the format of packets (also called datagrams) and the addressing scheme.
iSCSI Internet SCSI (Small Computer System Interface). An IP-based storage networking standard for linking data storage facilities, developed by the Internet Engineering Task Force (IETF). By carrying SCSI commands over IP networks, iSCSI is used to facilitate data transfers over intranets and to manage storage over long distances.
iSER iSCSI extensions for RDMA
kDAPL Kernel Direct Access Programming Library defines a single set of kernel-level APIs for all RDMA-capable transports. The kDAPL mission is to define a transport-independent and platform standard set of APIs that exploits RDMA capabilities, such as those present in IB, VI, and iWARP.
kernel modules Changes to the Linux kernel needed to support the rest of the OpenIB software stack.
LAB Aggregate link bandwidth. LAB is defined as the minimum of the aggregate memory bandwidth, aggregate bus bandwidth, or the sum of unidirectional link peak user payload data bandwidth.
LAPI Low-level application programming interface. An active-message-type API for optimal communication through the IBM SP switch. Provides reliable, unordered communication between all processes in the MPI world.
latency The time interval between the instant at which an instruction control unit initiates a call for data transmission, and the instant at which the actual transfer of data (or receipt of data at the remote end) begins. Latency is related to the hardware characteristics of the system and to the different layers of software that are involved in initiating the task of packing and transmitting the data.
LVDS Low voltage differential signaling. An electrical spec (EIA-644) used by InfiniBand. LVDS is designed with an output voltage swing of 350 mV at better than 400 Mbps into a 100 ohm load, across a distance of about 10 meters.
MB Megabyte. A megabyte is a million bytes in base 10. This is typically used in every context except for random access memory size and is 10^6 (or 1,000,000) bytes.
MiB Mebibyte. A mebibyte is a million bytes in base 2. This is typically used in terms of random access memory and is 2^20 (or 1,048,576) bytes.
MPI Message passing interface. An industry standard, message-passing protocol that typically uses a two-sided send-receive model to transfer messages between processes.
MPI-2 Extensions to the MPI standard.
MPI I/O An MPI extension allowing for the manipulation of files on different file systems.
MR Mandatory requirement. Mandatory requirements are items that are essential to the University and reflect the minimum qualifications an offeror must meet in order to have their proposal evaluated further for selection.
MTBF Mean time between failures. A measurement of the expected reliability of the system or component. The MTBF figure can be developed as the result of intensive testing, based on actual product experience, or predicted by analyzing known factors.
NIC Network interface card. An expansion board you insert into a computer so the computer can be connected to a network. Most NICs are designed for a particular type of network, protocol, and media, although some can serve multiple networks.
nonblocking operation An operation, such as sending or receiving a message, that returns immediately whether or not the operation was completed. For example, a nonblocking receive will not wait until a message is sent, but a blocking receive will wait. A nonblocking receive will return a status value that indicates whether or not a message was received.
open source Software products provided under an open source license(s) found at
parallelism The degree to which parts of a program may be concurrently executed.
PCI Express A dual-simplex, point-to-point serial differential low-voltage peripheral component interconnect. Previously known as 3GIO and Arapahoe. PCI Express allows a bandwidth up to 500 MB/s duplex for each lane, 8 GB/s for sixteen lanes (x16).
PCI-X A follow-on initiative to PCI (peripheral component interconnect). PCI-X allows a bandwidth up to 1 GB/s for a 64-bit bus running at 133 MHz. [Note that we distinguish PCI-X and PCI Express.]
PERUSE MPI performance examination and revealing unexposed state extension specification; the specified API.
PMPI Profiling interface for MPI specified by the MPI standard.
Portals (Sandia Portals) Low-level API providing reliable and ordered communication for Lustre. (See
POSIX Portable Operating System Interface. A set of IEEE standards designed to provide application portability between UNIX variants. IEEE 1003.1 defines a UNIX-like operating system interface, IEEE 1003.2 defines the shell and utilities and IEEE 1003.4 defines real-time extensions.
QDR Quad data rate
RC Reliable connection
RDMA Remote direct memory access. RDMA capability allows processes executing on one node of a cluster to be able to "directly" access (execute reads or writes against) the memory of processes within the same user job executing on a different node of the cluster.
reduction operation An operation, usually mathematical, that reduces a collection of data by one or more dimensions. For example, the arithmetic SUM operation is a reduction operation which reduces an array to a scalar value. Other reduction operations include MAXVAL and MINVAL.
RHEL Red Hat Enterprise Linux
RMA Remote memory access. A user-level communication protocol that provides ability for a task to access memory of another task by the use of put/get operations.
RTS Run-time system
Scalability Tested on 4,096 node physical fabrics and scaling properties simulated up to 16,384 nodes.
SDP Sockets direct protocol. SDP is an IBA-specific protocol defined by the Software Working Group (SWG) of the IBA. The SDP specification maintains traditional sockets SOCK_STREAM semantics as commonly implemented over TCP/IP, as well as support for byte-streaming over a message passing protocol, including kernel bypass data transfers and zero-copy data transfers.
SDSM Software-based distributed shared memory
SMP Shared memory multiprocessor. A set of CPUs sharing random access memory within the same memory address space.  The CPUs are connected via a high speed, low latency mechanism to the set of hierarchical memory components.  The memory hierarchy consists of at least processor registers, cache and memory.  The cache shall also be hierarchical. If there are multiple caches, they shall be kept coherent automatically by the hardware. The main memory may be a non-uniform memory access (NUMA) architecture.  The access mechanism to every memory element shall be the same from every processor.  More specifically, all memory operations are done with load/store instructions issued by the CPU to move data to/from registers from/to the memory.  A single SMP may be partitioned into one or more nodes.
SOW Statement of work
SPMD Single program multiple data
synchronization The action of forcing certain points in the execution sequences of two or more asynchronous procedures to coincide in time.
Test harness and modules Software to automatically test the functionality, performance, reliability, and robustness of the components of the OpenIB software stack.
TF Teraflop. A measure of the peak computing power of a machine in 10^12 floating point operations per second.
thread A single, separately dispatchable, unit of execution. There may be one or more threads in a process, and each thread is executed by the operating system concurrently.
TLP Thread level parallelism
UD Unreliable datagram
uDAPL User Direct Access Programming Library defines a single set of user-level APIs for all RDMA-capable transports. The uDAPL mission is to define a transport-independent and platform standard set of APIs that exploits RDMA capabilities, such as those present in IB, VI, and RDDP WG of IETF.
ULP Upper layer protocols. APIs for applications to perform IB communications operations.  For instance, MPI-2, IPoIB, SDP, and Sandia Portals.
UPC Unified Parallel C. An extension of the C programming language designed for high-performance computing on large-scale parallel machines. The language provides a uniform programming model for both shared and distributed memory hardware. The programmer is presented with a single shared, partitioned address space, where variables may be directly read and written by any processor, but each variable is physically associated with a single processor. UPC uses a SPMD model of computation in which the amount of parallelism is fixed at program startup time, typically with a single thread of execution per processor.
URDMA Unacknowledged, unreliable RDMA capability.
UTR University technical representative
VAPI InfiniBand verbs applications programming interface
VP Virtual processor. Used in the context of assigning multiple "virtual" processors to each physical processor.
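
The InfiniBand width entries (1X, 4X, 12X, and their DDR variants) all follow the same arithmetic: lanes per direction multiplied by the per-lane signaling rate. The sketch below works through that calculation. The function and constant names are hypothetical, chosen only for illustration, and the figures are the signaling rates quoted in the entries, not measured payload bandwidth.

```python
# Per-lane signaling rates in Gbit/s, as quoted in the glossary entries.
SDR_GBITS_PER_LANE = 2.5   # single data rate (1X, 4X, 12X)
DDR_GBITS_PER_LANE = 5.0   # double data rate (4X DDR, 12X DDR)


def link_rate_gbits(lanes_per_direction, gbits_per_lane):
    """Return the signaling rate of one direction of a link in Gbit/s.

    Hypothetical helper: each lane is one differential pair per direction,
    so a 4X link has four transmit pairs and four receive pairs.
    """
    return lanes_per_direction * gbits_per_lane


if __name__ == "__main__":
    widths = [
        ("1X", 1, SDR_GBITS_PER_LANE),
        ("4X", 4, SDR_GBITS_PER_LANE),
        ("4X DDR", 4, DDR_GBITS_PER_LANE),
        ("12X", 12, SDR_GBITS_PER_LANE),
        ("12X DDR", 12, DDR_GBITS_PER_LANE),
    ]
    for name, lanes, rate in widths:
        pairs = 2 * lanes  # transmit pairs plus receive pairs
        print(f"{name}: {pairs} differential pairs, "
              f"{link_rate_gbits(lanes, rate)} Gbit/s full-duplex")
```

Running the script reproduces the pair counts and link rates given in the entries, which makes it a quick consistency check when a new width or data rate is added.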

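The binary and decimal unit entries (GiB and GB, MiB and MB) differ by a few percent, which matters when comparing memory sizes with disk or network figures. The following sketch shows the difference. The constant and function names are hypothetical and serve only to illustrate the definitions in the entries.

```python
# Unit sizes in bytes, matching the glossary definitions.
MIB = 2**20   # mebibyte
MB = 10**6    # megabyte
GIB = 2**30   # gibibyte
GB = 10**9    # gigabyte


def bytes_to_gib(n_bytes):
    """Convert a byte count to gibibytes (hypothetical helper)."""
    return n_bytes / GIB


def bytes_to_gb(n_bytes):
    """Convert a byte count to gigabytes (hypothetical helper)."""
    return n_bytes / GB


if __name__ == "__main__":
    memory_bytes = 4 * GIB  # a node with 4 GiB of memory
    print(f"{memory_bytes} bytes = {bytes_to_gib(memory_bytes)} GiB")
    print(f"{memory_bytes} bytes = {bytes_to_gb(memory_bytes)} GB")
    print(f"One GiB exceeds one GB by {GIB - GB} bytes")
```
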
High Performance Computing at LLNL    Lawrence Livermore National Laboratory

Last modified September 7, 2006