MPI Performance Topics

NOTE: This information pertains to retired LC systems and is being kept for archival purposes only.

Table of Contents

  1. Review of MPI Message Passing
    1. Terminology
    2. MPI Communication Routines
  2. Factors Affecting MPI Performance
  3. Message Buffering
  4. MPI Message Passing Protocols
    1. Eager Protocol
    2. Rendezvous Protocol
    3. Eager Protocol vs. Rendezvous Protocol
  5. Sender-Receiver Synchronization: Polling vs. Interrupt
  6. Message Size
  7. Point-to-Point Communications
  8. Persistent Communications
  9. Collective Communications
  10. Derived Datatypes
  11. Network Contention
  12. IBM SP Specific Factors
  13. References and More Information


Review of MPI Message Passing

Terminology

Latency

The overhead associated with sending a zero-byte message between two MPI tasks. Total latency is a combination of both hardware and software factors, with the software contribution generally being much greater than that of the hardware. It is usually measured in microseconds.

Bandwidth

The rate at which data can be transmitted between two MPI tasks. Like latency, bandwidth is a combination of both hardware and software factors. It is usually measured in megabytes per second.

Application Buffer

The user program address space which holds the data that is to be sent or received. For example, your program uses a variable called, "inmsg". This variable is clearly visible in the program text and able to be managed by the programmer. The application buffer for inmsg is the program memory location where the value of inmsg resides. It is within user address and can be "debugged".

System Buffer

System address space for storing messages, which is not visible to the programmer. Depending upon the type of communication operation, data in the application buffer may be required to be copied to/from system buffer space. The primary purpose of system buffer space is to enable asynchronous communications.

Blocking Communication

A communication routine is blocking if the completion of the call is dependent on certain "events". For sends, the data must be successfully sent or safely copied to system buffer space so that the application buffer that contained the data is available for reuse. For receives, the data must be safely stored in the receive buffer so that it is ready for use.

Non-blocking Communication

A communication routine is non-blocking if the call returns without waiting for any communications events to complete (such as copying of message from user memory to system memory or arrival of message).

It is not safe to modify or use the application buffer after a non-blocking send returns, until the communication has been completed (for example, by MPI_Wait or MPI_Test). It is the programmer's responsibility to ensure that the application buffer is free for reuse.

Non-blocking communications are primarily used to overlap computation with communication to effect performance gains.

Synchronous / Asynchronous

A synchronous send operation will complete only after acknowledgement that the message was safely received by the receiving process. Asynchronous send operations may "complete" even though the receiving process has not actually received the message.

Ready Communication

Refers to a send operation in which the programmer has guaranteed that a waiting receive has already been posted. It is the programmer's responsibility to ensure correctness.

Message Envelope

MPI messages consist of a "data" portion and an "envelope" portion. The encoding of the envelope is implementation dependent, but it typically includes the message tag, communicator, source, destination, and possibly the message length and other implementation-specific information.



MPI Communication Routines



Factors Affecting MPI Performance

Platform / Architecture Related:

Network Related:

Application Related:

MPI Implementation Related:



Message Buffering

Implementations Differ:

Advantages:

Disadvantages:

Correctness:

Implementation Notes:



MPI Message Passing Protocols

Two Common Message Passing Protocols:

Implementations Differ:




Eager Protocol

Potential Advantages:

Potential Disadvantages:

Implementation Notes:




Rendezvous Protocol

Potential Advantages:

Potential Disadvantages:




Eager Protocol vs. Rendezvous Protocol



Sender-Receiver Synchronization: Polling vs. Interrupt

Implementation Dependent:

Polling Mode:

Interrupt Mode:

Sometimes Polling is Better:

Sometimes Interrupt is Better:

Implementation Notes:



Message Size



Point-to-Point Communications



Persistent Communications



Collective Communications

Implementation Notes:



Derived Datatypes



Network Contention



IBM SP Specific Factors


Type of Switch and Switch Adapter:

User Space vs. IP Communications:

Communications Network Used:

Number of MPI Tasks on an SMP Node:

Use of MP_SHARED_MEMORY Environment Variable:

Miscellaneous Environment Variables:


This completes the tutorial.




References and More Information

Notes

1 Timing results were obtained on LLNL's ASCI White system using two IBM SP nodes. Each node was a 16-way SMP, 375 MHz, Nighthawk II with 16 GB of memory, running under AIX 4.3 and PSSP 3.3 software. All communications were conducted over the SP Switch with User Space protocol. Executions were performed in a production batch system with one MPI task per node.

2 Timing results were obtained on LLNL's ASCI White system using two IBM SP nodes. Each node was a 16-way SMP, 375 MHz, Nighthawk II with 16 GB of memory, running under AIX 4.3 and PSSP 3.3 software. All communications were conducted over the SP Switch with User Space protocol. Executions were performed in a production batch system with 1-16 MPI tasks per node.

3 Timing results were obtained on LLNL's ASCI Blue system using two IBM SP nodes. Each node was a 4-way SMP, 332 MHz, 604e with 1.5 GB of memory, running under AIX 4.3 and PSSP 3.2 software. All communications were conducted over the SP Switch with User Space protocol. Executions were performed in a production batch system.