As processor cores on modern supercomputers have become faster, floating-point computation has sped up accordingly. Network latency and bandwidth, however, have not improved proportionally. Beyond faster cores, the growing number of cores per node puts additional stress on the network. As a result, the cost of communicating data, both on-node and off-node, has become a critical factor in the overall performance of a parallel application.
The goal of this project is to improve the communication performance, and with it the overall performance, of parallel applications through interconnect topology-aware task mapping.
This work has several components:
- Study of inter-job interference
- Characterization of parallel applications
- Tools to profile the communication in parallel applications
- Simulation and modeling of network contention
- Algorithms for topology-aware task mapping
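To make the idea of topology-aware task mapping concrete, here is a minimal sketch that scores two placements of the same tasks with the hop-bytes metric (bytes exchanged multiplied by the network hops between the communicating nodes). It is an illustration only, not this project's algorithm: the 2D torus shape, the communication volumes, and the names `torus_hops`, `hop_bytes`, `naive`, and `aware` are all assumptions made for the example.

```python
def torus_hops(a, b, dims):
    """Shortest-path hop count between coordinates a and b on a torus.

    Each dimension wraps around, so the distance along a dimension of
    size d is the smaller of the direct and the wraparound path.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(comm, mapping, dims):
    """Sum of (bytes exchanged * hops travelled) over all communicating pairs."""
    return sum(
        nbytes * torus_hops(mapping[src], mapping[dst], dims)
        for (src, dst), nbytes in comm.items()
    )


# Hypothetical 4x4 torus and communication pattern (bytes per task pair).
dims = (4, 4)
comm = {(0, 1): 100, (2, 3): 100, (0, 2): 1}

# Placement that ignores the topology: heavy communicators end up far apart.
naive = {0: (0, 0), 1: (2, 2), 2: (0, 1), 3: (2, 3)}

# Placement that puts heavy communicators on neighbouring nodes.
aware = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

naive_cost = hop_bytes(comm, naive, dims)  # 801
aware_cost = hop_bytes(comm, aware, dims)  # 201
```

In this toy case the second placement cuts hop-bytes roughly fourfold without changing the application's communication pattern. Hop-bytes is only a proxy: it does not capture contention on individual links.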
This project was funded by the Laboratory Directed Research and Development (LDRD) Program at LLNL under project tracking code 13-ERD-055 (STATE: Scalable Topology Aware Task Embedding). Collaborators include the University of Illinois at Urbana-Champaign and ETH Zurich, Switzerland.