Cram: Running Millions of Concurrent MPI Jobs
Cram lets you run many small MPI jobs within a single, large MPI job by splitting MPI_COMM_WORLD up into many small communicators. Other tools have done this for a while, but Cram is the first tool to make it easy.
Cram has command-line and Python scripting interfaces that allow you to create “cram files.” Each cram file is packed with an ensemble of jobs, where a job comprises the pieces needed to run a parallel MPI program:
- process count
- working directory
- command line arguments
- environment variables
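As an illustration of the four pieces listed above, here is a minimal Python sketch of a job record. The names `JobSpec`, `total_procs`, and the example paths and arguments are hypothetical and chosen for this example only; this is not Cram's own Python interface, for which the project documentation is the reference.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical record for one job in an ensemble (not Cram's own type)."""
    num_procs: int                                      # process count
    working_dir: str                                    # working directory
    args: List[str] = field(default_factory=list)       # command line arguments
    env: Dict[str, str] = field(default_factory=dict)   # environment variables


def total_procs(jobs: List[JobSpec]) -> int:
    """Number of MPI processes the enclosing large job needs to hold all jobs."""
    return sum(job.num_procs for job in jobs)


# A two-job ensemble with made-up paths and arguments.
ensemble = [
    JobSpec(4, "/tmp/run-0", ["app", "input.0"], {"OMP_NUM_THREADS": "1"}),
    JobSpec(2, "/tmp/run-1", ["app", "input.1"]),
]
```

Summing the process counts, as `total_procs` does, gives the smallest size the single large MPI job could have if every packed job is to run at once.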
When you link against libcram (or libfcram for Fortran) and launch your job with a cram file, Cram splits COMM_WORLD and runs each job in the cram file independently.
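The splitting step can be pictured as assigning every rank of the large job to one small job. The sketch below is plain Python with no MPI dependency and is not Cram's implementation: `assign_ranks` is a hypothetical helper, and the contiguous, in-order assignment and the handling of leftover ranks are assumptions made for the example. The job index it computes plays the role of the `color` argument to `MPI_Comm_split`, and the local rank plays the role of `key`.

```python
from typing import List, Tuple

IDLE = (-1, -1)  # assumed marker for ranks that belong to no job


def assign_ranks(proc_counts: List[int], world_size: int) -> List[Tuple[int, int]]:
    """Map each world rank to (job index, rank within that job).

    Jobs are laid out contiguously in the order given. This ordering is an
    assumption of the sketch, not a documented property of Cram.
    """
    if any(n <= 0 for n in proc_counts):
        raise ValueError("every job needs at least one process")
    if sum(proc_counts) > world_size:
        raise ValueError("jobs need more processes than the world provides")

    mapping: List[Tuple[int, int]] = []
    for job_index, n in enumerate(proc_counts):
        for local_rank in range(n):
            mapping.append((job_index, local_rank))

    # Ranks left over after all jobs are placed are marked idle here. With
    # MPI_Comm_split, such ranks could pass MPI_UNDEFINED as the color.
    mapping.extend([IDLE] * (world_size - len(mapping)))
    return mapping
```

For example, `assign_ranks([2, 1], 4)` places world ranks 0 and 1 in job 0, rank 2 in job 1, and leaves rank 3 idle.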
Cram was created to allow automated test suites to pack more jobs into a BG/Q partition, and to run large ensembles on systems where the scheduler will not scale. On BG/Q, for example, SLURM can run around 20,000 simultaneous jobs. With Cram we've been able to run 1.5 million MPI jobs at once, i.e., one job per core on all of Sequoia. This has been useful so far for running large ensembles of uncertainty quantification jobs. For full documentation, see the GitHub page.
Cram is NOT a job scheduler; it is a simple, lightweight layer between jobs and the MPI runtime. There are no plans to support features like job queues or emulating a resource manager inside an MPI job.
Currently, Cram can virtualize command line arguments for Fortran only on BG/Q. For C and C++, it does so on all platforms. If you are interested in argument support for Fortran on other platforms, contact Todd Gamblin.