This document describes SLURM MPI selection plugins and the API that defines them. It is intended as a resource to programmers wishing to write their own SLURM MPI selection plugins. This is version 0 of the API.
SLURM MPI selection plugins are SLURM plugins that implement the API described herein; they determine which version of MPI is used during execution of a new SLURM job. They are intended to provide a mechanism for both selecting MPI versions for pending jobs and performing any MPI-specific tasks for job launch or termination. The plugins must conform to the SLURM Plugin API with the following specifications:
const char plugin_type[]
The major type must be "mpi." The minor type can be any recognizable abbreviation for the type of MPI. We recommend, for example:
- lam: For use with LAM MPI and Open MPI.
- mpich-gm: For use with Myrinet.
- mvapich: For use with InfiniBand.
- none: For use with most other versions of MPI.
The plugin_name and plugin_version symbols required by the SLURM Plugin API require no specialization for MPI support. Note carefully, however, the versioning discussion below.
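As a sketch, the identification symbols of a plugin with minor type "none" might be declared as follows. It assumes, as in the general SLURM Plugin API, that plugin_type joins the major and minor types with a slash. The plugin_name text, the uint32_t type and value of plugin_version, and the helper plugin_type_is_mpi() are illustrative assumptions, not taken from a real plugin.

```c
#include <stdint.h>
#include <string.h>

/* Identification symbols for a hypothetical "none" MPI plugin.
 * Assumption: plugin_type is written as "major/minor".
 * The name text and the version value are placeholders. */
const char     plugin_name[]  = "example MPI plugin (none)";
const char     plugin_type[]  = "mpi/none";
const uint32_t plugin_version = 0;

/* Hypothetical helper: returns 1 if plugin_type carries the
 * required "mpi" major type, 0 otherwise. */
int plugin_type_is_mpi(void)
{
    return strncmp(plugin_type, "mpi/", 4) == 0;
}
```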
A simplified flow of logic follows:
srun is able to specify the correct MPI to use with --mpi=MPITYPE,
which will set up the correct environment for the specified MPI.
slurmd daemon runs
mpi_p_init((slurmd_job_t *)job, (int)rank);
which will configure slurmd to use the correct MPI as well as to interact with srun.
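The srun side of this flow amounts to mapping the --mpi=MPITYPE option to the plugin that should be loaded. The following minimal sketch assumes plugin types are named "mpi/MPITYPE", following the major/minor convention of the SLURM Plugin API; mpi_plugin_type_from_arg() is a hypothetical helper, not srun's actual option parser.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn an srun-style "--mpi=MPITYPE" argument into
 * the plugin_type string "mpi/MPITYPE" of the plugin to load.
 * Returns 0 on success, -1 if the argument is malformed or the output
 * buffer is too small. */
int mpi_plugin_type_from_arg(const char *arg, char *out, size_t outlen)
{
    const char *prefix = "--mpi=";
    size_t plen = strlen(prefix);

    /* Reject anything that is not "--mpi=" followed by a non-empty type. */
    if (strncmp(arg, prefix, plen) != 0 || arg[plen] == '\0')
        return -1;

    int n = snprintf(out, outlen, "mpi/%s", arg + plen);
    if (n < 0 || (size_t)n >= outlen)
        return -1;   /* encoding error or truncated result */
    return 0;
}
```

For example, "--mpi=lam" yields "mpi/lam", while "--mpi=" with no type is rejected.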
These functions are expected to read and/or modify data structures directly in the memory of the slurmd daemon and srun. The slurmd daemon is a multi-threaded program with independent read and write locks on each data structure type. Therefore, the type of operations permitted on various data structures is identified for each function.
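The locking rule can be illustrated with a POSIX read/write lock guarding a single data structure: readers may share the lock, while a writer holds it exclusively. This is a generic pthreads sketch; struct shared_job_info and its accessors are hypothetical and do not reflect slurmd's actual lock implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical shared data structure guarded by its own read/write lock. */
struct shared_job_info {
    pthread_rwlock_t lock;
    int ntasks;
};

/* Read access: any number of readers may hold the lock concurrently. */
int job_info_get_ntasks(struct shared_job_info *info)
{
    pthread_rwlock_rdlock(&info->lock);
    int n = info->ntasks;
    pthread_rwlock_unlock(&info->lock);
    return n;
}

/* Write access: the lock is exclusive, blocking readers and other writers. */
void job_info_set_ntasks(struct shared_job_info *info, int n)
{
    pthread_rwlock_wrlock(&info->lock);
    info->ntasks = n;
    pthread_rwlock_unlock(&info->lock);
}
```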
The following functions must appear. Functions which are not implemented should be stubbed.
int mpi_p_init (slurmd_job_t *job, int rank);
Description: Used by slurmd to configure its environment for the selected MPI.
Arguments: job (input) Pointer to the slurmd_job that is running. Cannot be NULL.
rank (input) Used primarily by MVAPICH to pass the rank of the mpirun job. This can be 0 if the MPI type needs no rank information.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR.
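A minimal sketch of mpi_p_init() follows. Because the real slurmd_job_t comes from SLURM's internal headers, the block defines a hypothetical stand-in with a single nprocs field, and it assumes SLURM_SUCCESS is 0 and SLURM_ERROR is -1. The environment variable HYPOTHETICAL_MPI_RANK is invented for illustration; an actual plugin would export whatever its MPI implementation expects.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for definitions normally supplied by SLURM headers
 * (assumed values; the struct is hypothetical). */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

typedef struct {
    int nprocs;   /* hypothetical field: number of tasks in the job */
} slurmd_job_t;

/* Sketch of mpi_p_init(): validate the arguments, then export the rank
 * into the environment under an invented variable name. */
int mpi_p_init(slurmd_job_t *job, int rank)
{
    char buf[16];   /* large enough for any 32-bit int */

    if (job == NULL || rank < 0 || rank >= job->nprocs)
        return SLURM_ERROR;

    snprintf(buf, sizeof buf, "%d", rank);
    if (setenv("HYPOTHETICAL_MPI_RANK", buf, 1) != 0)
        return SLURM_ERROR;
    return SLURM_SUCCESS;
}
```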
int mpi_p_thr_create (srun_job_t *job);
Description: Used by srun to spawn the thread for the MPI processes. Most of the real processing happens here.
Arguments: job (input) Pointer to the srun_job that is running. Cannot be NULL.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR (-1).
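The following sketch shows one way mpi_p_thr_create() could spawn its helper thread with pthread_create(). The srun_job_t shown here is a hypothetical stand-in, SLURM_SUCCESS and SLURM_ERROR are assumed to be 0 and -1, and the thread body is a placeholder for the MPI-specific startup work a real plugin performs.

```c
#include <pthread.h>

/* Assumed values of SLURM's return codes. */
#define SLURM_SUCCESS 0
#define SLURM_ERROR  -1

/* Hypothetical stand-in for srun's job structure. */
typedef struct {
    int ntasks;            /* number of tasks launched by srun */
    pthread_t mpi_thread;  /* handle of the helper thread, for joining later */
    int tasks_seen;        /* written by the helper thread */
} srun_job_t;

/* Placeholder thread body: a real plugin would handle its MPI's
 * startup protocol here; this sketch only records the task count. */
static void *mpi_thread_main(void *arg)
{
    srun_job_t *job = arg;
    job->tasks_seen = job->ntasks;
    return NULL;
}

/* Spawn the helper thread; report failure with SLURM_ERROR. */
int mpi_p_thr_create(srun_job_t *job)
{
    if (job == NULL)
        return SLURM_ERROR;
    if (pthread_create(&job->mpi_thread, NULL, mpi_thread_main, job) != 0)
        return SLURM_ERROR;
    return SLURM_SUCCESS;
}
```

Storing the thread handle in the job structure lets the caller join the thread when the job finishes; a plugin could instead detach it if nothing needs to wait for it.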
int mpi_p_single_task (void);
Description: Tells the system whether or not multiple tasks can run at the same time.
Returns: false if multiple tasks can run and true if only a single task can run at one time.
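For a plugin whose MPI imposes no restriction on concurrent tasks, mpi_p_single_task() can be a one-line stub, which also shows the general shape of a stubbed function. This sketch assumes that returning false (0) is the appropriate answer for such a plugin.

```c
#include <stdbool.h>

/* Sketch: this hypothetical plugin allows multiple tasks to run at the
 * same time, so it returns false (0) as the API's int result. */
int mpi_p_single_task(void)
{
    return false;
}
```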
Description: Cleans up anything that needs cleaning up after execution.
Returns: SLURM_SUCCESS if successful. On failure, the plugin should return SLURM_ERROR, causing slurmctld to exit.
This document describes version 0 of the SLURM MPI selection API. Future releases of SLURM may revise this API. An MPI selection plugin conveys its ability to implement a particular API version using the mechanism outlined for SLURM plugins. In addition, the credential is transmitted along with the version number of the plugin that transmitted it. It is at the discretion of the plugin author whether to maintain data format compatibility across different versions of the plugin.
Last modified 11 April 2006