Spindle: Scalable Shared Library Loading

Software

Download

Spindle 0.9 Tarball

Spindle 0.8 Tarball

Spindle on GitHub 

Change Log

Spindle 0.9

  • Support for OpenMPI’s ORTE-based launching
  • Support for the flux resource manager
  • Improve support for co-existing with debuggers
  • Reorganize source tree to separate FE/BE/CLIENT builds
  • Support for building different spindle components with different compilers
  • Add security models for authenticating TCP/IP connections
  • New hostbin-based startup for when LaunchMON doesn’t work
  • New spindle man page
  • Fix a crash when dealing with size 0 libraries
  • Fix a race-condition hang on startup

Spindle 0.8

  • New Spindle API allows for manually requesting Spindle semantics of open and stat calls
  • Support for running serial processes under Spindle with the --no-mpi option
  • Support for Spindle interception of stat and lstat calls
  • Improved Python support through the --python-prefix option
  • Bug fixes, many focused around following processes through fork/exec

Building and Installing

Spindle depends on the LaunchMON tool, which must be installed on a system before Spindle can be built. 

Spindle is built with autoconf/automake, and accepts many of the traditional options used by automake. See the INSTALL file in the Spindle source distribution for more details.  Here’s an example of the build options used for Spindle at LLNL (lines beginning with a % prompt are user input):

~/spindle% wget https://github.com/hpc/Spindle/archive/v0.8.tar.gz
--2013-05-29 15:03:10-- https://github.com/hpc/Spindle/archive/v0.8.tar.gz
Resolving github.com... 204.232.175.90
Connecting to github.com|204.232.175.90|:443... connected.
...
~/spindle% tar -xzf v0.8.tar.gz
~/spindle% mkdir build
~/spindle% mkdir install
~/spindle% cd build
~/spindle/build% ../Spindle-0.8/configure --prefix=/g/g0/legendre/spindle/install --with-launchmon=$LMON_DIR --with-testrm=slurm
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking target system type... x86_64-unknown-linux-gnu
...
~/spindle/build% make install
Making install in logging
make[1]: Entering directory `/g/g0/legendre/spindle/build/logging'
  CC     libspindlelogc_la-spindle_logc.lo
  CCLD   libspindlelogc.la
...

Some other important options that you may need to pass to Spindle’s configure are:

--with-localstorage=DIR

Spindle requires a directory local to each node for staging library files.  This directory could be a ramdisk, SSD or other local storage device. Do not point this at a shared file system, which could undo all of Spindle’s performance gains.  You can specify escaped environment variables as part of this option, and these will be evaluated in the user’s environment on the back end.  The default value is ‘$TMPDIR’.

--with-default-port=NUM

Spindle runs a server on each node in the job, which needs to be able to open TCP/IP ports to other nodes in a job.  Spindle will open this connection on the port specified.  The default value is 21940.

--with-python-prefix=PATH

Specifies a list of directory prefixes where Spindle will find Python libraries (e.g., /usr/lib64/python2.6).

These options can be overridden by the user on the command line.  Other configure options can be found by running configure --help.

You can test a Spindle install by running the testsuite.  From the build/testsuite directory execute the runTests script.

Running Spindle

Once Spindle is installed, it can be run with distributed applications by launching them under Spindle’s control.  This can be as simple as passing your command line to the spindle executable:

% spindle mpirun -n 1024 mpiapp

In this case, the application mpiapp and all its dynamic libraries will be scalably loaded through Spindle.

The spindle executable takes several command line options, which can be placed before the MPI launch command and are documented below.  This list can also be found by running spindle --help.

--reloc-aout=yes|no

If enabled, Spindle will scalably load the application’s executable.  Default is yes.

--follow-fork=yes|no

If enabled, Spindle will track processes that the application forks and scalably load their libraries and other objects.  Default is yes.

--reloc-libs=yes|no

If enabled, Spindle will scalably load dynamic libraries used by the application.  Default is yes.

--reloc-exec=yes|no

If enabled, Spindle will scalably load the executables that are targets of the exec/execv/execve family of calls.  Default is yes.

--reloc-python=yes|no

If enabled, Spindle will scalably load a Python application’s .py and .pyc files.  Default is yes.

--push

Spindle will use a ‘Push’ model of content distribution.  As soon as one process requests a library, that library will immediately be broadcast to all nodes, even if processes on those nodes have not requested the library.

This option is useful for SPMD codes where every process will eventually load the same libraries.  This mode is enabled by default.

--pull

Spindle will use a ‘Pull’ model of content distribution.  When a library is loaded, it will only be sent to the nodes containing processes that also requested that library.

This option can save memory for MPMD codes, where different processes may load different libraries. This mode is disabled by default.

--location=DIR

Spindle requires a local directory location on each node where it can stage library files.  This directory should be a ramdisk, SSD, or other storage that is local to the node. 

You can pass escaped environment variables to this option, and Spindle will evaluate them in the back-end environment.  The default value can be optionally set at configure time, and that default is $TMPDIR.
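Because the variable has to reach Spindle unexpanded, it helps to see how the launching shell treats the argument. The sketch below is only an illustration of shell quoting, not Spindle's own behavior: it builds the option two ways and prints the resulting command instead of running it.

```shell
# Two equivalent ways to keep the launching (front-end) shell from
# expanding $TMPDIR, so the literal text is what Spindle receives.
quoted='--location=$TMPDIR'     # single quotes: no expansion
escaped="--location=\$TMPDIR"   # backslash-escaped dollar sign

# Both forms yield the same literal string.
printf '%s\n' "$quoted" "$escaped"

# Dry run: print the command line rather than executing it.
echo "spindle $quoted mpirun -n 1024 mpiapp"
```

An unquoted or double-quoted $TMPDIR would instead be replaced with the front-end value before Spindle ever sees it.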

--port=NUM

Spindle opens TCP ports on each node to communicate with other nodes.  This option specifies the port number Spindle should use.  If that port is already in use, Spindle may choose another port within 25 port numbers of NUM.

The default port can be overridden at configure time.  If unspecified, Spindle will use TCP port 21940 (which means it could use ports from 21940 to 21965).
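When opening firewall rules it can be handy to compute the candidate range for a given NUM. The sketch below takes the "within 25 port numbers" statement at face value and assumes the search only moves upward from NUM and includes both ends; neither detail is confirmed by the text above.

```shell
# Candidate port range for a given base port.
# Assumption: Spindle searches upward from NUM, inclusive of NUM + 25.
base_port=21940   # default port from the text
span=25           # "within 25 port numbers of NUM"
last_port=$((base_port + span))

echo "Spindle may use ports ${base_port}-${last_port}"
# prints: Spindle may use ports 21940-21965
```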

--debug=yes|no

If enabled, Spindle will attempt to hide itself from debuggers that may run on the application.  Without this option, Spindle’s library relocation mechanism can interfere with a debugger’s ability to locate libraries that the application is loading. 

Enabling this option forces the reloc-aout option to be disabled.

By default this option is disabled.

--preload=FILE

This option specifies a file containing a whitespace-separated list of libraries that Spindle will broadcast to each node before application execution begins.  If you know that a certain library or file will be needed by every process, then using this option to broadcast the file may improve performance.
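As a rough illustration of the file format, the sketch below writes a whitespace-separated list (one entry per line here) and prints the command that would use it. The library paths are hypothetical placeholders, and the command is printed rather than run.

```shell
# Create a preload list; newlines count as whitespace separators.
# The library paths below are hypothetical examples.
preload_list=$(mktemp)
printf '%s\n' /usr/lib64/libm.so.6 /usr/lib64/libz.so.1 > "$preload_list"

# Show the file contents.
cat "$preload_list"

# Dry run: print the command line rather than executing it.
echo "spindle --preload=$preload_list mpirun -n 1024 mpiapp"
```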

--no-hide

By default Spindle will hide itself from the target application, preventing it from seeing or closing the file descriptors it uses. This option disables the hiding functionality.

--no-mpi

By default Spindle assumes it is running an MPI job. This option tells Spindle that it is running a serial job instead.

--noclean

By default Spindle will remove files from the local storage when the last process exits.  If this option is specified then Spindle will leave those files, which may be useful for debugging Spindle.

--python-prefix=path

This option provides a colon-separated list of directory prefixes where Spindle can find Python modules. For example, if Python modules are located in /home/user/myapp/pymods/*.pyc, then you could provide /home/user/myapp as a path prefix. This option is not required for Python support, but may improve Spindle’s performance if provided. Spindle can also be given path prefixes at configure time, and this option appends additional path prefixes onto those.
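The colon-separated form can be assembled from a plain list of directories. In the sketch below, /home/user/myapp comes from the example above, while /opt/tools/pylib and the app.py script name are hypothetical; the command is printed rather than run.

```shell
# Join directory prefixes with colons for --python-prefix.
# /opt/tools/pylib is a hypothetical second prefix.
prefixes="/home/user/myapp /opt/tools/pylib"
joined=$(printf '%s' "$prefixes" | tr ' ' ':')

# Dry run: print the command line rather than executing it.
echo "spindle --python-prefix=$joined mpirun -n 1024 python app.py"
```

This simple join assumes none of the directory names contain spaces.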

--strip=yes|no

By default Spindle strips unnecessary symbol and debug information from executables and libraries before broadcasting them.  If this option is set to ‘no’, then Spindle will leave that information in.

--disable-logging

Spindle can optionally be configured to write a line to a log file whenever it is run. This option tells Spindle to run without writing to a log file.

--version

Prints Spindle version information.

Porting and Debugging Spindle

Spindle requires knowledge about the resource manager and how jobs are launched, and as such it needs to be ported to each resource manager.  As of Spindle 0.8, it only understands the SLURM resource manager on Linux/x86_64 as used at LLNL.  Porting Spindle to other Linux systems requires at least:

  • Porting LaunchMON to support your resource manager.  Contact the LaunchMON developers to find out whether LaunchMON is available on your system or what porting would involve.
  • Parsing MPI command lines.  Spindle needs to be able to recognize the application executable in an MPI command line, the location of the application arguments, and any options that may change the current working directory.  The code that parses MPI command lines can be found in SRCDIR/launchmon/parse_launcher.cc, which will likely need changes to operate on new resource managers.
  • Adding a script to the testsuite that launches jobs for your resource manager.  The SRCDIR/testsuite directory contains a set of run_driver_* scripts for each resource manager.  These scripts take an application name, application arguments and spindle arguments as input, then launch an MPI job under Spindle with those parameters.

If you encounter errors or other problems with Spindle, you can have Spindle generate debugging traces by setting the environment variable SPINDLE_DEBUG to the values of 1, 2 or 3, where 1 produces the least amount of information and 3 produces the most.  A trace file will be generated for each node in the job.
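A small wrapper can guard against typos when setting the trace level. The helper name set_spindle_debug below is hypothetical and not part of Spindle; it only exports SPINDLE_DEBUG when given one of the three documented values.

```shell
# Hypothetical helper: export SPINDLE_DEBUG only for the documented levels.
set_spindle_debug() {
  case "$1" in
    1|2|3)
      export SPINDLE_DEBUG="$1"
      ;;
    *)
      echo "SPINDLE_DEBUG level must be 1, 2 or 3" >&2
      return 1
      ;;
  esac
}

# Request the most verbose traces, then launch the job as usual, e.g.:
#   spindle mpirun -n 1024 mpiapp
set_spindle_debug 3
echo "SPINDLE_DEBUG=$SPINDLE_DEBUG"
```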