This section details the input file format for Sphinx and the tests included in this distribution. Sphinx determines the parameters for a run by reading and parsing an input file. These parameters determine which tests to perform and set variables that control harness operation, such as the number of iterations per repetition. The input file format consists of several different "modes" that determine what run parameters are being specified. The tests to perform are specified in one or more MEASUREMENTS modes. MEASUREMENTS modes consist of a list of tests to perform; test entries include test-specific parameters. The input file format may seem complex because of the flexibility the test harness provides; however, that same flexibility allows experienced users to create the desired input file quickly.

In general, Sphinx input file processing is very forgiving: mode and parameter identifiers need only contain a string that uniquely determines them and may contain other arbitrary characters, so input file processing completes even in the presence of most typos. Further, matching of the determining string is not case sensitive. Each test in Sphinx has several different possible parameters. A sensible default is used if the input file does not specify anything for a given mode or parameter. The default values can be overridden for the entire input file or for a specific test.

The Sphinx input file format is fairly free form; an "@" as the first character of a line changes the input mode, and modes can occur in any order. Most of the modes are optional; the only requirement is that at least one MEASUREMENTS mode appear. The last occurrence of a mode determines the value for that mode, except for the COMMENT and MEASUREMENTS modes; for these two modes, multiple occurrences are concatenated to form a single value. The following table describes the Sphinx input file modes (names listed in ALL CAPS for historical reasons).
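
For concreteness, the sketch below shows the general shape an input file might take, based only on the rules described in this section (an "@" introducing each mode, modes in any order, and at least one MEASUREMENTS mode containing the test descriptions discussed below). The names and values are purely illustrative, and the exact placement of a mode's value relative to its "@" line is an assumption here; the example input files included in the distribution show the definitive syntax.

  @COMMENT nightly latency run
  @USER performance group
  @MACHINE 64-node Linux cluster
  @MEMORY 8192
  @MEASUREMENTS
  ... one or more test descriptions, in the format described below ...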

Sphinx Input File Modes

MODE | DESCRIPTION | DEFAULT
COMMENT | any comments that you'd like to include in the input file; this mode can be used to omit test entries without deleting them from the file; not included in output file | NULL
USER | a text field with no semantic implications; can be used to provide a short description of the user running the tests; included in output file | NULL
MACHINE | a text field with no semantic implications; can be used to provide a short description of the machine on which the tests are run; included in output file | NULL
NETWORK | a text field with no semantic implications; can be used to provide a short description of the network on which the tests are run; included in output file | NULL
NODE | a text field with no semantic implications; can be used to provide a short description of the nodes of the machine on which the tests are run; included in output file | NULL
MPILIB_NAME | a text field with no semantic implications; can be used to provide a short description of the MPI library with which the tests are run; included in output file | NULL
OUTFILE | a text field that specifies the output filename | input_filename.out
LOGFILE | a text field that specifies the log filename | input_filename.log
CORRECT_FOR_OVERHEAD | a yes or no text field that specifies whether test results should be corrected for any test harness overhead incurred in the measurement; overhead is generally a function call but depends on the test | no
MEMORY | an integer field that specifies the size in kilobytes of the buffer to allocate in each task for message passing tests; maximum message lengths are a function of this parameter and the test being run; generally maximum message lengths are equal to this parameter or half of it or this parameter divided by the number of tasks | 4096 (i.e. 4MB)
MAXREPDEFAULT | an integer field that specifies default limit on the number of timings until a test is declared "UNSETTLED" | 20
MINREPDEFAULT | an integer field that specifies default minimum number of timings to average for a test result | 4
ITERSPERREPDEFAULT | an integer field that specifies default number of iterations per timing of the code being measured | 1
STANDARDDEVIATIONDEFAULT | a double field that specifies the default fraction of the mean of the timings that the standard deviation must fall below for a test to be declared settled; Sphinx uses the standard deviation, which may never fall below this threshold, unlike SKaMPI, which uses the standard error, which is guaranteed to fall below the threshold for a sufficiently large number of timings; thus MAXREPDEFAULT is more significant for Sphinx | 0.05
DIMENSIONS_DEFAULT | an integer field that specifies default number of independent variables for a test | 1
VARIATION | a text field that specifies the default independent variable; see below for valid independent variables | NO_VARIATION
VARIATION_LIST | a space-delimited text field that specifies the default independent variables | NO_VARIATION for all positions
SCALE | a text field that specifies the default scale to use for independent variable; see below for valid scale values | FIXED_LINEAR
SCALE_LIST | a space-delimited text field that specifies the default scales | FIXED_LINEAR for all positions
MAXSTEPSDEFAULT | an integer field that specifies default limit on the number of values for independent variables | 16
MAXSTEPSDEFAULT_LIST | a space-delimited integers field that specifies default limits on the numbers of values for independent variables | 16 for all positions
START | an integer field that specifies default minimum value to use for independent variables; MIN_ARGUMENT has Sphinx use the minimum value semantically allowed for the independent variable (e.g. 1 for number of tasks) | MIN_ARGUMENT
START_LIST | a space-delimited integers field that specifies default minimum values to use for independent variables | MIN_ARGUMENT for all positions
END | an integer field that specifies default maximum value to use for independent variables; MAX_ARGUMENT has Sphinx use the maximum value semantically allowed for the independent variable (e.g. size of MPI_COMM_WORLD for number of tasks) | MAX_ARGUMENT
END_LIST | a space-delimited integers field that specifies default maximum values to use for independent variables | MAX_ARGUMENT for all positions
STEPWIDTH | a double field that specifies default distance between independent variable values | 1.00
STEPWIDTH_LIST | a space-delimited doubles field that specifies default distances between the values of independent variables | 1.00 for all positions
MINDIST | SKaMPI artifact; an integer field that apparently was intended to specify a minimum distance between independent variable values; currently has no effect but may be supported in the future | 1
MINDIST_LIST | a space-delimited integers field that specifies MIN_DIST values | 1 for all positions
MAXDIST | SKaMPI artifact; an integer field that apparently was intended to specify a maximum distance between independent variable values; currently has no effect but may be supported in the future (less likely than MINDIST) | 10
MAXDIST_LIST | a space-delimited integers field that specifies MAX_DIST values | 10 for all positions
MESSAGELEN | an integer field that specifies default message length in bytes | 256
MAXOVERLAP | an integer field that specifies default maximum iterations of the overlap for loop | 0
THREADS | an integer field that specifies default number of threads | value returned by omp_get_max_threads
WORK_FUNCTION_DEFAULT | a text field that specifies the default function used inside OpenMP loops; see below for a list of valid work function values | SIMPLE_WORK
WORK_AMOUNT_DEFAULT | an integer field that specifies default duration of function used inside OpenMP loops | 10
SCHEDULE_DEFAULT | a text field that specifies default OpenMP schedule option; see below for a list of valid schedule options | STATIC_SCHED
SCHEDULE_CAP_DEFAULT | an integer field that specifies default schedule cap for OpenMP tests | 10
SCHEDULE_CHUNK_DEFAULT | an integer field that specifies default OpenMP schedule chunk size | 1
OVERLAP_FUNCTION | a text field that specifies the default overlap function used in mixed non-blocking MPI/OpenMP tests; see below for valid overlap function values | SEQUENTIAL
CHUNKS | an integer field that specifies default number of chunks | 6
MEASUREMENTS | a text field that determines the actual tests run including any default parameter overrides; see below for a description of the format of this field | NULL

Many of the modes have list variants, as indicated. These allow the specification of different defaults for independent variables X0, X1, X2, ... These variants are needed since Sphinx supports multiple independent variables per test, such as varying both the message size and the number of tasks for an MPI collective test. If a test uses more independent variables than a corresponding list specifies, then the non-list default is used for the additional variables.
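
As an illustration of the list variants, a hypothetical set of default-setting modes for tests with two independent variables might look like the following (the values are illustrative, and the placement of values on the "@" lines is again an assumption):

  @DIMENSIONS_DEFAULT 2
  @VARIATION_LIST LENGTH NODES
  @SCALE_LIST FIXED_LOGARITHMIC FIXED_LINEAR
  @MAXSTEPSDEFAULT_LIST 12 8

Here the first position (X0) varies the message length on a logarithmic scale and the second position (X1) varies the number of tasks on a linear scale; a test that declared a third independent variable would fall back to the non-list defaults for it.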

The MEASUREMENTS mode is a structured text field. It describes the tests that will be run for the input file. The format is a series of test descriptions; blank lines between test descriptions are discarded. A test description consists of a name followed by a left curly brace ({), optionally on a new line, followed by parameters specific to the test; a right curly brace (}) marks the end of the description. Each parameter field of a test description must be on a separate line; the general format of a parameter field line is "parameter_name = value". The following table describes the parameter fields of the test description.
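
Putting these rules together, a single test description might look like the following sketch; the name and parameter values are hypothetical, and the parameter fields themselves are described in the table that follows:

  bcast_scaling
  {
    Type = 17
    Variation = LENGTH
    Scale = FIXED_LOGARITHMIC
    Max_Steps = 12
  }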

Sphinx Test Description Fields

PARAMETER NAME | DESCRIPTION
Type | Type of test; this field determines the actual test run; see below for a description of the different test types available in Sphinx
Correct_for_overhead | See CORRECT_FOR_OVERHEAD mode
Max_Repetition | See MAXREPDEFAULT mode
Min_Repetition | See MINREPDEFAULT mode
Standard_Deviation | See STANDARDDEVIATIONDEFAULT mode
Dimensions | See DIMENSIONS_DEFAULT mode
Variation | See VARIATION_LIST mode
Scale | See SCALE_LIST mode
Max_Steps | See MAXSTEPSDEFAULT_LIST mode
Start | See START_LIST mode
End | See END_LIST mode
Stepwidth | See STEPWIDTH_LIST mode
Min_Distance | See MINDIST_LIST mode
Max_Distance | See MAXDIST_LIST mode
Default_Message_length | See MESSAGELEN mode
Default_Chunks | See CHUNKS mode

All fields of a test description other than the type field are optional. Test descriptions often consist only of a name, a {, a Type = X line and a }. Properly specified defaults enable this simplicity.
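
For example, with suitable defaults established by the modes above, a minimal description such as this hypothetical one is sufficient:

  simple_pingpong
  {
    Type = 1
  }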

A measurement name is a text string without spaces and need not have any relation to the test type. This is unfortunate; future versions may augment names in the output file with a type-specific string.

If the same name is used for several test descriptions, Sphinx will automatically extend the second and later occurrences with a unique integer. This mechanism ensures that all test descriptions result in a test run.

Sphinx Independent Variable Types

NAME | SEMANTIC MEANING
NO_VARIATION | No independent variable
ITERS | Number of iterations per timing
NODES | Number of MPI tasks
LENGTH | Message length (output format is in bytes)
ROOT | Root task (relevant to asynchronous MPI collective tests)
ACKER | Task that sends acknowledgement message (relevant to fan-out MPI collective tests)
OVERLAP | Computational overlap time
SECOND_OVERLAP | Second computational overlap time
MASTER_BINDING | CPU to which master (i.e. first/timing) thread is bound
SLAVE_BINDING | CPU to which slave (i.e. second) thread is bound
THREADS | Number of threads
SCHEDULE | OpenMP scheduling option
SCHEDULE_CAP | Iterations per OpenMP THREAD(?)
SCHEDULE_CHUNK | OpenMP schedule chunk size option
WORK_FUNCTION | Function used inside OpenMP loops
WORK_AMOUNT | Parameter that determines duration of function used inside OpenMP loops
OVERLAP_FUNCTION | Overlap function for mixed non-blocking MPI/OpenMP tests
CHUNKS | Number of chunks for master/worker tests (SKaMPI artifact; use at your own risk)

Some independent variables do not alter anything for some test types. In general, an effort has been made to allow variation of these variables, although some combinations may lead to internally detected errors. In any event, independent variables should be selected with care, both so that test descriptions exercise interesting variations and so that the overall run time is not excessive.
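
For instance, a hypothetical description that varies both message length and the number of tasks for a broadcast measurement could look like the following; since Variation, Scale, and Max_Steps correspond to the list modes, their values are presumably given as space-delimited lists with one entry per independent variable:

  bcast_len_by_tasks
  {
    Type = 17
    Dimensions = 2
    Variation = LENGTH NODES
    Scale = FIXED_LOGARITHMIC FIXED_LINEAR
    Max_Steps = 10 8
  }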

Sphinx Test Types

NUMBER | DESCRIPTION | TIMING RESULT
1 | MPI Ping-pong using MPI_Send and MPI_Recv | Round trip latency
2 | MPI Ping-pong using MPI_Send and MPI_Recv with MPI_ANY_TAG | Round trip latency
3 | MPI Ping-pong using MPI_Send and MPI_Irecv | Round trip latency
4 | MPI Ping-pong using MPI_Send and MPI_Iprobe/MPI_Recv combination | Round trip latency
5 | MPI Ping-pong using MPI_Ssend and MPI_Recv | Round trip latency
6 | MPI Ping-pong using MPI_Isend and MPI_Recv | Round trip latency
7 | MPI Ping-pong using MPI_Bsend and MPI_Recv | Round trip latency
8 | MPI bidirectional communication using MPI_Sendrecv in both tasks | Operation latency
9 | MPI bidirectional communication using MPI_Sendrecv_replace in both tasks | Operation latency
10 | SKaMPI artifact: master/worker with MPI_Waitsome | Not clear (use at own risk)
11 | SKaMPI artifact: master/worker with MPI_Waitany | Not clear (use at own risk)
12 | SKaMPI artifact: master/worker with MPI_Recv with MPI_ANY_SOURCE | Not clear (use at own risk)
13 | SKaMPI artifact: master/worker with MPI_Send | Not clear (use at own risk)
14 | SKaMPI artifact: master/worker with MPI_Ssend | Not clear (use at own risk)
15 | SKaMPI artifact: master/worker with MPI_Isend | Not clear (use at own risk)
16 | SKaMPI artifact: master/worker with MPI_Bsend | Not clear (use at own risk)
17 | Round of MPI_Bcast over all tasks | Lower bound of operation latency
18 | Repeated MPI_Barrier calls (provides a reasonable lower bound of operation latency) | Per task overhead at task zero
19 | Round of MPI_Reduce over all tasks | Lower bound of operation latency
20 | Repeated MPI_Alltoall calls (provides a reasonable lower bound of operation latency) | Per task overhead at task zero
21 | Repeated MPI_Scan calls | Per task overhead at task zero
22 | Repeated MPI_Comm_split calls (provides a reasonable lower bound of operation latency) (note: "leaks" MPI_Comm results; future changes will eliminate this problem) | Per task overhead at task zero
23 | Repeated memcpy calls | Time per memcpy call
24 | Repeated MPI_Wtime calls | Clock overhead
25 | Repeated MPI_Comm_rank calls | Time per MPI_Comm_rank call
26 | Repeated MPI_Comm_size calls | Time per MPI_Comm_size call
27 | Repeated MPI_Iprobe calls with no message expected | MPI_Iprobe call overhead
28 | Repeated MPI_Buffer_attach and MPI_Buffer_detach calls | MPI_Buffer_attach/detach call overhead
29 | Empty function call with point to point pattern | Function call overhead
30 | Empty function call with master/worker pattern | Function call overhead
31 | Empty function call with collective pattern | Function call overhead
32 | Empty function call with simple pattern | Function call overhead
33 | Round of MPI_Gather over all tasks | Lower bound of operation latency
34 | Round of MPI_Scatter over all tasks | Lower bound of operation latency
35 | Repeated MPI_Allgather calls (provides a reasonable lower bound of operation latency) | Per task overhead at task zero
36 | Repeated MPI_Allreduce calls (provides a reasonable lower bound of operation latency) | Per task overhead at task zero
37 | Round of MPI_Gatherv over all tasks | Lower bound of operation latency
38 | Round of MPI_Scatterv over all tasks | Lower bound of operation latency
39 | Repeated MPI_Allgatherv calls (provides a reasonable lower bound of operation latency) | Per task overhead at task zero
40 | Repeated MPI_Alltoallv calls (provides a reasonable lower bound of operation latency) | Per task overhead at task zero
41 | Repeated MPI_Reduce_scatter calls | Per task overhead at task zero
42 | Repeated calls to MPI_Bcast, each followed by an MPI_Barrier call | Upper bound of operation latency
43 | Repeated calls to MPI_Bcast | Per task overhead at root task
44 | Round of MPI_Bcast over all tasks (identical to type 17) | Lower bound of operation latency
45 | Repeated calls to MPI_Bcast, each followed by an acknowledgement from every other task to root task | Upper bound of operation latency
46 | Repeated calls to MPI_Bcast, each followed by an acknowledgement from one task to root task; tested over all acknowledgers provides accurate measure of operation latency | Operation latency to acknowledging task
47 | Repeated calls to MPI_Alltoall, each call followed by a barrier implemented with MPI_Send and MPI_Recv operations (provides a reasonable upper bound of operation latency) | Upper bound of operation latency
48 | Repeated calls to MPI_Gather, each call followed by a broadcast implemented with MPI_Send and MPI_Recv operations (provides a reasonable upper bound of operation latency) | Upper bound of operation latency
49 | Repeated calls to MPI_Scatter, each followed by an acknowledgement from one task to root task; tested over all acknowledgers provides accurate measure of operation latency | Operation latency to acknowledging task
50 | Repeated calls to MPI_Allgather, each call followed by a barrier implemented with MPI_Send and MPI_Recv operations (provides a reasonable upper bound of operation latency) | Upper bound of operation latency
51 | Repeated calls to MPI_Allreduce, each call followed by a barrier implemented with MPI_Send and MPI_Recv operations (provides a reasonable upper bound of operation latency) | Upper bound of operation latency
52 | Repeated calls to MPI_Gatherv, each call followed by a broadcast implemented with MPI_Send and MPI_Recv operations (provides a reasonable upper bound of operation latency) | Upper bound of operation latency
53 | Repeated calls to MPI_Scatterv, each followed by an acknowledgement from one task to root task; tested over all acknowledgers provides accurate measure of operation latency | Operation latency to acknowledging task
54 | Repeated calls to MPI_Allgatherv, each call followed by a barrier implemented with MPI_Send and MPI_Recv operations (provides a reasonable upper bound of operation latency) | Upper bound of operation latency
56 | Repeated calls to MPI_Reduce_scatter, each call followed by a barrier implemented with MPI_Send and MPI_Recv operations | Upper bound of operation latency
57 | Repeated calls to MPI_Alltoallv, each call followed by a barrier implemented with MPI_Send and MPI_Recv operations (provides a reasonable upper bound of operation latency) | Upper bound of operation latency
58 | Repeated calls to MPI_Reduce, each call followed by a broadcast implemented with MPI_Send and MPI_Recv operations (provides a reasonable upper bound of operation latency) | Upper bound of operation latency
59 | Function call with for loop of number of tasks iterations in collective pattern | Overhead of function call with for loop
60 | Computation used for non-blocking MPI tests | Time of overlap computation
61 | Overlap of computation with MPI_Isend (not fully tested; use at own risk) | Overlap potential of MPI_Isend
62 | Overlap of computation with MPI_Isend plus overlap of acknowledgement message (not fully tested; use at own risk) | Overlap potential of MPI_Isend
63 | Overlap of computation with MPI_Irecv (not fully tested; use at own risk) | Overlap potential of MPI_Irecv
64 | Repeated MPI_Reduce calls | Per task overhead at root task
65 | Repeated MPI_Gather calls | Per task overhead at root task
66 | Repeated MPI_Gatherv calls | Per task overhead at root task
67 | Repeated MPI_Comm_dup calls (internal difference improves scalability compared to test 69) (note: "leaks" MPI_Comm results; future changes will eliminate this problem) | Per task overhead at task zero
68 | Repeated MPI_Comm_split calls (internal difference improves scalability compared to test 22) (note: "leaks" MPI_Comm results; future changes will eliminate this problem) | Per task overhead at task zero
69 | Repeated MPI_Comm_dup calls (note: "leaks" MPI_Comm results; future changes will eliminate this problem) | Per task overhead at task zero
70 | Repeated calls to MPI_Scan, each followed by an acknowledgement from one task to task zero; tested over all acknowledgers provides accurate measure of operation latency | Operation latency to acknowledging task
71 | Repeated MPI_Scan calls (internal difference improves scalability compared to test 21) | Per task overhead at task zero
101 | Ping-pong using pthread_cond_signal and pthread_cond_wait | "Round trip" latency
102 | Repeated calls to pthread_cond_signal; as many as the slave thread can wait for are caught | Overhead of pthread_cond_signal
103 | Repeated uncaught calls to pthread_cond_signal | Overhead of pthread_cond_signal
104 | Repeated calls to pthread_cond_wait; matching calls to pthread_cond_signal are made "as quickly as possible" | Overhead of pthread_cond_wait
105 | Ping-pong using pthread_mutex_lock and pthread_mutex_unlock (four separate locks) | "Round trip" latency
106 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (four separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
107 | Repeated pthread_mutex_lock and pthread_mutex_unlock calls (one lock) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
108 | Repeated spin on shared variable; measures per thread time slice when bound to the same CPU | Per thread time slice
109 | Chain of pthread_create calls for detached process scope threads | Overhead of pthread_create
110 | Repeated calls to sched_yield (thr_yield for Suns); measures thread context switch time when bound to the same CPU (use with care, depends on OS thread scheduling) | Thread context switch time
111 | Repeated pthread_mutex_lock calls (large array of locks) then repeated pthread_mutex_unlock calls (large array of locks); each set of calls is timed separately | Overhead of pthread_mutex_lock and overhead of pthread_mutex_unlock (separate measurements)
112 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (two separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
113 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (three separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
114 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (five separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
115 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (six separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
116 | Ping-pong using pthread_mutex_lock and pthread_mutex_unlock (array of four locks) | "Round trip" latency
117 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (array of four locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
118 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (large array of locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
119 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (seven separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
120 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (eight separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
121 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (nine separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
122 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (ten separate locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
123 | Repeated interleaved pthread_mutex_lock and pthread_mutex_unlock calls (large array of locks, round robin access order) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
124 | Repeated pthread_mutex_lock and pthread_mutex_unlock calls (one lock, two tight pairs of calls) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
125 | Repeated pthread_mutex_lock and pthread_mutex_unlock calls (one lock, three tight pairs of calls) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
126 | Repeated pthread_mutex_lock and pthread_mutex_unlock calls (one lock, four tight pairs of calls) | Overhead of pthread_mutex_lock and pthread_mutex_unlock
127 | Repeated calls to sched_yield (thr_yield for Suns); measures thread context switch time when bound to the same CPU (uses two "new" threads; can overcome some scheduling quirks) (use with care, depends on OS thread scheduling) | Thread context switch time
128 | Chain of pthread_create calls for detached system scope threads | Overhead of pthread_create
129 | Chain of pthread_create calls for undetached process scope threads | Overhead of pthread_create
130 | Chain of pthread_create calls for undetached system scope threads | Overhead of pthread_create
131 | Function call with for loop of number of tasks iterations in simple pattern | Overhead of function call with for loop
201 | Repeated calls to work function (not fully tested, use at own risk) | Reference measurement for OpenMP parallel construct
202 | Repeated calls to an OpenMP parallel region of work function | Overhead of OpenMP parallel construct
203 | Repeated calls to for loop over work function (not fully tested, use at own risk) | Reference measurement for OpenMP parallel for construct
204 | Repeated calls to an OpenMP parallel for over work function | Overhead of OpenMP parallel for construct
205 | Repeated calls to an OpenMP parallel for with variable chunk sizes over work function | Overhead of OpenMP parallel for with variable chunk sizes construct
206 | Repeated calls to an OpenMP parallel for loop over work function (not fully tested, use at own risk) | Reference measurement for OpenMP ordered construct
207 | Repeated calls to an OpenMP parallel for with ordered clause over work function | Overhead of OpenMP parallel for with ordered clause
208 | Repeated calls to an OpenMP parallel for with ordered work function calls | Overhead of OpenMP ordered construct
209 | Repeated calls to for loop over work function (not fully tested, use at own risk) | Reference measurement for OpenMP single and barrier constructs
210 | Repeated calls to for loop over work function inside OpenMP single construct (not fully tested, use at own risk) | Overhead of OpenMP single construct
211 | Repeated calls to for loop over work function following an OpenMP barrier construct (not fully tested, use at own risk) | Overhead of OpenMP barrier construct
212 | Repeated calls to for loop over work function, results summed (not fully tested, use at own risk) | Reference measurement for OpenMP reduction construct
213 | Repeated calls to an OpenMP parallel for loop with reduction clause over work function (not fully tested, use at own risk) | Overhead of OpenMP reduction construct
214 | Repeated calls to integer increment and work function (not fully tested, use at own risk) | Overhead of OpenMP single construct
215 | Repeated calls to for loop over integer increment inside an OpenMP atomic construct and work function (not fully tested, use at own risk) | Overhead of OpenMP barrier construct
301 | Repeated calls to a mixed OpenMP/MPI barrier followed by work function call (provides a reasonable lower bound of operation latency) | OpenMP-test-style overhead of mixed OpenMP/MPI barrier
302 | Repeated mixed OpenMP/MPI barrier calls (provides a reasonable lower bound of operation latency) | Per task overhead at task zero
303 | Repeated calls to a mixed OpenMP/MPI reduce across all threads in all tasks followed by work function call (provides a reasonable lower bound of operation latency) | OpenMP-test-style overhead of mixed OpenMP/MPI all reduce
304 | Repeated calls to mixed OpenMP/MPI reduce across all threads in all tasks (provides a reasonable lower bound of operation latency) | Per task overhead at task zero
305 | Repeated calls to mixed OpenMP/MPI reduce across all threads in all tasks (essentially) followed by a mixed OpenMP/generic MPI barrier | Upper bound of operation latency
306 | Computation used for non-blocking MPI mixed with OpenMP tests | Time of threaded overlap computation
307 | Overlap of OpenMP threaded computation with MPI_Isend (not fully tested; use at own risk) | Overlap potential of MPI_Isend
308 | Overlap of OpenMP threaded computation with MPI_Isend plus overlap of acknowledgement message (not fully tested; use at own risk) | Overlap potential of MPI_Isend
309 | Overlap of OpenMP threaded computation with MPI_Irecv (not fully tested; use at own risk) | Overlap potential of MPI_Irecv

The preceding table provides only a brief description of the tests and their results. The referenced papers provide further detail. Of course, a complete understanding can only result from careful consideration of the code. For details of the corrections applied when CORRECT_FOR_OVERHEAD is enabled, consult the code.
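
As a further illustration of how the table is used, the following hypothetical descriptions contrast a lower-bound broadcast measurement (type 17) with the acknowledgement-based variant (type 46); varying ACKER in the second description exercises the "tested over all acknowledgers" behavior noted in the table:

  bcast_lower_bound
  {
    Type = 17
    Variation = LENGTH
  }

  bcast_acked
  {
    Type = 46
    Variation = ACKER
  }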

Sphinx Independent Variable Scales

NAME | DESCRIPTION
FIXED_LINEAR | Fixed linear scale; use up to MAXSTEPS values exactly STEPWIDTH apart
DYNAMIC_LINEAR | Dynamic linear scale; use values exactly STEPWIDTH apart, then fill in values until either exactly MAXSTEPS values are used or no "holes" remain
FIXED_LOGARITHMIC | Fixed logarithmic scale; use up to MAXSTEPS values "logarithmically" exactly STEPWIDTH apart
DYNAMIC_LOGARITHMIC | Dynamic logarithmic scale; use values "logarithmically" exactly STEPWIDTH apart, then fill in values until either exactly MAXSTEPS values are used or no "holes" remain

Linear scales are reasonably intuitive. With a fixed logarithmic scale, a STEPWIDTH of the square root of two results in each value being double the previous one. The default STEPWIDTH actually varies with the scale, since a STEPWIDTH of 1.00 would not produce any variation with logarithmic scales; the default for logarithmic scales is the square root of two.
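
For example, assuming the doubling behavior just described, a hypothetical gather measurement that varies message length on a fixed logarithmic scale with the default STEPWIDTH would visit lengths that roughly double at each step (up to the limit implied by MEMORY), while the same description with Scale = FIXED_LINEAR and Stepwidth = 1024 would presumably visit lengths exactly 1024 bytes apart:

  gather_log_lengths
  {
    Type = 33
    Variation = LENGTH
    Scale = FIXED_LOGARITHMIC
    Max_Steps = 16
  }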

Sphinx Work Functions

NAME | DESCRIPTION
SIMPLE_WORK | A simple for loop of WORK_AMOUNT iterations, each iteration has a single FMA plus a few branch statements based on mod tests and possibly an integer shift
BORS_WORK | Complex set of array operations; duration per WORK_AMOUNT unit is relatively long
SPIN_TIMED_WORK | Loop over checks to see if work function has lasted WORK_AMOUNT nanoseconds
SLEEP_TIMED_WORK | Loop over checks to see if work function has lasted WORK_AMOUNT nanoseconds followed by sleep and usleep of remaining time

The effect of varying work functions should be limited to cache effects. A future paper will present results addressing the validity of this expectation.
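
A hypothetical OpenMP measurement that selects one of these work functions through the default-setting modes might look like the following (mode value placement is again an assumption):

  @WORK_FUNCTION_DEFAULT SPIN_TIMED_WORK
  @WORK_AMOUNT_DEFAULT 1000
  @MEASUREMENTS
  parallel_for_overhead
  {
    Type = 204
    Variation = THREADS
  }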

Sphinx Schedule Values

VALUE | DESCRIPTION
STATIC_SCHED | static
DYNAMIC_SCHED | dynamic
GUIDED_SCHED | guided

The standard OpenMP names for the scheduling options are actually sufficient, since Sphinx uses a case-insensitive minimum-string mechanism to determine the value. Support for the OpenMP runtime schedule option may be added in the future.
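
For example, a hypothetical test description could treat the schedule itself as the independent variable of an OpenMP parallel for measurement, stepping through the values in the table above:

  parallel_for_schedules
  {
    Type = 204
    Variation = SCHEDULE
  }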

Sphinx Overlap Function Values

VALUE | DESCRIPTION
SEQUENTIAL | sequential work function (i.e. not in an OpenMP parallel region)
PARALLEL | work function inside OpenMP parallel region
PARALLEL_FOR | work function inside OpenMP parallel for construct
PARALLEL_FOR_CHUNKS | work function inside OpenMP parallel for construct with variable chunks

The log mechanism inherited from SKaMPI allows multiple runs of the same input file to run the full set of test descriptions to completion. If the log file contains an end-of-run message, then the log file and output file are moved to file names extended with an integer and a new run of the full set is started. Sphinx includes corrections to some bugs in this mechanism. These corrections ensure that each test description is run to completion exactly once, regardless of its name or the status of partial runs.