Sequoia BG/Q Exercise

Useful reference while you go through the exercise: the Sequoia BG/Q tutorial.

Preparation:

  1. Login to the workshop machine

    We will be using LC's unclassified BG/Q system called vulcan. The instructor will demonstrate how to access vulcan from the classroom workstations.

  2. Create a subdirectory for the BG/Q exercise files and cd to it. Then copy the exercise files:

    mkdir ~/bgq
    cd  ~/bgq
    cp  -R /usr/global/docs/training/blaise/bgq/*   ~/bgq
    

  3. Verify your exercise files

    Issue the ls -l command. Your output should show something like the listing below:

    drwx------   2 class03 class03  4096 Jan 28 14:56 CLOMP-TM
    -rw-------   1 class03 class03  1277 Jan  9 14:20 debug1.c
    -rw-------   1 class03 class03    35 Dec 29 14:17 debug1.dat
    -rw-------   1 class03 class03  1308 Jan  9 14:20 debug1.f
    -rw-------   1 class03 class03  1524 Jan  9 14:20 debug2.c
    -rw-------   1 class03 class03  1865 Jan  9 14:20 debug2.f
    -rw-------   1 class03 class03   796 Jan  7 16:05 mpi_bandwidth.moabScript
    -rw-------   1 class03 class03  5445 Dec 29 14:17 mpi_bandwidth1.c
    -rw-------   1 class03 class03  6542 Dec 29 14:17 mpi_bandwidth1.f
    -rw-------   1 class03 class03  5576 Dec 29 14:17 mpi_bandwidth2.c
    -rw-------   1 class03 class03  6680 Dec 29 14:17 mpi_bandwidth2.f
    -rw-------   1 class03 class03   788 Dec 29 14:17 mpi_hello.c
    -rw-------   1 class03 class03   913 Jan  2 16:16 mpi_hello.f
    -rw-------   1 class03 class03   511 Jan  2 16:12 mpi_hello.moabScript
    -rw-------   1 class03 class03   604 Jan  2 16:13 mpi_hello.moabScript2
    -rw-------   1 class03 class03 10468 Jan  7 15:32 mpi_multibandwidth.c
    -rw-------   1 class03 class03  1735 Jan  7 11:48 mpi_omp.c
    -rw-------   1 class03 class03  2030 Jan  7 11:55 mpi_omp.f
    -rw-------   1 class03 class03   613 Jan  7 12:17 mpi_omp.moabScript
    -rw-------   1 class03 class03   634 Jan  8 16:29 simd.c
    -rw-------   1 class03 class03   683 Jan  8 16:30 simd.f
    drwx------   5 class03 class03  4096 Jan 10 14:02 sphot

Configuration Information:

  1. Before we attempt to actually compile and run anything, let's get familiar with some basic usage and configuration commands. For the most part, these commands can be used on any LC cluster.

  2. Login Node(s):
    Recall that BG/Q login/front-end nodes are different from the compute nodes. Verify this by reviewing the output of the cat /proc/cpuinfo command. What does it tell you?

  3. Compute Nodes and Partitions:
    Use the mjstat command to display a summary of vulcan's batch partition configuration - and also any running jobs. Also try the sinfo -s command to view queue information.

  4. Batch Limits
    Use the news job.lim.vulcan command to review the queue limits for vulcan. Questions:

    Note: this same command can be used on any LC machine to review that machine's queue limits. Just substitute the name of the machine you are logged into for vulcan.

Job Information:

  1. Try each of the following commands, comparing and contrasting their output. Consult the man pages if you need more information.

    Command          Description
    mjstat           Partition summary plus one line of detailed information for each running job
    squeue           One line of detailed information per running job
    showq            Show all jobs: running, queued, and blocked
    showq -r         Show only running jobs - note additional details
    showq -i         Show only non-running, eligible/idle jobs - note additional details
    showq -b         Show only blocked jobs, if any
    checkjob jobid   Detailed information about a single job; use a valid jobid obtained from one of the above commands

Compilers - What's Available?

  1. Visit the Compilers Currently Installed on LC Platforms webpage.

  2. Look for the IBM BG/Q entry and its seq, rzuseq, and vulcan links in the summary table near the top of the page.

  3. Then, click on one of these links to view BG/Q compiler information.

  4. BG/Q compiler commands are also listed in the Compilers section of the Sequoia BG/Q tutorial.

Hello World

  1. Now try to compile your mpi_hello.c or mpi_hello.f file using the appropriate IBM compiler command. If you're not sure which command to use, see the previous section above. Be sure to:

  2. After you've successfully compiled your hello world program, check the listing file - it should be named mpi_hello.lst. Some things to observe:
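
    For reference, the .lst file comes from a compiler listing option. Purely as an illustration (the mpixlc / mpixlf90 wrapper names and the -qlist option are assumptions here - use whatever the Compilers page specifies), the compile step might have looked something like:

    mpixlc   -O2 -qlist mpi_hello.c -o mpi_hello    # C version; -qlist writes mpi_hello.lst
    mpixlf90 -O2 -qlist mpi_hello.f -o mpi_hello    # Fortran version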

  3. Create a Moab batch script to submit your hello world program. Run it with 16 nodes and 16 tasks per node (256 total). See the Submitting Batch Jobs section of the tutorial or use the provided example file mpi_hello.moabScript. Be sure you understand what this batch script does before continuing.
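
    A minimal sketch of what such a script might contain is shown below. The directive values, time limit, and queue are illustrative assumptions; the provided mpi_hello.moabScript is the authoritative version.

    #!/bin/tcsh
    #MSUB -l nodes=16             # request 16 BG/Q compute nodes
    #MSUB -l walltime=00:10:00    # wall-clock limit (illustrative)
    #MSUB -q pReserved            # workshop queue (assumption)
    srun -N16 -n256 ./mpi_hello   # 256 MPI tasks = 16 tasks per node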

  4. Submit your Moab job script with the command: msub mpi_hello.moabScript

  5. Monitor your job's progress using one of the Job Information commands shown in the table above, such as squeue, mjstat, showq, etc.

  6. When your job finishes, you should have a file named output.jobid in your bgq directory. Review the file and compare the output to your job script. Is it what you expected? An example output file is available HERE.

  7. Modify your Moab job script so that it uses 32 MPI tasks per node (512 total).
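
    If your script launches with srun, only the launch line needs to change. For example (illustrative):

    srun -N16 -n512 ./mpi_hello   # 512 MPI tasks = 32 tasks per node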

  8. Run your job and review the output file when it completes. Is it what you expected? Did you encounter a fatal error and diagnostic message?

  9. If you have questions, see the provided example job script mpi_hello.moabScript2.

MPI with OpenMP Threads:

  1. Review the mpi_omp.c or mpi_omp.f example file which uses both MPI and OpenMP threads. Be sure you understand what is going on before continuing. Ask the instructor if you have any questions.

  2. Compile the example file using the appropriate IBM compiler command and OpenMP option (see the Compilers section above).

  3. Copy and then modify your mpi_hello.moabScript file to run this example. See the provided example file mpi_omp.moabScript if you have questions.
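
    Compared with the hello world script, the main changes are an OpenMP thread count and fewer MPI tasks per node. A rough sketch follows; the specific counts and the csh-style setenv are assumptions, and mpi_omp.moabScript has the real values.

    setenv OMP_NUM_THREADS 4      # OpenMP threads per MPI task (illustrative)
    srun -N16 -n64 ./mpi_omp      # e.g. 4 MPI tasks per node, 4 threads each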

  4. Submit your job script. Monitor its progress and review the output.jobid file when it completes. Does it look like you expected? An example output file is available HERE.

Communications Bandwidth Tests:

  1. Review the C or Fortran mpi_bandwidth1 and mpi_bandwidth2 files. One example uses blocking MPI calls, and the other uses non-blocking MPI calls.

  2. Compile both files, giving the two executables different names. Create a Moab job script for your executables that uses 16 nodes and does the following, all in the same job script:

    An example job script file is provided as mpi_bandwidth.moabScript. If you use this example file, be sure to change the executable names to match yours.
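
    As an illustration of running several tests back to back in one script (the executable names and task counts below are placeholders; mpi_bandwidth.moabScript shows the actual four pairs of runs):

    srun -N16 -n32  ./bw_blocking       # blocking version,      2 tasks per node
    srun -N16 -n32  ./bw_nonblocking    # non-blocking version,  2 tasks per node
    srun -N16 -n256 ./bw_blocking       # blocking version,     16 tasks per node
    srun -N16 -n256 ./bw_nonblocking    # non-blocking version, 16 tasks per node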

  3. Submit your job script and then review the output.jobid file after it completes. What do you observe? NOTE: if you grep on the word OVERALL in your output file, you'll see the bottom-line results for the four pairs of runs. Things to look for:

  4. If you are interested, the mpi_multibandwidth.c example measures bandwidth for nine different send/receive routine combinations. Sorry - C language example only.

Checking for QPX Floating-point Unit Use:

  1. This section demonstrates how to compile a program for QPX optimization and then verify which loops were simdized.

  2. Compile the C or Fortran simd file using the compiler command and options specified for this step in the tutorial.

  3. Rename the resulting simd.lst file to listing1.

  4. Now compile the C or Fortran simd file again, this time using the alternate options specified in the tutorial.

  5. Rename the resulting simd.lst file to listing2.
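
    Taken together, steps 2 through 5 might look something like the following. The mpixlc wrapper name and the -qsimd / -qreport options are assumptions here, shown only to illustrate comparing a simdized build against a non-simdized one; use the options specified in the tutorial.

    mpixlc -O3 -qsimd=auto   -qreport simd.c -o simd_qpx     # QPX auto-simdization enabled
    mv simd.lst listing1
    mpixlc -O3 -qsimd=noauto -qreport simd.c -o simd_noqpx   # QPX auto-simdization disabled
    mv simd.lst listing2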

  6. Review the two listing files. What to look for:

  7. You can also compare the two listings with the xxdiff utility as follows:
    xxdiff listing1 listing2

Transactional Memory (TM) Examples:

  1. This section uses the LC Sequoia Benchmark code CLOMP-TM, which simulates HPC kernels using TM and OpenMP. Two cases will be run and compared: one with rare memory conflicts between threads, and one with high memory conflicts between threads.

  2. First, cd into your bgq/CLOMP-TM subdirectory. You should see the following files:
    Makefile  README  clomptm.batch.high  clomptm.batch.rare  clomp_tm.c

  3. Due to time constraints, a detailed analysis of this benchmark code isn't possible. The README file provides a brief overview and usage examples.

  4. Build the code using the command make bgq. It will produce two executables; we will only be using the clomp_tm_bgq_divide4 executable.

  5. Review the batch script clomptm.batch.rare. The comments explain what's going on.

  6. Submit the batch script using the command msub clomptm.batch.rare. If successful, it will return a job id#.

  7. Monitor your job using the squeue or mjstat command. You may want to grep on your job id# to filter out everyone else's jobs.
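
    For example (123456 is a placeholder for your actual job id):

    squeue | grep 123456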

  8. It should take about 2 minutes for the job to run after it gets started. When finished, it will produce two output files:
    clomptm.out.rare   tm_report.rare.0

    We'll come back to examine them after running the high memory conflicts case.

    For reference, examples for the rare memory conflicts case are provided here:

  9. Now review and then submit your batch script with msub clomptm.batch.high and monitor it. When it completes you will once again have two output files:
    clomptm.out.high   tm_report.high.0

    For reference, examples for the high memory conflicts case are provided here:

  10. Compare the CLOMP-TM output files of the rare and high memory conflicts runs. An easy way to do this is as follows:
    sdiff  clomptm.out.rare  clomptm.out.high > clomptm.out.both
    Then open the clomptm.out.both file in your favorite editor.

    What to look for:

    An example with highlighted lines is provided here:

    Conclusions?

  11. Finally, compare your tm_report.rare.0 and tm_report.high.0 output files. These files record transactional memory statistics for each thread. The most interesting part is at the very end of the report, where the aggregated stats across all 64 threads are shown. For a quick comparison:
    tail -n6 tm_report.rare.0  tm_report.high.0

Core File Debugging:

  1. BG/Q machines produce light-weight core files by default, which aren't of much use with the TotalView debugger. However, LC and IBM provide several simple tools for examining these core files. These tools are covered in the Debugging section of the tutorial. You can try one of them in this exercise.

  2. Compile your C or Fortran debug1 file using the appropriate IBM compiler command. Compiling with the -g option keeps the line-number information that addr2line will need later.

  3. Create a Moab job script for your executable using 8 nodes, one task per node.
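
    The launch line for 8 nodes with one task per node might look like the following (illustrative; reuse your earlier hello world script with the node and task counts changed):

    srun -N8 -n8 ./debug1         # 8 MPI tasks, one per node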

  4. Submit your job script and let it run. When it completes, you should find up to 8 light-weight core files in your directory.

  5. Open one of the core files in your preferred editor. For example: vi core.2

  6. Locate the line in the header section that looks like this:
    While executing instruction at..........0x0000000001000898

  7. Determine the source line for the last executed instruction, which usually points to where the code failed. This can be easily done using the addr2line utility. For example:
    addr2line -e debug1 0x0000000001000898

  8. Review the source file and see if you can figure out why the code crashed, based on the line number provided by addr2line.

Debugging with TotalView:

  1. TotalView is a very sophisticated debugger, which is covered in detail in the TotalView Tutorial. This exercise only provides a simple example of using it on LC's BG/Q platforms.

  2. Compile your C or Fortran debug2 file using the appropriate IBM compiler command, making sure to include the -g option so that TotalView has debug symbols to work with.

  3. Use LC's mxterm command to acquire a partition of BG/Q compute nodes and to open an xterm window for you to use for debugging purposes. For example, to request 8 nodes for 30 minutes in the special workshop batch queue:
    mxterm 8 0 30  -q pReserved

  4. Assuming that your X11 environment is set up correctly, you will see an xterm window appear when your batch debug partition is ready for you to use. Note that you can monitor the progress of your mxterm job while it is in the queue using the usual batch commands such as showq, squeue, mshow, etc.

  5. In your new xterm window, type the following command:
    totalview srun -a -N8 debug2

  6. TotalView will then start with two windows. In the larger window, click on the green "Go" button. Image HERE

  7. After a few moments, a small dialog box will ask you about stopping the parallel job. Click "No" to let the job begin and run. Image HERE

  8. When you see output messages appear in your xterm window, it means that the job is running. It will quickly reach a point where it will hang, and output messages will cease.

  9. In the large TotalView window, do the following: Image HERE
    1. Click on the blue "Halt" button to stop the job so that you can examine its state.
    2. Click on the P+ button (lower right corner) to advance to the source code for MPI task 1, where the job is hung.
    3. In the Stack Trace pane, click on the main function to view the source code. The yellow arrow shows where the code is hung.

  10. Ordinarily, you would now attempt to figure out why the code is hung and fix the problem. That would require a working familiarity with TotalView though, and is beyond the scope of this exercise.

Performance Tools:

  1. The Performance Tools section of the tutorial covers the performance tools available on LC's BG/Q platforms. Due to time constraints and learning curves, only mpitrace and mpiP, two of the simpler but still very useful tools, are included in this exercise.

  2. mpitrace

    1. Review the mpitrace section of the tutorial for background if needed.

    2. Change directory into the sphot exercise directory.
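
      For example (assuming you copied the exercise files to ~/bgq):

      cd ~/bgq/sphot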

    3. Edit the Makefile to uncomment the LIB_DIRS and LIBS lines near the top of the file, which are needed to link in mpitrace.

    4. Type make clean (to get rid of any old files) and then make to build the sphot code. The sphot code is one of LC's standard benchmark codes. When the code finishes building, confirm that the final lines of output demonstrate linking of the mpitrace libraries.

    5. Review the supplied Moab job script sphot.moabScript. Then submit your job: msub sphot.moabScript

    6. After the job completes (in a minute or so) you should notice several new files in your sphot directory.

    7. The mpi_profile.XXXXX.X files report the MPI profiling information for your job - one each for the tasks having the minimum, median and maximum MPI time. Note that the report for task 0 also includes a summary of MPI statistics for all 64 tasks. Review any/all of these files.

    8. The hpm_process_summary.XXXXX.X files report selected hardware performance counter statistics and derived metrics - one each for the tasks having the minimum, median and maximum MPI time. The hpm_job_summary.XXXXX.X file provides a summary across all 64 tasks. Review any/all of these files.

  3. mpiP

    1. Review the mpiP section of the tutorial for background if needed.

    2. Change directory into the sphot exercise directory (if you are not already there).

    3. Edit the Makefile to uncomment the lines near the top of the file that are needed to link in the mpiP library. Note: if you previously uncommented the lines for mpitrace, you will need to comment them out again now.

    4. Type make clean (to get rid of any old files) and then make to build the sphot code. The sphot code is one of LC's standard benchmark codes. When the code finishes building, confirm that the final lines of output demonstrate linking of the mpiP library.

    5. Review the supplied Moab job script sphot.moabScript. Then submit your job: msub sphot.moabScript

    6. After the job completes you should notice several new files in your sphot directory. The file of interest is the mpiP report, named something like sphot.64.1.1.mpiP.

    7. Review the mpiP report file to see MPI profiling information for your job.

Miscellaneous User Environment Topics:

  1. Most of LC's machines share a common user environment such as global home directories, scratch file space, file systems, commands and utilities, archival storage, etc. These are described in detail in the Introduction to Livermore Computing Resources tutorial. If you're a new user, feel free to explore some of these topics below.

  2. Use the df -h command to view available file systems. Several of interest:

  3. Other file systems to peruse:

  4. HPSS storage account: provided to all users automatically. Try transferring a couple of files there using the usual ftp commands. To connect to storage, simply use the command: ftp storage
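
    For example, a short session might look like this (myfile is a placeholder):

    ftp storage
    ftp> put myfile               # copy a file to archival storage
    ftp> ls                       # verify that it arrived
    ftp> quit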

  5. List available software packages/modules with the use -l command (that is a lowercase "L").
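
    For example:

    use -l                        # list every available package
    use -l | grep -i totalview    # filter the listing for a package of interest (illustrative)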

BG/Q Related Documentation:

MyLC: Online Machine Status Information and much more:


This completes the exercise.

Evaluation Form: Please complete the online evaluation form if you have not already done so for this tutorial.

Back to the tutorial