Dawn BG/P Exercise

Useful reference while you go through the exercise: the Dawn BG/P Tutorial.

Preparation:

  1. Login to the workshop machine

    We will be using LC's unclassified BG/P system called dawndev. The instructor will demonstrate how to access dawndev from the classroom workstations.

  2. Create a subdirectory for the Dawn BG/P exercise files and cd to it. Then copy the exercise files:

    mkdir ~/bgp
    cd ~/bgp
    cp -R /usr/global/docs/training/blaise/bgp/* ~/bgp
    

  3. Verify your exercise files

    Issue the ls -l command. Your output should look something like this:

    -rw-------  1 blaise blaise 1085 2010-03-11 13:47 debug1.c
    -rw-------  1 blaise blaise   35 2010-03-10 14:09 debug1.dat
    -rw-------  1 blaise blaise 1124 2010-03-10 14:15 debug1.f
    -rw-------  1 blaise blaise 1524 2010-03-10 15:31 debug2.c
    -rw-------  1 blaise blaise 1865 2010-03-10 15:31 debug2.f
    -rw-------  1 blaise blaise  546 2010-03-11 13:46 doublehummer.c
    -rw-------  1 blaise blaise  567 2010-03-10 12:51 doublehummer.f
    -rw-------  1 blaise blaise  570 2010-03-16 12:29 misalign.c
    -rw-------  1 blaise blaise  601 2010-03-16 12:35 misalign.moabScript
    -rw-------  1 blaise blaise 5444 2010-03-10 12:18 mpi_bandwidth1.c
    -rw-------  1 blaise blaise 6541 2010-03-10 12:19 mpi_bandwidth1.f
    -rw-------  1 blaise blaise 5575 2010-03-10 12:20 mpi_bandwidth2.c
    -rw-------  1 blaise blaise 6679 2010-03-10 12:19 mpi_bandwidth2.f
    -rw-------  1 blaise blaise  646 2010-03-10 12:17 mpi_bandwidth.moabScript
    -rw-------  1 blaise blaise  788 2010-03-05 13:17 mpi_hello.c
    -rw-------  1 blaise blaise  913 2010-03-05 13:16 mpi_hello.f
    -rw-------  1 blaise blaise  649 2010-03-09 09:27 mpi_hello.moabScript
    -rw-------  1 blaise blaise  585 2010-03-09 08:51 mpi_hello.moabScript2
    -rw-------  1 blaise blaise 1733 2010-03-09 15:59 mpi_omp.c
    -rw-------  1 blaise blaise 2027 2010-03-09 16:01 mpi_omp.f
    -rw-------  1 blaise blaise  600 2010-03-09 13:57 mpi_omp.moabScript
    -rw-------  1 blaise blaise 1362 2010-03-16 12:51 ser_array.c
    -rw-------  1 blaise blaise 1360 2010-03-16 12:48 ser_array.f
    drwx------  5 blaise blaise 4096 2010-03-16 12:03 sphot

Configuration Information:

  1. Before we attempt to actually compile and run anything, let's get familiar with some basic usage and configuration commands. For the most part, these commands can be used on any LC cluster.

  2. Login Node(s):
    Recall that BG/P login/front-end nodes are different from the compute nodes. Verify this by reviewing the output of the cat /proc/cpuinfo command. What does it tell you?

  3. Compute Nodes and Partitions:
    Use the mjstat command to display a summary of dawndev's batch partition configuration, along with any running jobs. Note that dawndev has only one partition; dawn, however, has several partitions.

  4. Batch Limits:
    Use the news job.lim.dawndev command to review the queue limits for dawndev.

    Note: this same command can be used on any LC machine to review that machine's queue limits. Just substitute the name of the machine you are logged into for dawndev.

Job Information:

  1. Try each of the following commands, comparing and contrasting them to each other. Consult the man pages if you need more information.
    Note: dawndev is not a production cluster, so the output of these commands will be sparse compared to other LC production machines.

    Command              Description
    -------------------  ----------------------------------------------------------
    mjstat               Partition summary plus one line of detailed information
                         for each running job
    squeue               One line of detailed information per running job
    showq and/or mshow   Show all jobs: running, queued, and blocked
    showq -r             Show only running jobs - note the additional details
    showq -i             Show only non-running, eligible/idle jobs - note the
                         additional details
    showq -b             Show only blocked jobs, if any
    checkjob jobid       Using a valid jobid obtained from one of the above
                         commands, get detailed information about that job

Compilers - What's Available?

  1. Visit the Compilers Currently Installed on LC Platforms webpage.

  2. Look for "dawn" in the summary table near the top of the page for a quick view of what's installed (default versions).

  3. Then, click on the "dawn" link to see additional detail for Dawn.

  4. Dawn compiler commands are also listed in the Compilers section of the Dawn BG/P tutorial.

Hello World

  1. Now try to compile your mpi_hello.c or mpi_hello.f file using the appropriate IBM compiler command. If you're not sure which command to use, see the previous step above. Be sure to have the compiler generate a listing file (for example, with the -qsource and -qlist options), since the next step examines it.
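
    For example, assuming LC's MPI compiler wrapper scripts (mpixlc for C, mpixlf77 for Fortran - check the compiler webpage above for the exact commands on Dawn), the compile might look like one of these sketches:

      mpixlc mpi_hello.c -o mpi_hello -qsource -qlist      # C version
      mpixlf77 mpi_hello.f -o mpi_hello -qsource -qlist    # Fortran version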

  2. After you've successfully compiled your hello world program, check the listing file - it should be named mpi_hello.lst. Near the top of the listing, note things such as the compiler version and the option settings that were in effect for the compile.

  3. Create a Moab batch script to submit your hello world program. See the Submitting Batch Jobs section of the tutorial or use the provided example file mpi_hello.moabScript. Be sure you understand what this batch script does before continuing.
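
    A minimal sketch of what such a script might look like is shown below. The node count, walltime, queue, and shell here are assumptions for illustration; mpi_hello.moabScript (which also takes care of naming the output file) is the authoritative version.

      #!/bin/csh
      #MSUB -l nodes=16            # number of BG/P compute nodes (assumed)
      #MSUB -l walltime=00:30:00   # wall-clock limit (assumed)
      #MSUB -q pdebug              # queue name (assumed)

      # smp execution mode runs one MPI task per compute node
      mpirun -np 16 -mode smp -exe ~/bgp/mpi_hello -cwd ~/bgp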

  4. Submit your Moab job script with the command: msub mpi_hello.moabScript

  5. Monitor your job's progress using one of the Job Information commands shown in the table above, such as squeue, mjstat, showq, mshow, etc.

  6. When your job finishes, you should have a file named output.jobid in your bgp directory. Review the file and compare the output to your job script. Is it what you expected? An example output file is available HERE.

  7. Modify your Moab job script so that mpirun executes 64 MPI tasks. Note that you'll need to use "virtual node" mode to accomplish this. If you have questions on how to do this, see the Execution Modes section of the tutorial and/or see the provided example job script mpi_hello.moabScript2.
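
    As a sketch, only the task count and mode on the mpirun line need to change - something like the line below (mpi_hello.moabScript2 shows the actual usage):

      # vn mode runs 4 MPI tasks per node: 4 tasks x 16 nodes = 64 tasks
      mpirun -np 64 -mode vn -exe ~/bgp/mpi_hello -cwd ~/bgp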

  8. Run your 64 task job and review the output file when it completes. Is it what you expected?

MPI with OpenMP Threads:

  1. Review the mpi_omp.c or mpi_omp.f example file which uses both MPI and OpenMP threads. Be sure you understand what is going on before continuing. Ask the instructor if you have any questions.

  2. Compile the example file using the appropriate thread-safe compiler command (one of the _r versions, e.g. mpixlc_r or mpixlf77_r) with OpenMP enabled (the IBM -qsmp=omp option).

  3. Copy and then modify your mpi_hello.moabScript file to run this example. For your mpirun command, you will need to run in smp execution mode and pass the OMP_NUM_THREADS environment variable to your tasks. See the provided example file mpi_omp.moabScript if you have questions.
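
    One plausible form of the mpirun line is sketched below; the -env option is BG/P mpirun's mechanism for exporting environment variables to the compute nodes, and the thread count of 4 is an assumption (one thread per core in smp mode). mpi_omp.moabScript is the authoritative example.

      mpirun -np 16 -mode smp -env OMP_NUM_THREADS=4 -exe ~/bgp/mpi_omp -cwd ~/bgp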

  4. Submit your job script. Monitor its progress and review the output.jobid file when it completes. Does it look like you expected? An example output file is available HERE.

  5. Now, modify your job script to pass the OMP_NUM_THREADS environment variable set to 5. Submit the job and examine the resulting output file. What happens? Why?

Communications Bandwidth Tests:

  1. Review the C or Fortran mpi_bandwidth1 and mpi_bandwidth2 files. One example uses blocking MPI calls, and the other uses non-blocking MPI calls.

  2. Compile both files, giving each executable a different name. Create a Moab job script for your executables. Note that you can run both executables from the same job script file - one after the other. For your mpirun commands, use 16 tasks in smp execution mode. An example job script file is provided as mpi_bandwidth.moabScript. If you use this example file, be sure to change the executable names to match yours.
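
    The compile and run steps might look like the sketch below; the executable names bw1 and bw2 are made up for illustration.

      mpixlc mpi_bandwidth1.c -o bw1    # first version
      mpixlc mpi_bandwidth2.c -o bw2    # second version

    Then, in the job script, run one executable after the other:

      mpirun -np 16 -mode smp -exe ~/bgp/bw1 -cwd ~/bgp
      mpirun -np 16 -mode smp -exe ~/bgp/bw2 -cwd ~/bgp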

  3. Submit your job script and then review the output.jobid file after it completes. What do you observe?

  4. Now, modify your job script to run both executables using 32 tasks and dual execution mode. Submit your job and check the output file after it completes. What do you observe this time?

  5. Finally, modify your job script to use 64 tasks and virtual node execution mode. Submit your job and check the output file after it completes. What can you now surmise about MPI communications on BG/P nodes with regard to blocking vs. non-blocking and the number of tasks communicating on a node?
    Discussion available HERE
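
    Across steps 2, 4, and 5, only the task count and execution mode change on the mpirun lines. A sketch for one of the executables (bw1 is the hypothetical name from step 2, assuming a 16-node allocation):

      mpirun -np 16 -mode smp  -exe ~/bgp/bw1 -cwd ~/bgp   # 1 task per node
      mpirun -np 32 -mode dual -exe ~/bgp/bw1 -cwd ~/bgp   # 2 tasks per node
      mpirun -np 64 -mode vn   -exe ~/bgp/bw1 -cwd ~/bgp   # 4 tasks per node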

Checking for Double FPU Use:

  1. Compile the C or Fortran doublehummer file using the bgxlc or bgxlf command with the -qlist option (and no optimization) to generate a compiler listing file.

  2. Rename the resulting doublehummer.lst file to listing1.

  3. Now, compile the C or Fortran doublehummer file using the same command with optimization and double-FPU code generation enabled - for example, -O3 -qarch=450d - along with the -qlist option.

  4. Rename the resulting doublehummer.lst file to listing2.

  5. Compare the two listings with the sdiff utility as follows:
    sdiff listing1 listing2
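
    Putting steps 1-5 together, the sequence might look like the sketch below for the Fortran version; the optimization and architecture flags are assumptions (-qarch=450d enables code generation for the double FPU).

      bgxlf -qlist doublehummer.f -o doublehummer                   # unoptimized
      mv doublehummer.lst listing1
      bgxlf -O3 -qarch=450d -qlist doublehummer.f -o doublehummer   # double FPU
      mv doublehummer.lst listing2
      sdiff listing1 listing2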

  6. What to look for: instructions that indicate use of the double floating point unit, such as quad-word loads, quad-word stores, and dual-pipe FMAs. These instructions are discussed in the Aligning Data and SIMD Instructions section of the tutorial. An annotated version of the optimized, double-FPU listing is also available HERE, with double-FPU instructions highlighted in red.

Alignment Exceptions:

  1. The Aligning Data and SIMD Instructions section of the tutorial discusses the importance of this topic.

  2. The simple exercise code misalign.c demonstrates what happens when data is misaligned on BG/P machines.

  3. Compile the code using bgxlc.

  4. Create or modify an existing Moab job script to submit your executable to the workshop BG/P machine, dawndev.

  5. After it completes, review the output file and look for the "killed with signal 7" error messages. You will probably also notice that a number of core files were produced.

  6. You can change how alignment exceptions are handled by setting the BG_MAXALIGNEXP=-1 environment variable. Modify your job script to pass this environment variable to the mpirun command in your script, as sketched below. If you have questions on how to do this, see the supplied misalign.moabScript file.
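
    A sketch of how the mpirun line might pass the variable (the task count and mode are assumptions; misalign.moabScript is the authoritative example):

      # BG_MAXALIGNEXP=-1 removes the limit on alignment exceptions,
      # so the job is not killed when one occurs
      mpirun -np 16 -mode smp -env BG_MAXALIGNEXP=-1 -exe ~/bgp/misalign -cwd ~/bgp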

  7. Run your job again and review the output. What do you notice this time?

Core File Debugging:

  1. BG/P machines produce light-weight core files, which aren't of much use with the TotalView debugger. However, LC and IBM provide several simple tools for examining these core files. These tools are covered in the Debugging section of the tutorial; you can try two of them in this exercise.

  2. Compile your C or Fortran debug1 file using the appropriate IBM compiler command with the -g option, so that addresses in the core files can be mapped back to source line numbers.

  3. Create a Moab job script for your executable using 16 nodes in smp execution mode.

  4. Submit your job script and let it run. When it completes, you should find 16 core files in your directory.

  5. Use the getstack tool to quickly find the (likely) place where the program failed. For example:
    getstack core.12 debug1

  6. Review the source file and see if you can figure out the exact reason based upon the line number provided by getstack.

  7. Another core processor tool that you can try is located at:
    /bgsys/drivers/ppcfloor/tools/coreprocessor/coreprocessor.pl

  8. Assuming that your X11 environment is set up correctly, launch this tool using the command below, being sure to substitute your workshop username for classXX:
    /bgsys/drivers/ppcfloor/tools/coreprocessor/coreprocessor.pl -c=/g/g0/classXX/bgp -b=debug1

  9. If everything is working correctly, the coreprocessor GUI will appear. You can then load and examine multiple core files.
    1. In the smaller of the two windows, click the "Load Cores" button. Image HERE.
    2. In the larger window, select Group Mode: Ungrouped w/ Traceback
    3. Select any core file that appears as shown in the image HERE.

  10. A better example of the coreprocessor tool, with nested routine calls, is shown in the tutorial Debugging section.

Debugging with TotalView:

  1. TotalView is a very sophisticated debugger, which is covered in detail in the TotalView Tutorial. This exercise only provides a simple example of using it on LC's BG/P platforms.

  2. Compile your C or Fortran debug2 file using the appropriate IBM compiler command with the -g debugging option.

  3. Use LC's mxterm command to acquire a partition of BG/P compute nodes and to open an xterm window for you to use for debugging purposes. For example, to request 16 nodes for 30 minutes:
    mxterm 16 0 30

  4. Assuming that your X11 environment is set up correctly, you will see an xterm window appear when your batch debug partition is ready for you to use. Note that you can monitor the progress of your mxterm job while it is in the queue using the usual batch commands such as showq, squeue, mshow, etc.

  5. In your new xterm window, type the following command:
    totalview mpirun -a -np 16 -mode smp -exe ~/bgp/debug2 -cwd ~/bgp

  6. TotalView will then start with two windows. In the larger window, click on the green "Go" button. Image HERE

  7. After a few moments, a small dialog box will ask you about stopping the parallel job. Click "No" to let the job begin and run. Image HERE

  8. When you see output messages appear in your xterm window, it means that the job is running. It will quickly reach a point where it will hang, and output messages will cease.

  9. In the large TotalView window do the following: Image HERE
    1. Click on the blue "Halt" button to stop the job so that you can examine its state.
    2. Click on the P+ button (lower right corner) to advance to the source code for MPI task 1, where the job is hung.
    3. In the Stack Trace pane, click on the main function to view source code. The yellow arrow shows where the code is hung.

  10. Ordinarily, you would now attempt to figure out why the code is hung and fix the problem. That would require a working familiarity with TotalView, however, and is beyond the scope of this exercise.

Performance Tools:

  1. The Performance Tools section of the tutorial covers the performance tools available on LC's Dawn BG/P platform. Due to time constraints and learning curves, only mpitrace and mpiP, two of the simpler but most useful tools, are included in this exercise.

  2. mpitrace

    1. Review the mpitrace section of the tutorial for background if needed.

    2. Change directory into the sphot exercise directory

    3. Edit the Makefile and uncomment the LIB_DIRS and LIBS lines near the top of the file, which are needed to link in mpitrace.

    4. Type make clean (to get rid of any old files) and then make to build the sphot code. The sphot code is one of LC's standard benchmark codes. When the code finishes building, confirm that the final lines of output demonstrate linking of the mpitrace libraries.

    5. Submit the code using the supplied Moab job script: msub sphot.moabScript

    6. After the job completes you should notice several new files in your sphot directory.

    7. Review the mpi_profile.0 file to see MPI profiling information for your job.
      • The other two mpi_profile.X files correspond to the tasks with the median and maximum MPI communication times. You can review these also if you'd like.

    8. Now review your job's hardware performance counter report by viewing the file named hpm.txt.mpiAll.0.jobid. A few things to note:
      • Default 256 hardware events
      • Counts for cores 0 and 1 only
      • Average, min and max for all MPI tasks
      • You'll need the IBM documentation to understand more about the events counted.

  3. mpiP

    1. Review the mpiP section of the tutorial for background if needed.

    2. Change directory into the sphot exercise directory (if you are not already there).

    3. Edit the Makefile and uncomment the lines near the top of the file needed to link in the mpiP library. Note: if you previously uncommented the lines for mpitrace, you will need to comment them out again now.

    4. Type make clean (to get rid of any old files) and then make to build the sphot code. The sphot code is one of LC's standard benchmark codes. When the code finishes building, confirm that the final lines of output demonstrate linking of the mpiP library.

    5. Review the supplied Moab job script sphot.moabScript. Then submit your job: msub sphot.moabScript.

    6. After the job completes you should notice several new files in your sphot directory. The file of interest is the mpiP report, named something like sphot.16.100.1.mpiP.

    7. Review the mpiP report file to see MPI profiling information for your job.

Building and Running Serial HTC Applications:

  1. The Using HTC Mode section of the tutorial covers this topic.

  2. Running serial applications on HTC nodes is simple - you just compile with the appropriate BG/P compiler and launch the job with the submit command.

  3. Compile the ser_array code using bgxlc or bgxlf.

  4. Verify that the following two environment variables are set (should be the default):
    HTC_SUBMIT_POOL=htc
    BG_PGM_LAUNCHER=submit

  5. Run the code on an HTC node by launching it with: submit ser_array
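
    End to end, the HTC steps might look like the sketch below (csh syntax; both environment variables should already be set by default, so the setenv lines are shown only for completeness):

      bgxlc ser_array.c -o ser_array
      setenv HTC_SUBMIT_POOL htc        # route submissions to the HTC pool
      setenv BG_PGM_LAUNCHER submit     # use submit as the launcher
      submit ser_array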

  6. The serial job will be automatically routed to an available HTC node for execution, with stdout returning to your front-end login node session.

  7. Note that on the unclassified dawndev machine, there is currently no way to actually monitor your job while it is running, except for any output that it might produce. HTC jobs on the classified Dawn machine can be monitored.

Miscellaneous User Environment Topics:

  1. Most of LC's machines share a common user environment such as global home directories, scratch file space, file systems, commands and utilities, archival storage, etc. These are described in detail in the Introduction to Livermore Computing Resources tutorial. If you're a new user, feel free to explore some of these topics below.

  2. Use the df -h command to view the available file systems. Note your global home directory and the scratch file systems among them.

  3. Peruse some of the other mounted file systems as well.

  4. HPSS storage account: provided to all users automatically. Try transferring a couple of files there using the usual ftp commands. To connect to storage, simply use the command: ftp storage
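
    For example, a short session might look like this, using one of your exercise files:

      ftp storage
      ftp> put mpi_hello.c    # copy a file into archival storage
      ftp> ls                 # confirm it arrived
      ftp> quit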

  5. List available software packages/modules with the use -l command (that's a lowercase "L").

  6. Documentation: a few examples to explore are the man pages for the commands used in this exercise and LC's online tutorials and web pages.

MyLC: Online Machine Status Information and much more:

  1. Explore the MyLC web portal, which provides online machine status information and much more.

This completes the exercise.

Evaluation Form: Please complete the online evaluation form if you have not already done so for this tutorial.
