Workshops differ in how this is done. The instructor will go over this beforehand.
In your home directory, create a subdirectory for the LCRM example codes and cd to it. Then copy the example codes:
mkdir ~/lcrm
cd lcrm
cp /usr/global/docs/training/blaise/lcrm/* ~/lcrm
You should notice the following files:
|lcrm1.cmd||Simple job command script|
|lcrm2.cmd||Another simple job command script|
|lcrm3.cmd||Parallel job command script which runs ep.B.X examples.|
|ep.B.X||Executables for lcrm3.cmd. NAS Embarrassingly Parallel benchmark code for 4, 8 and 16 tasks.|
|pengra.cmd||Parallel job command script for submission to Linux cluster|
|mpi_multibandwidth.c||MPI code that demonstrates varying bandwidths for different send/receive pairs.|
Try the following commands to show your default bank, and then current allocation and usage information for that bank. Be sure to substitute your default bank for bank in the second command. See the man pages for defbank and pshare if you have questions.
defbank
pshare -p -0 -t bank
The commented example file lcrm1.cmd should be in your lcrm subdirectory. Review this file, taking note of its #PSUB directives as well as its shell commands.
When you are ready, use the psub command to submit this job to the LCRM system:
psub lcrm1.cmd
If your job is accepted by LCRM, you should immediately see a message that looks something like:
Job 46847 submitted to batch
You can now use the pstat command to monitor the status of your job. If you are quick, and your job hasn't finished executing already, you should see something similar to below:
JID    NAME       USER     ACCOUNT  BANK  STATUS  EXEHOST  CL
13105  lcrm1.cmd  class01  000000   cs    RUN     newberg  N
If you see no output from the pstat command, then it probably means your job has already completed. Try repeating the previous psub step and then issuing this command immediately afterwards.
You will know that your job has successfully completed when it no longer shows up with the pstat command, and when you have a new file in your directory called lcrm1.c.o#####, where ##### matches the LCRM job ID assigned to your job upon submission. This is the default naming scheme used by LCRM.
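The file half of this completion check can be scripted. The following is a minimal sketch, assuming the default naming scheme described above; job_output_files is a hypothetical helper name, not an LCRM command.

```shell
# Sketch: list LCRM output files for a job, if any exist yet.
# job_output_files is a hypothetical helper, not part of LCRM.
job_output_files() {
    # $1 = output file prefix (for example lcrm1.c)
    # $2 = directory to search (defaults to the current directory)
    prefix=$1
    dir=${2:-.}
    status=1
    for f in "$dir"/"$prefix".o*; do
        # With no match the pattern stays unexpanded, so test for existence.
        if [ -e "$f" ]; then
            echo "$f"
            status=0
        fi
    done
    # Return 0 if at least one output file was found, 1 otherwise.
    return $status
}
```

Running `job_output_files lcrm1.c` in your lcrm subdirectory prints any matching output files and returns a nonzero status if there are none yet.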
Review the contents of your output file and compare them to the original job command script. Do they agree?
Now that you know the basics, create your own job command script based upon the previous example file. Begin by copying the lcrm1.cmd example file, and then modifying it to do these additional basic tasks:
For assistance, you can check the psub man page. For an example solution, you can check lcrm2.cmd.
After you've created your new job command script, submit it to LCRM. When it completes, verify its output. Note: because this is a batch system, it's difficult to predict exactly when your job will run. If your job sits in the queue for more than a few minutes, proceed to the next step and come back here after it has completed.
The example job command script lcrm3.cmd will be used for this part of the exercise. This script runs the EP (embarrassingly parallel) benchmark from the NAS version 2.3 parallel benchmark set. You will be asked to run it three times, using a different number of nodes/tasks, and to evaluate its scalability.
These jobs will generate many diagnostic/informational messages, both before and after the benchmark runs. They are shown for demonstration purposes and may be useful to you later when running real jobs.
From the benchmark section, determine the execution time for each job. The easiest way to do this is just grep/search for the string "CPU Time = ".
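The search can be wrapped in a small shell function that prints only the numeric value. This is a sketch, assuming the time appears as the first token after "CPU Time = " on the same line; cpu_times is a hypothetical helper name.

```shell
# Sketch: print the value that follows "CPU Time = " in each given file.
# cpu_times is a hypothetical helper name.
cpu_times() {
    # -h suppresses file names so only the matching lines are passed on.
    grep -h "CPU Time = " "$@" |
        sed 's/.*CPU Time = *//' |   # drop everything up to and including the label
        awk '{ print $1 }'           # keep the first token, the time itself
}
```

Call it with the names of your job output files as arguments. It prints one value per matching line, in the order the files are given.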
How scalable is the ep.B.X benchmark code? Do your results come close to those shown below (assuming that they were run on the workshop machine newberg)?
|Sample ep.B.X execution timings|
|SMP Nodes||MPI Tasks||Execution time (sec.)|
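One way to quantify scalability is to compute speedup and parallel efficiency relative to the smallest run. The sketch below uses the common definitions (speedup = baseline time / run time, efficiency = speedup / ideal speedup); scaling is a hypothetical helper name, and all four arguments are values you supply from your own runs.

```shell
# Sketch: speedup and parallel efficiency of a run relative to a baseline run.
# scaling is a hypothetical helper; it does not read any LCRM output itself.
scaling() {
    # $1 = baseline task count      $2 = baseline execution time (sec.)
    # $3 = task count of this run   $4 = execution time of this run (sec.)
    awk -v n0="$1" -v t0="$2" -v n="$3" -v t="$4" 'BEGIN {
        speedup = t0 / t      # how many times faster than the baseline
        ideal   = n / n0      # speedup expected under perfect scaling
        printf "tasks=%d speedup=%.2f efficiency=%.0f%%\n", n, speedup, 100 * speedup / ideal
    }'
}
```

For example, with the 4-task run as the baseline, call `scaling 4 T4 16 T16`, replacing T4 and T16 with your measured times. An efficiency near 100% means the code scales almost perfectly over that range.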
Note for the curious: this benchmark has been compiled so that it produces "gprof" profiling information, and generates a gmon*out file for each MPI task. This has nothing to do with LCRM. If you're curious, you can generate a gprof report of the benchmark run to see its profiling information. For example:
gprof ep.B.16 gmon*out > gprof.16.report
*** Ignore the [nllookup] error messages ***
The most interesting information might be the "flat profile", which lists each routine and how much CPU time it used. Open your gprof report, search for "flat profile", and scroll down a few lines to see this profile information. Again, this has nothing to do with LCRM, but is simply provided for the curious.
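If you would rather not open the report in an editor, the same search can be done from the command line. This is a minimal sketch using awk; flat_profile is a hypothetical helper name, and the default of 15 lines is an arbitrary choice.

```shell
# Sketch: print the "flat profile" heading of a gprof report and the lines after it.
# flat_profile is a hypothetical helper name.
flat_profile() {
    # $1 = gprof report file
    # $2 = number of lines to show after the heading (default 15)
    awk -v n="${2:-15}" '
        tolower($0) ~ /flat profile/ { left = n + 1 }   # heading found: start printing
        left > 0 { print; left-- }                      # print heading plus n more lines
    ' "$1"
}
```

For the report generated above, `flat_profile gprof.16.report` prints the heading and the first lines of the profile.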
This exercise will submit a job to a different system. The mpi_multibandwidth.c code demonstrates how MPI bandwidth can be a function of the types of send/receive calls used.
The tutorial covered several RAC Utilities, such as pshare, bac and lrmusage. Try using these commands with some of the options shown in the tutorial.
This completes the exercise.
|Please complete the online evaluation form if you have not already done so for this tutorial.|