Livermore Computing Resources and Environment Exercise

Exercise 1

Login to an LC Cluster:
  1. This step will vary, depending upon what type of machine you are using. Follow the relevant instructions for your type of machine on the instructor-provided handout.

  2. After logging in, review the login banner. Specifically notice the various sections:
    • Welcome section - where to get more info, help
    • Announcements - All LC Machines and local machine related
    • Any unread news items - try reading a news item: news news_item

  3. Verify that your X11 environment is set up correctly by launching a simple X client. Type the command xclock. Did a small clock appear on your screen?
    • If not, try to resolve the problem before proceeding - see the instructor-provided handout for help and/or ask the instructor.

Copy the Exercise Files

  1. Use the command below to copy the exercise files into your home directory (note the tilde "~" character):
    cp -R /usr/global/docs/training/blaise/linux_clusters/  ~   

  2. Verify your exercise files using the ls -l linux_clusters command. Your output should show something like below:

    drwx------ 2 class07 class07 4096 Jan 15 07:31 benchmarks
    -rw------- 1 class07 class07  108 Jan 15 07:35 hello.c
    -rw------- 1 class07 class07   67 Jan 15 07:35 hello.f
    drwx------ 4 class07 class07 4096 Jan 15 07:31 mpi
    drwx------ 4 class07 class07 4096 Jan 15 07:31 openMP
    drwx------ 2 class07 class07 4096 Jan 15 07:31 pthreads
    drwx------ 4 class07 class07 4096 Jan 15 07:31 serial
    

Configuration Information:

  1. Login Nodes

    1. What is the name of your login node? What other login nodes are available?
      Use the nodeattr -c login command to display all of the login nodes on this cluster.

    2. Login nodes are shared between users. Use the who command to see who else is logged into your login node.
      Recall that the generic cluster login rotates between login nodes to balance user load across available nodes.

    3. From your current login node, try logging into another login node, using ssh
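      For example, if nodeattr -c login listed a node named quartz770 (a hypothetical name - substitute one from your own cluster):
      ssh quartz770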

  2. Compute Nodes and Partitions

    1. Use the sinfo -s command to display a summary of this cluster's configuration.
      • What compute partitions are available?
      • How many nodes are in each partition? Allocated? Idle? Other/unavailable?
      • Which nodes are in each partition?

    2. Now try the sinfo command (no flags). Note that its output is similar to that of sinfo -s, but provides more detail by breaking out nodes according to their "state".

    3. Try the command news job.lim.machinename, where machinename is the actual name of your cluster (see the example following the list below).
      • What type of hardware comprises the cluster?
      • What are the partition job size and time limits?
      • Note any "good neighbor" policies
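      For example, if you are logged into quartz: news job.lim.quartz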

    4. Try the mxterm utility to obtain interactive access to a batch compute node. Note: This part requires that your X11 forwarding is functioning properly. If it isn't, then skip it.
      For example: mxterm 1 1 30 -q pReserved to request 1 node, 1 process for 30 minutes.
      • After a minute or two, an xterm window will appear, giving you an interactive session on a batch compute node
      • Check the node name - is it a pReserved node? Compare it to the output of the sinfo -s command
      • Use the who command to confirm that you are the only person logged into this node
      • Ordinarily, you would now run a job, start a debugger, etc. But we're done for now with this part, so you can close the xterm and continue in your original terminal window.
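      Tip for the steps above: inside the xterm, you can confirm which node you are on with the standard hostname command.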

  3. Hardware and OS details

    1. Try any/all of the commands below to get hardware and operating system details:
      lscpu
      lstopo
      uname -a
      cat /proc/cpuinfo
      cat /proc/meminfo
      cat /etc/toss-release
      distro_version
      cat /etc/redhat-release
      
    2. Go to hpc.llnl.gov
      • Select "User Portal" (not General Site), and then select: Hardware ==> Compute Platforms
      • Find the name of your login cluster - notice its summary hardware/OS information
      • Click on the name of the cluster to obtain detailed information

    3. Go to mylc.llnl.gov
      • Authenticate using your workshop username + PIN + OTP. See the Workshop Login Instructions sheet provided by the instructor if you have questions.
      • Once logged in, find the "machine status" portlet/sub-window
      • Click on the name of the workshop cluster you are logged into
      • Click on the various tabs and review the information shown.

File Systems:

  1. Available file systems: use the df -h or bdf command to view them

  2. Scratch file systems:
    • Note the very large, Lustre parallel file system(s) mounted as /p/lscratch*
    • Note the large /nfs/tmp2
    • Note that /p/lscratch* and /nfs/tmp2 are shared by many users - list them using ls -m
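    For example (the actual lscratch name varies by cluster):
      ls -m /nfs/tmp2
      ls -m /p/lscratch*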

  3. Home directory
    • Use the pwd command to find where your home directory is located.
    • Check your quota using the quota -v command.
    • List your "online" backup directories under your invisible .snapshot directory: ls -l .snapshot

  4. File transfer and HPSS storage: there are several ways to transfer files to/from storage - this exercise shows Hopper.
    Note: This part requires that your X11 forwarding is functioning properly. If it isn't, then skip it.
    1. Bring up the Hopper GUI using the hopper command.
    2. In the new hopper window, open the Connect menu and select Connect to Storage.
    3. A second hopper window connected to the storage system will appear.
    4. Transfer your linux_clusters directory to storage by "dragging and dropping" it from the first hopper window to the storage hopper window. When you release your mouse button, a hopper menu will pop up - select Copy Here.
    5. A third hopper window showing the file transfer in action will appear.
    6. After the third hopper window disappears, confirm that the directory was transferred to storage - open the folder in storage to examine the files it contains.
    7. Feel free to explore hopper further if you'd like. When done, close all hopper windows.

  5. File system status
    1. Go to https://lc.llnl.gov/fsstatus/fsstatus.cgi
    2. Authenticate using your workshop username + PIN + OTP. See the Workshop Login Instructions sheet provided by the instructor if you have questions.
    3. Review the CZ File Systems Status page information


This completes Exercise 1



Exercise 2

  1. Still logged into the workshop cluster?

    If so, then continue to the next step. If not, then log in as you did previously for Exercise 1.

Job Information:

  1. Try each of the following commands, comparing and contrasting them to each other. Consult the man pages if you need more information.

    Command     Description
    sinfo -s    Concise summary of queues and node states
    mjstat      Partition summary plus one line of detailed information for each running job
    squeue      One line of detailed information per running job
    showq       Show all jobs: running, queued and blocked
    showq -r    Show only running jobs - note the additional details
    showq -i    Show only queued, eligible/idle jobs - note the additional details
    showq -b    Show only blocked jobs
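    For example, to narrow the squeue output to just your own jobs (using the standard Slurm -u option):
      squeue -u classXX
    where XX matches your workshop username.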

Compilers - What's Available?

  1. Use the commands below to display available compilers on the cluster you're logged into. You should see GNU, Intel, PGI and Clang compilers - several versions of each. Try any/all of these commands to narrow your search to a specific compiler:

    Quartz:
      module avail gcc
      module avail intel
      module avail pgi
      module avail clang

    Cab:
      use -l gcc
      use -l icc
      use -l pgi
      use -l clang
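    Once you've found an available compiler, you can also check which version is currently in your path - for example:
      gcc --version
      icc --version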
    

Hello World

  1. Go to your linux_clusters directory:
    cd linux_clusters
  2. Now try to compile your serial hello.c and/or hello.f files with any/all of the available compilers.
    Note: for the PGI and Clang compilers:
    • On Quartz you'll need to run module load packagename for the desired PGI or Clang compiler first.
    • On Cab you'll need to run use packagename for the Clang compiler first.
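    For example, on Quartz (the exact module name and version may differ - check module avail first):
      module load pgi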

    Example compiles:

    C:
    icc hello.c -o hello
    gcc hello.c -o hello
    pgcc hello.c -o hello
    clang hello.c -o hello
    Fortran:
    ifort hello.f -o hello
    gfortran hello.f -o hello
    pgf90 hello.f -o hello
    

  3. After you've successfully built your hello world, execute it. Did it work?

Building and Running Serial Applications:

  1. Go to either the C or Fortran versions of the serial codes:
    cd serial/c
       or 
    cd serial/fortran
  2. Try your hand at compiling and executing any/all of the ser_* codes with any/all of the compilers available.

  3. Notes:
    • If using gcc, you will need the -lm flag for two C examples: ser_prime.c and ser_wave.c.
    • Fortran - use ifort, pgf90 or gfortran (not F77 flavors)
    • Consult the compiler man page(s) for any compiler flags you'd like to try
    • A Makefile has been provided for convenience - if you use it, feel free to edit the choice of compiler/flags it uses.

Compiler Optimizations:

  1. Compilers differ in their ability to optimize code. They also differ in their default level of optimization, as demonstrated by this exercise.

  2. Review the optimize code and the opttest script so that you understand what's going on.

  3. Execute opttest. When it completes, compare the various timings.
    • Which compiler performed best/worst without optimization?
    • Which compiler performed best/worst with -O3?
    • Which compiler had the least/greatest difference between no opt and -O3?

  4. The Intel and PGI compilers perform some optimizations by default; the GNU compilers do not. To see the effects of this, modify the opttest file to remove all occurrences of -O0 and rerun the test.

    Note: if you try both C and Fortran, the result differences are due to loop index variables - C starts at 0 and Fortran at 1.
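    One quick way to strip the -O0 flags is with sed (a sketch assuming GNU sed - you can also edit opttest by hand in any editor):
      sed -i 's/-O0//g' opttest
    Then rerun opttest as before.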

Building and Running Parallel MPI Applications:

  1. MPI is covered in the MPI tutorial later in the workshop. This part of the exercise simply shows how to compile and run codes using MPI.

  2. Go to either the C or Fortran versions of the MPI applications:
    cd ~/linux_clusters/mpi/c
       or
    cd ~/linux_clusters/mpi/fortran

  3. Using the default MPI library, compile any/all of the mpi_* codes. For example:
    mpicc mpi_array.c -o array
    mpif90 mpi_array.f -o array

    A Makefile has been provided for convenience - if you use it, feel free to edit the choice of compiler/flags it uses.

    INTERACTIVE RUNS:

  4. There is a special partition set up for the workshop: pReserved. Use this partition for all exercises.

  5. Run any/all of the codes directly using srun in the pReserved partition. For example:
    srun -n4 -ppReserved mpi_array
    srun -N2 -ppReserved mpi_latency
    srun -N4 -n16 -ppReserved mpi_bandwidth
    NOTE: For interactive runs, if there aren't enough nodes available, your job will queue for a while before it runs. The typical informational message looks something like this:

    
    srun: job 68821 queued and waiting for resources
    

    BATCH RUNS:

    NOTE: This part of the exercise is trivial - it simply shows how to submit and monitor a batch job. The batch system is covered in depth later during the Slurm and Moab tutorial.

  6. From the same directory that you ran your MPI codes interactively, open the job_script file (Quartz) or the msub_script file (Cab) in a UNIX editor, such as vi/vim, emacs, nedit, gedit, nano...

  7. Review this very simple job script. The comments explain most of what's going on.

  8. Submit your script to the batch system. For example:
    Quartz: sbatch job_script
    Cab: msub msub_script

  9. Monitor the job's status by using the command:
    showq | grep classXX
    where XX matches your workshop username/token. The sleep command in the script should allow you enough time to do so.

  10. After you are convinced that your job has completed, review the batch log file. It should be named something like output.NNNNN.
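    For example, to locate and view it:
      ls output.*
      cat output.*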

Building and Running Parallel OpenMP Applications:

  1. OpenMP is covered in the OpenMP tutorial later in the workshop. This part of the exercise simply shows how to compile and run codes using OpenMP.

  2. Depending upon your preference for C or Fortran:
    cd ~/linux_clusters/openMP/c/
    -or-
    cd ~/linux_clusters/openMP/fortran/

    You will see several OpenMP codes.

  3. If you are already familiar with OpenMP, you can review the files to see what is intended.

  4. Compiling with OpenMP is easy: just add the required flag to your compile command.
    Note: for the PGI and Clang compilers, you'll need to module load the desired compiler package first.

    Compiler   Flag
    Intel      -qopenmp
    GNU        -fopenmp
    PGI        -mp
    Clang      -fopenmp

    For example:

    icc -qopenmp omp_hello.c -o hello
    -or-
    ifort -qopenmp omp_reduction.f -o reduction

  5. Compile any/all of the example codes.

  6. Before running, set the OMP_NUM_THREADS environment variable to the number of threads that should be used. For example:
    setenv OMP_NUM_THREADS 8
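    If your login shell is bash rather than csh/tcsh (the setenv syntax above assumes csh/tcsh), the equivalent is:
      export OMP_NUM_THREADS=8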
  7. To run, just enter the name of the executable.

Run a Parallel Benchmark:

  1. Run the STREAM memory bandwidth benchmark:

    1. cd ~/linux_clusters/benchmarks

    2. Depending on whether you like C or Fortran, compile the code. Note: the executable needs to be named something other than stream, as this conflicts with /usr/bin/stream, an unrelated utility.

      C
      icc -O3 -qopenmp stream.c -o streambench
      Fortran
      icc -O3 -DUNDERSCORE -c mysecond.c
      ifort -O3 -qopenmp stream.f mysecond.o  -o streambench
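      If you prefer the GNU compilers, a comparable C build might look like this (a sketch following the Intel example above):
        gcc -O3 -fopenmp stream.c -o streambench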

    3. This benchmark uses OpenMP threads, so set OMP_NUM_THREADS - for example:
      setenv OMP_NUM_THREADS 8
    4. Then run the code on a single node in the workshop queue:
      srun -n1 -ppReserved streambench
    5. Note the bandwidths/timings when it completes.

    6. For more information on this benchmark, see http://www.cs.virginia.edu/stream/

Hyper-threading:

    LC's Intel clusters have hyper-threading turned on by default. To confirm this:

  1. Use the lstopo --only core command to show how many physical cores are on the system you're using. What does the output tell you?

  2. Then use the lscpu command. Look for the line that lists "CPU(s):" - near the top of the output. What does the output tell you this time?

  3. Also use the cat /proc/cpuinfo command. How many "processors" does it report?
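    To compare the two counts side by side, something like this should work (standard hwloc and util-linux commands):
      lstopo --only core | wc -l
      lscpu | grep '^CPU(s):'
    The first command prints one line per physical core; the second shows the total number of logical CPUs seen by the OS.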

  4. When hyper-threading is turned on, it appears that there are twice as many CPUs as physical cores.

  5. Moral of the story: the performance benefits of using hyper-threads will vary by application. Try your real applications both with and without hyper-threading to see which performs best.

Online Machine Status Information...and More:

  1. Go to hpc.llnl.gov. Make sure you select User Portal and not "General Site".

    1. Under the Hardware drop-down menu, select CZ Compute Platform Status.

    2. You will then be taken to the "LC OCF CZ Machines Status" matrix. Find one of the Linux cluster machines and note what info is displayed.

    3. Now click on the hyperlinked name of that machine and you will be taken to much more detailed information about it, including links to yet more information, which you can follow if you like.

    4. Go back to hpc.llnl.gov. Select User Portal again. Then under the Hardware drop-down menu select CZ File System Status. This will take you to a matrix showing details about CZ file systems.

      This page is particularly useful for checking the up/down status of important file systems.

    5. Notice that hpc.llnl.gov hosts much more than machine status information. In fact, it's LC's primary user documentation portal.


  2. Now go to mylc.llnl.gov. Open it in a new tab/window so you can follow along with the rest of these instructions.

    1. If prompted, authenticate using your workshop username + PIN + OTP. See the Workshop Login Instructions sheet provided by the instructor if you have questions.

    2. The MyLC portal displays a wealth of information pertaining to LC systems.

    3. Take some time to explore this information. Much of it is interactive, allowing you to dive into additional detail.

    4. For example, go to the my accounts container, and click on a machine name, such as quartz. Notice the multi-tab window that appears with details on the state of the machine.


This completes the exercise.

Evaluation Form: Please complete the online evaluation form if you have not already done so for this tutorial.
