TotalView 4.1.0-1-LLNL

Release Features

September 19, 2000

TotalView is a product of Etnus Inc.

This document describes the release of TotalView 4.1.0-1-LLNL. It was installed as totalview and totalviewcli on sky/white/blue/snow/baby and forest/sc/compass/tc/tc2k on September 19, 2000, to fix a problem with setting data watchpoints when debugging parallel code. newtotalview and newtotalviewcli were updated to 4.1.0-1-LLNL on August 22, 2000.

Below are the features added to create TotalView 4.1.0-1-LLNL.

§ Complete list of Etnus release features for 4.1
§ Bulk Server Launch capabilities default on AIX
§ Attaching to a subset of executing processes
§ Watchpoints now available on Compaq Alpha and on IBM Power 3 (snow/white)
§ Hardware Performance Monitoring for IBM Power 3 (snow/white)
§ Condensing process information in CLI: ST -condense
§ Allow vector subscripts (gather) in CLI
§ Display stack frame in CLI: dwhere -stackframe -registers
§ LLNL-defined Tcl script to display information across processes in CLI: dlaminate, dfind
§ Display memory utilization in CLI: dmemsize
§ Previous LLNL/LANL Enhancements to TotalView
§ TotalView V4.1.0-3-LLNL Release Features

Documentation: TotalView User's Guide (HTML and PDF) and TotalView CLI Guide (HTML and PDF).

Tutorial: TotalView Debugger by Blaise Barney

TotalView Quick Reference Page

TotalView CLI Summary Sheet

For problems/questions, send e-mail to Bor Chan, Karen Warren, Rich Zwakenberg (LLNL) or Laurie McGavran (LANL).

Bulk Server Launch capabilities default on AIX

When debugging a parallel code on IBM, TotalView needs to launch its debugger server, tvdsvr, on each node running the processes so that it can control them and retrieve and set information in them. With this release, TotalView can launch tvdsvr in two ways: using rsh (the previous default) or using poe.

With this release we have made the second alternative the default. This alternative, called the Bulk Server Launch, uses poe to launch tvdsvr on each node where the user's processes are running. We made the Bulk Server Launch the default because, in our experience, it is reliable and faster with large numbers of CPUs (for example, 1024).

To use the previous default rsh form, do either of the following:

Attaching to a subset of executing processes

By default, TotalView has always attached to all of the user's processes in the job, whether the job was started under TotalView's control or TotalView attached to an already executing job. TotalView now allows you to attach to a subset of your executing processes.

Attaching to a subset avoids the long startup and execution times of having TotalView control all of your processes. You may know that the problem is in process 312 of 1024 and want to see only that one. Or it may be in processes 255, 511, 767, and 1023, because that is where you are doing MPI communication.

When you attach TotalView to an executing job, it now displays a list of all the executing processes and lets you select the ones to attach to. You can then control, retrieve, and set information in the selected subset. Note: You have no control over the executing processes you are not attached to.

You can also add processes to the attached subset. If the subset you attached to does not contain the process you want to look at, attach to that process as well. Looking at one process often leads you to want to look at another, so you can simply attach to more processes as you debug.

There are two ways to attach to a subset of executing processes:

Attaching using the window interface

See Figure 1 below.

Attaching using the CLI

See Figure 2 below.

Figure 1

Figure 2

Watchpoints now available on Compaq Alpha and on IBM Power 3 (snow/white)

You can now define a watchpoint on any of the Compaq Alphas (compass, tc, tc2k, forest, sc) and on the IBM Power 3 (snow/white).

If a watched variable (more precisely, a watched data address) is modified during execution, your program halts and TotalView points to the line that modified it.

Limitations on IBM
On the IBM Power 3 there are limitations to the watchpoint capability:

Limitations on Alpha
Watchpoints are implemented on the Alpha Tru64 system using Alpha's page protection scheme. Tru64 places no limit on the number of watchpoints you can create, and there are no alignment or size constraints. However, because protection applies to a whole page, even when you watch a single byte, writes to any other location on the same page cause false hits that TotalView must filter out. This can make execution very slow on the Alphas.

Unconditional Data Watchpoint vs Conditional Data Watchpoint
Unconditional Data Watchpoints use the available hardware support and therefore run fast, especially on IBM. Conditional Data Watchpoints are always interpreted and therefore run very slowly, which can degrade performance significantly.

Process to define a watchpoint
Watchpoint is a command in the variable display window. Here is the process to define a watchpoint:

Example: Suppose you have a C variable float *b that points to an array of 10 floats, and something is clobbering b[4]. The following is what to do:

Condensing process information in CLI: ST -condense

We have added a -condense option to dstatus to condense status output, similar to the condensed root window introduced in a previous release. Using dstatus -condense by itself has no visible effect, but when it is used in the context of dfocus g dstatus -condense, the process list is condensed: contiguous processes with the same state and PC are listed on one line.

ST -condense is shorthand for dfocus g dstatus -condense.

Allow vector subscripts (gather) in CLI

We added the ability to print vector-subscripted arrays in the CLI. This is the same feature available in the TotalView GUI, described in Displaying an Array with a Vector Subscript.

Display stack frame in CLI: dwhere -stackframe -registers

We have added the ability to display the stack frame from the dwhere CLI command through the options -args, -locals, -registers, and -stackframe. The -stackframe option is shorthand for using both -args and -locals.

Below, in the first picture, is the CLI output of dwhere -stackframe. The second picture shows what you would see in TotalView's window interface stack frame pane.

LLNL-defined Tcl script to display information across processes in CLI: dlaminate, dfind

We have created a Tcl script that defines functions (commands) for displaying information across processes from within the CLI. This script is included in the TotalView startup. The new commands are:

Display memory utilization in CLI: dmemsize

We have added the dmemsize command to the CLI to print memory utilization information.

dmemsize prints the memory utilization of each process in the current focus group: virtual data size, executable file size, text resident size, real memory (resident set) size, and percentage of real memory used. It also prints the maximum and minimum real memory size across all processes.

Hardware Performance Monitoring for IBM Power 3 (snow/white)

Hardware Performance Monitoring on the IBM Power 2 chip supported 4 counters. The new IBM Power 3 chip, currently available on snow and white, supports 8 counters. TotalView has been modified to support all 8 counters and the new events that can be counted on them.


Last revised September 19, 2000