ompP

From Peyton Hall Documentation

ompP is a profiler for OpenMP programs. OpenMP is a shared-memory parallel programming interface included as part of the Intel compilers and supported by a number of others. As of July 2008, ompP is compatible with the following compilers:

gcc 4.2.0

icc/ifort

xlc/xlC/xlf

Pathscale

PGI

This list was obtained from http://www.cs.utk.edu/~karl/ompp.html, where further details on compatibility can be found.


ompP can be very useful because, in addition to pinpointing which functions take up a great deal of runtime, it shows how much time each thread spends in each parallel or user-defined region.
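For reference, a minimal OpenMP parallel region looks like the sketch below (the program itself is just illustrative); ompP will report the time each thread spends inside the block:

 #include <omp.h>
 #include <stdio.h>

 int main(void) {
     /* everything inside this block is one parallel region */
     #pragma omp parallel
     {
         printf("hello from thread %d\n", omp_get_thread_num());
     }
     return 0;
 }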

Download and Install:

To download ompP, go to http://www.cs.utk.edu/~karl/ompp.html. There you will find the download as well as a link to the user's manual. Follow all of the installation instructions in the manual. If you put the program in your installs folder and add that folder to your PATH, you can invoke the profiler as shown in the sections below.
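For example, assuming the kinst-ompp script ends up in a bin subdirectory of a hypothetical ~/installs/ompp, a line like this in your shell startup file takes care of the PATH part:

export PATH=$HOME/installs/ompp/bin:$PATH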

Compile:

If you normally compile a program, foo.c, like this:

icc foo.c -o foo

Then just add 'kinst-ompp' to the beginning:

kinst-ompp icc foo.c -o foo
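Note that you still need your compiler's usual OpenMP flag on top of this. For the icc of this era that flag was -openmp (newer Intel compilers use -qopenmp), so a full invocation might look like:

kinst-ompp icc -openmp foo.c -o foo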

Doing so will add a few extra files to the directory that foo.c lives in, with names like foo.mod.c, foo.opari.inc, opari.rc, opari.tab.c, and opari.tab.o. This is normal, but be careful when compiling several times under large makefiles, because the presence of these leftover files can break the build in some cases. Just remove the extra files with the command:

rm *.mod.c *opari*

or adjust your make clean to take care of it.
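For instance, a clean target along these lines (a sketch; substitute your own build products) keeps the Opari byproducts from polluting later builds:

clean:
	rm -f foo *.mod.c *opari*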

Run:

A profile of your program will not be written until you run it. Run it exactly as you would any other time: if you normally run foo with the argument 2, as ./foo 2, then do the exact same thing when you want to profile it.
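For example, to profile foo with 3 threads and its usual argument (OMP_NUM_THREADS is the standard OpenMP way to set the thread count):

export OMP_NUM_THREADS=3
./foo 2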


After the program has terminated, the file foo.n-m.ompp.txt will appear in the same directory as foo, where:

n = maximum number of threads you set

m = the trial number

If you run the same version of foo 4 times with a max of 3 threads you will generate a total of 4 files:

foo.3-0.ompp.txt

foo.3-1.ompp.txt

foo.3-2.ompp.txt

foo.3-3.ompp.txt

This is because ompP does not, by default, replace the last profile report with a new one. You will have to remove old reports yourself if you do not want them to accumulate.

Analyze:

Open the profile report with any text editor. You will see a list of parallel and user-defined regions. Parallel regions are regions which you have declared parallel using OpenMP's parallel directives; this also includes the for, single, barrier, etc. directives. User-defined regions are blocks of code that you have staked out in the source with ompP's directives. An easy way to do this is to bracket the block with #pragma pomp inst begin(section1) and #pragma pomp inst end(section1):

Example Code:

 #include <stdio.h>

 void horrible_slow_plodding_function(int number) {
     int i, sum = 0;                  /* some setup code */

     #pragma pomp inst begin(section1)
     for (i = 0; i < number; i++)     /* section1's code */
         sum += i * i;
     #pragma pomp inst end(section1)

     printf("sum = %d\n", sum);       /* maybe a little more code */
     return;
 }

The profile will tell you, among other things, how much time section1 took by itself and how much it took together with other regions that might be nested inside it. For example, you might define a region that includes a block of parallelized code, or define a region within another region; the same applies to an OpenMP loop directive that appears within an OpenMP parallel block. The profiler breaks down the regions and their nesting in several useful ways, much as gprof does for serial programs. For a better understanding of the profile, take a look at the user's manual; it is really very helpful.
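As a sketch of such nesting (the computation and names here are purely illustrative), a user-defined region can wrap a parallel loop, and the report will attribute time to both the outer region and the loop region nested inside it:

 #include <stdio.h>

 #define N 1000000

 double expensive_step(int i) { return i * 0.5; }   /* hypothetical work */

 int main(void) {
     static double result[N];
     double total = 0.0;
     int i;

     /* user-defined region wrapping a parallel loop: ompP reports
        both whole_computation and the loop nested inside it */
     #pragma pomp inst begin(whole_computation)
     #pragma omp parallel for
     for (i = 0; i < N; i++)
         result[i] = expensive_step(i);
     #pragma pomp inst end(whole_computation)

     for (i = 0; i < N; i++)
         total += result[i];
     printf("total = %f\n", total);
     return 0;
 }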

A special and useful part of the profile report is its section on overhead and imbalance in all of your program's parallel regions. High imbalance among the threads means that they are not doing equal amounts of work; shifting the workload so that it is better distributed will further improve your program's performance. Imbalance is one of the big contributors to overhead, which makes your use of parallel processing less advantageous than it could be. This section is therefore one of the parts of the profile to really pay attention to; it appears near the bottom of the report.

Note(s) of caution:

(1) Adding user-defined regions to functions that are called over and over by other functions will add significant overhead of its own. Avoid this if you can; it can really affect the runtime of your program and therefore the findings of the report. If the function is called within a for loop, for example, it might be a good idea to simply make the loop itself the user-defined region instead, as sketched below.
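A sketch of that fix, with hypothetical names: rather than putting the pomp directives inside frequently_called_function, where they would fire on every call, wrap the calling loop once:

 #include <stdio.h>

 double frequently_called_function(int i) { return i * 2.0; }  /* hypothetical */

 int main(void) {
     int i, n = 1000000;
     double sum = 0.0;

     /* one region around the whole loop, instead of directives
        inside the function that would run once per call */
     #pragma pomp inst begin(call_loop)
     for (i = 0; i < n; i++)
         sum += frequently_called_function(i);
     #pragma pomp inst end(call_loop)

     printf("sum = %f\n", sum);
     return 0;
 }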

(2) ompP DOES NOT, as of yet, SUPPORT NESTED PARALLELISM. Trying to run such a program with ompP will cause it to abort.

I will try to add more to this page soon.