VAMPIR is a trace collection library and visualization tool for programs that use MPI communication, whether hand-written or generated by a compiler or preprocessor. As with all trace collection tools, time-stamped records of the program's state are collected at runtime. For more information on VAMPIR, it is recommended that you read Parallel Computation - Using the VAMPIR Tool on Data Star.
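To illustrate what gets traced, here is a minimal Fortran 77 MPI program (an illustrative sketch, not part of the pstress code). VAMPIR time-stamps the entry and exit of each MPI call, such as MPI_INIT, MPI_COMM_RANK, and MPI_FINALIZE:

      program trivial
      include 'mpif.h'
      integer ierr, rank, nprocs
c     VAMPIR records a time-stamped event when each MPI call
c     is entered and left
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      write(*,*) 'task ', rank, ' of ', nprocs
      call MPI_FINALIZE(ierr)
      end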
The modules for the stress problem are written in FORTRAN 77 using the MPI communication library. The pstress program will be run on the Data Star by following the instructions given below.
If you are interested in more information regarding the use of the Data Star, it's recommended that you read the Data Star User Guide.
These programs can be run in the Linux environment. To start, you have two options:
- Use a Linux machine: start a terminal.
- Use a Windows machine: run the "SSH Secure Shell" program. To start SSH, go to
Start-->Programs-->Network Applications-->SSH-->Secure Shell Client
Press the "Quick Connect" button. Type "linux" as the Host Name, enter your CADLAB login name, then click Connect. The password required is your CADLAB password. This makes a secure connection to one of the Linux machines in Engineering I. You are now on your account (i.e., the X:/ drive) via a Unix interface.
To log in to the Data Star, use the secure shell command in the Linux environment:
% ssh -lUserID dslogin.sdsc.edu [to make a secure connection to the server]

Once you log in for the first time, you must change your password:

% rlogin tfpasswd [to connect to the server where passwords are changed]

Enter p at the prompt to change your password. After you enter the new password, this server will log you out automatically. It might take a while before your new password is activated.
To transfer the pstress modules and auxiliary files to the Data Star, you have to use pftp, an ftp-like interface to HPSS. The links and text below describe how to use the pftp utility:
% pftp [to access HPSS]
pftp> cd /users/csb/u4078 [to change directory to Stefan's account]
pftp> get stressvamp.tar [to transfer the file from HPSS to your account on Data Star]
pftp> quit [to exit pftp access]
%
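As a quick check that the transfer succeeded, you can list the file (a standard Unix command, not specific to Data Star):

% ls -l stressvamp.tar [to verify the file arrived and check its size]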
The transferred file is a tar file. A tar file is an archive created for packaging purposes: a single file composed of several files. In order to compile and run the pstress module, you have to untar the file stressvamp.tar:
% tar xvf stressvamp.tar [to extract the contents of the tar file]

In the directory stressvamp, the following files will be present (change to the stressvamp directory with "cd" and use "ls" to list its contents):

- Makefile
- stressh.f
- stressn.f
- inpmesh.f
- meshgen.f

The files Makefile, stressh.f, and stressn.f belong to the parallel FEM solver program pstress. The auxiliary file inpmesh.f is used to create the input to the mesh generator for the dam problem. meshgen.f is used to create the mesh for the dam structure. To make these Fortran codes work, we first have to compile them. In order to compile the pstress module, run "make":
% make [to compile stressh.f and stressn.f and combine them into the executable "pstress"]

In order to compile the inpmesh.f module:

% f77 -o inputgiver inpmesh.f [to compile inpmesh.f into the executable "inputgiver"]

In order to compile the meshgen.f module:

% f77 -o mesh meshgen.f [to compile meshgen.f into the executable "mesh"]

After this, the new executable files (marked with *) should be available in the directory.
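As a quick check, you can list the directory with the -F flag of ls, which appends a trailing * to executable files:

% ls -F [to list the directory contents; executables are marked with a trailing *]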
To run the parallel "pstress" module, you must be logged in to the so-called interactive nodes:
% ssh -lUserID dspoe.sdsc.edu [to make a secure connection to the server]
% cd stressvamp [to change to the directory where the programs are stored]

These are the nodes that have been set up for interactive use. There are 4 interactive nodes; each has 8 POWER3 CPUs and 2 GB of memory (4 nodes x 8 CPUs = 32 processors, which is why 32 is the interactive maximum noted below). Interactive access to the nodes is shared: at times there may be more than one user job running on a node, so there may be significant run-time variability.

To solve the dam problem and obtain a plot of the output, we have 9 steps to perform:
Step 1: Create the input for the mesh generator
The code in charge of creating the input for the mesh generator is inpmesh.f, and inputgiver is its executable version. (You can open a file with the editor pico by typing % pico name_of_file.) To run the executable, type its name:

% inputgiver [to execute]

You will be prompted for some input. First, the number of nodes in the i direction; this specifies the number of nodes that will cover the width of the dam. Second, the number of nodes in the j direction; this specifies the number of nodes that will cover the height of the dam.
Once these are completed, the program will automatically create inputdamprog, the input file for the mesh generator.
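As a worked example (the numbers here are illustrative, and the element count assumes each rectangular grid cell is split into two triangles): with 5 nodes in the i direction and 4 nodes in the j direction, the mesh contains 5 x 4 = 20 nodes and 2 x (5-1) x (4-1) = 24 triangular elements.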
Step 2: Generate the mesh for the dam structure

The code in charge of generating the mesh is meshgen.f, and mesh is its executable version. We have already specified the number of nodes in the previous step. The mesh generator program will distribute the nodes evenly and create triangular elements. To do this, type:

% mesh [to execute with "inputdamprog" as input file and "outmesh" as output file]

(If the file outmesh already exists, you will get an error message; remove the file first with % rm outmesh.) Note that you will not be prompted for input or output file names; they are specified using the command line. The file outmesh contains the information on the nodes and elements created. The automatically created file inputstress contains similar information to outmesh, but in a format that is ready for use by the parallel pstress module. inputstress also contains the hydrostatic forces on the nodes at the dam-water boundary, calculated automatically.
Step 3: Solve the Finite Element problem with the parallel FEM solver

The parallel FEM solver code is the combination of stressh.f and stressn.f, and pstress is its executable version. To run the pstress module, we are going to use the poe command. The poe command invokes the Parallel Operating Environment for loading and executing programs on remote processor nodes. The operation of POE is influenced by a number of POE environment variables. Below are explanations of the environment variables that we are going to use, obtained by typing man poe. Each flag (given with -) corresponds to the environment variable starting with MP_ followed by the name of the flag, and sets it temporarily during execution:
MP_NODES
Specifies the number of physical nodes on which to run the parallel tasks. It may be used alone or in conjunction with MP_TASKS_PER_NODE and/or MP_PROCS. The value of this environment variable can be overridden using the -nodes flag.

MP_TASKS_PER_NODE
Specifies the number of tasks to be run on each of the physical nodes. It may be used in conjunction with MP_NODES and/or MP_PROCS, but may not be used alone. The value of this environment variable can be overridden using the -tasks_per_node flag.

MP_RMPOOL
Determines the name or number of the pool that should be used for non-specific node allocation. This environment variable/command-line flag only applies to LoadLeveler. Valid values are any identifying pool name or number. There is no default. The value of this environment variable can be overridden using the -rmpool flag.

MP_EUILIB
Determines the communication subsystem library implementation to use for communication: either the IP communication subsystem or the US communication subsystem. In order to use the US communication subsystem, you must have an SP system configured with its high performance switch feature. Valid, case-sensitive values are ip (for the IP communication subsystem) and us (for the US communication subsystem). The value of this environment variable can be overridden using the -euilib flag.

MP_EUIDEVICE
Determines the adapter set to use for message passing. Valid values are en0 (for Ethernet), fi0 (for FDDI), tr0 (for token ring), css0 (for the SP system's high performance switch feature), and csss (for the SP Switch2 high performance adapter).

The number of processors used equals the product of number_of_nodes and tasks_per_node. We can run the parallel program by typing:

% poe pstress -nodes number_of_nodes -tasks_per_node number_of_tasks -rmpool 1 -euilib ip -euidevice en0

where number_of_processors = number_of_nodes * tasks_per_node and can be one of 4, 8, 16, or 32. For more processors (e.g., 64), the batch environment should be used. The following combinations are recommended (a concrete example follows the list):
4 processors -> -nodes 1 -tasks_per_node 4
8 processors -> -nodes 1 -tasks_per_node 8
16 processors -> -nodes 2 -tasks_per_node 8
32 processors -> -nodes 4 -tasks_per_node 8
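For example, to run on 16 processors (an illustrative instance of the generic command above):

% poe pstress -nodes 2 -tasks_per_node 8 -rmpool 1 -euilib ip -euidevice en0

Since each flag only overrides the corresponding MP_ environment variable for that run, you could equivalently set the variables once in the C shell (a sketch of the same 16-processor configuration):

% setenv MP_NODES 2
% setenv MP_TASKS_PER_NODE 8
% setenv MP_RMPOOL 1
% setenv MP_EUILIB ip
% setenv MP_EUIDEVICE en0
% poe pstress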
You will be prompted for input data. The name of the input data file is inputstress; give the name outstress to the output file. The file outstress will be created, containing the results of the finite element analysis. In addition, the trace file pstress.bpv will be created, which can be visualized with VAMPIR.
Step 4: Starting the VAMPIR session

After the program has finished executing, start a VAMPIR session:
Set the DISPLAY variable to the name of the machine you are logged into [for example, "linux9"]:

% setenv DISPLAY linux9.engr.ucsb.edu:0.0

Then enter:

% vampir pstress.bpv

The VAMPIR Main window and the Global Timeline Display window open.
Step 5: Open views to visualize the trace records

To do this, click on Global Displays from the main window and open:
* Summary Chart
* Activity Chart

You can select any other views as well. To interpret the information these windows present, see the VAMPIR User Guide.
Step 6: View the trace file.

We'll start by viewing the time for the entire run. Zoom in on a section of the timeline:
* Click and drag over a portion of the timeline with the left mouse button. This part will be magnified.
* Continue zooming until most of the MPI function names are revealed.
Step 7: View process statistics for the selected portion of the timeline.

From the Global Displays menu, select the Summary Chart view. A new view will open. Press the right mouse button within this window.
* Select Use Timeline Portion.
* Scroll the timeline, using the scroll bar at the bottom of the timeline window, and watch what happens in both displays.
The "Activity Chart" display shows a statistic about the time spent in each activity individually for each process defined in the tracefile. With the default pie chart display you can recognize load imbalance at a glance in the trace program by comparing the different time consumptions of the activities over all processes.
Step 9: End the VAMPIR session.
To do this, select:
File --> Exit