Today we will start where we left off last lecture, continuing our look through the elementary MPI functions.
We will now look at communication between our processes.
Without IPC, we can't do anything useful with our nodes!
Recall that communication in MPI is based around the idea of message passing.
By message, we mean packets of data.
MPI IPC Functions
To send messages in MPI one needs to:
Construct the message: usually an array of data or a struct.
Specify:
the recipient.
the type of data being sent.
the amount of data being sent.
an identification tag.
A recipient is specified by two things:
The rank of the recipient in a Communicator.
The Communicator.
In elementary MPI we'll stick with one Communicator: MPI_COMM_WORLD
MPI IPC Functions
#include "mpi.h"
int MPI_Send(void *buf,
int count,
MPI_Datatype datatype,
int dest,
int tag,
MPI_Comm comm )
Performs a basic message send. This routine may block until the message is received.
* buf is the address of the send buffer.
* count is the number of elements in the send buffer.
* datatype is the datatype of each send buffer element.
* dest is the rank of the destination.
* tag is the message tag.
* comm is the communicator.
MPI IPC Functions
MPI_Send may block until the message is received.
Whether or not MPI_Send blocks depends on factors such as:
How large the message is
How many messages are pending to the specific destination
Whether the LAM daemon (lamd) or client-to-client (c2c) mode of communication is being used.
Read man mpirun for more details on this.
MPI IPC Functions
#include "mpi.h"
int MPI_Recv(void *buf,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm comm,
MPI_Status *status )
Performs a basic message receive.
The count argument indicates the maximum length of a message.
The actual number can be determined with MPI_Get_count.
MPI IPC Functions
buf is the address of the receive buffer.
count is the maximum number of elements the receive buffer can hold.
datatype is the datatype of each buffer element.
source is the rank of the source.
tag is the message tag.
comm is the communicator.
status is where the status object will be placed.
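Putting MPI_Send and MPI_Recv together, here is a small self-contained sketch (my own illustration, not from the lecture notes): process 0 sends three doubles to process 1, which receives them into a matching buffer. The values and the tag are arbitrary; note that the receive's source, tag and communicator match those of the send. Assuming a standard setup, compile with mpicc and run with something like mpirun -np 2.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int my_rank;
    double data[3] = {1.0, 2.0, 3.0};   /* the message: an array of doubles */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        /* Send 3 doubles to rank 1, tag 0, in MPI_COMM_WORLD. */
        MPI_Send(data, 3, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        /* Receive up to 3 doubles from rank 0, tag 0. */
        MPI_Recv(data, 3, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received %f %f %f\n", data[0], data[1], data[2]);
    }

    MPI_Finalize();
    return 0;
}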
MPI IPC Functions
Recall that in sending and receiving we need to specify the data types being sent.
For this purpose we have the following built-in datatypes.
MPI Type        Description
MPI_CHAR        corresponds to C's signed char
MPI_SHORT       corresponds to C's signed short
MPI_INT         corresponds to C's signed int
MPI_LONG        corresponds to C's signed long int
MPI_FLOAT       corresponds to C's float
MPI_DOUBLE      corresponds to C's double
There are also unsigned versions.
There are also two odd ones called MPI_BYTE and MPI_PACKED.
MPI IPC Functions
#include "mpi.h"
int MPI_Get_count(MPI_Status *status,
MPI_Datatype datatype,
int *count)
Gets the number of elements received.
The number of received elements is placed in count.
If the size of the datatype is zero, this routine will return a count of zero.
If the amount of data in status is not an exact multiple of the size of datatype (so that count would not be an integer), a count of MPI_UNDEFINED is returned instead.
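As a sketch of how this is typically used (my own example, not from the notes): the receiver posts a receive with a generous maximum count, then asks the status object how many elements actually arrived.

#include <stdio.h>
#include "mpi.h"

#define MAX_ELEMS 100

int main(int argc, char** argv) {
    int my_rank, received_count;
    int buffer[MAX_ELEMS];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        int data[5] = {1, 2, 3, 4, 5};
        /* Send only 5 ints, even though the receiver allows up to 100. */
        MPI_Send(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        /* Here count is the maximum number of elements the buffer can hold. */
        MPI_Recv(buffer, MAX_ELEMS, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        /* Ask the status object how many MPI_INTs actually arrived. */
        MPI_Get_count(&status, MPI_INT, &received_count);
        printf("Process 1 actually received %d elements\n", received_count);
    }

    MPI_Finalize();
    return 0;
}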
Dealing with I/O
The MPI standard doesn't specify which processes have access to which I/O devices.
However, essentially all implementations of MPI allow all processes in MPI_COMM_WORLD to access stdout and stderr.
We have already seen this in "helloworld" examples so far.
Most MPI implementations don't provide any scheduling of the access to these devices.
The order of outputs is unpredictable - the processes are competing for access to the I/O device.
Dealing with I/O
Most implementations only allow process 0 to access stdin.
This makes sense as it is difficult to allocate lines of input to individual processes.
Input read from stdin by process 0 can then be distributed to the other processes via the IPC functions.
Dealing with I/O
A nice I/O example:
void get_input(int my_rank, int comm_sz, double* a_p, double* b_p, int* n_p) {
    int dest;
    if (my_rank == 0) {
        /* Process 0 reads the input and forwards it to every other process. */
        printf("Enter a, b and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
        for (dest = 1; dest < comm_sz; dest++) {
            MPI_Send(a_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            MPI_Send(b_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            MPI_Send(n_p, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
    } else { /* my_rank != 0: receive the values sent by process 0 */
        MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}
Advanced MPI
We have finished Elementary MPI!
We'll continue our tour of MPI, looking at some more advanced functions, including:
The Integration example from the MPI user's guide.
Multiplying Matrices on a Torus in MPI.
Advanced MPI
Our coverage of advanced MPI will touch upon:
Simple synchronisation.
Complex patterns of communication.
User defined datatypes.
Non-blocking communication.
Communicators and topologies.
Advanced MPI
There is no guarantee about the order in which messages from different senders arrive.
There are wildcards for receiving.
There are no wildcards for sending.
MPI_ANY_SOURCE is the wildcard for the source of a receive.
Messages can be distinguished by tags and communicators - more on communicators later.
There is a wildcard for tags: MPI_ANY_TAG
There are no wildcards for communicators.
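A small illustrative sketch (mine, not from the lecture): every process with nonzero rank sends its rank to process 0; process 0 receives using both wildcards and then reads the actual source and tag back out of the status object.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int my_rank, comm_sz, value, i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank != 0) {
        /* Each non-root process sends its rank, tagged with its rank. */
        MPI_Send(&my_rank, 1, MPI_INT, 0, my_rank, MPI_COMM_WORLD);
    } else {
        for (i = 1; i < comm_sz; i++) {
            /* Accept a message from any source with any tag;
               messages may arrive in any order. */
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("Got %d from process %d with tag %d\n",
                   value, status.MPI_SOURCE, status.MPI_TAG);
        }
    }

    MPI_Finalize();
    return 0;
}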
Advanced MPI
#include "mpi.h"
int MPI_Barrier(MPI_Comm comm)
A barrier is a simple synchronisation mechanism.
Blocks until all processes in the communicator have reached this routine.
In other words, it blocks the caller until all group members have called it.
The call returns at any process only after all group members have entered the call.
Easy to use with MPI_COMM_WORLD
If the underlying device cannot do better, a tree-like or combine algorithm is used to broadcast a message to all members of the communicator.
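A minimal sketch of MPI_Barrier on MPI_COMM_WORLD (my own example): every process announces itself, waits at the barrier, and only continues once all processes have arrived.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    printf("Process %d: before the barrier\n", my_rank);

    /* No process returns from this call until every process
       in MPI_COMM_WORLD has called it. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("Process %d: after the barrier\n", my_rank);

    MPI_Finalize();
    return 0;
}

As noted in the I/O discussion earlier, the printed lines may still appear in any order - the barrier synchronises the processes, not their access to stdout.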
Integral Estimation - The Trapezoidal rule
Suppose that we want to calculate \(\int_a^b f(x)\,dx\) for some fixed \(f\).
In these lectures we will fix \(f(x) = x^2\) , just to keep things simple.
Remember the integral computes the area under the curve f in the region from a to b
Integral Estimation - The Trapezoidal rule
One way to do this is to use the infamous Trapezoidal Rule!
Recall from those long math nights that a trapezoid is a quadrilateral having only two sides parallel.
Integral Estimation - The Trapezoidal rule
We partition the interval \([a,b]\) into \(n\) equal subintervals of width \(h\) , where: \(h = ( b - a ) / n\)
Integral Estimation - The Trapezoidal rule
We then compute the area \(A_i\) of each trapezoid: its parallel sides have lengths \(f(x_{i-1})\) and \(f(x_i)\), and its width is \(h\).
Thus: \(A_i = \frac{h}{2}\left[f(x_{i-1}) + f(x_i)\right]\)
Then sum these: \(\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} A_i\)
Integral Estimation - The Trapezoidal rule
Giving the Trapezoidal Rule:
\(\int_a^b f(x)\,dx \approx h\left[\frac{f(a) + f(b)}{2} + f(x_1) + f(x_2) + \cdots + f(x_{n-1})\right]\)
where \(x_i = a + ih\) for \(0 < i < n\).
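As a quick worked instance (my own numbers, not from the notes): with \(f(x) = x^2\), \(a = 0\), \(b = 1\) and \(n = 4\) we get \(h = 0.25\), so the rule gives \(0.25\left[\frac{0 + 1}{2} + 0.0625 + 0.25 + 0.5625\right] = 0.25 \times 1.375 = 0.34375\), close to the exact value \(\int_0^1 x^2\,dx = \frac{1}{3}\).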
float f(float x){
    return x*x;            /* the integrand: f(x) = x^2 */
}

/* Estimate the integral of f over [a, b] using n trapezoids of width h. */
float Trap(float a, float b, int n, float h){
    float integral, x;
    int i;
    integral = (f(a) + f(b))/2.0;   /* the two endpoints are weighted 1/2 */
    x = a;
    for (i = 1; i <= n-1; i++) {
        x += h;
        integral += f(x);           /* interior points are weighted 1 */
    }
    return integral * h;
}
Integral Estimation - The Trapezoidal rule
Prior to parallelising we do a simple sequential analogue.
We pick an interval and the number of trapezoids n.
From this we compute our h.
We divide the interval into nproc subintervals.
On each subinterval we use Trap to compute its contribution to the integral.
The sum of these contributions is the result.
Integral Estimation - The Trapezoidal rule
#include <stdio.h>
#include "trap.h"
int main(int argc, char** argv) {
float h, a = 0.0, b = 1.0, integral = 0, local_a, local_b;
int i, n = 1000, nproc = 10, local_n = n/nproc;
h = (b-a)/n;
for(i = 0; i < nproc; i++){
local_a = a + i*local_n*h;
local_b = a + (i + 1)*local_n*h;
integral += Trap(local_a, local_b, local_n, h);
}
printf("With n = %d trapezoids\n", n);
printf("The integral from %f to %f is approx %f\n",
a, b, integral);
return 0;
}
Integral Estimation - The Trapezoidal rule
We'll look at six versions in the next few lectures.
These are:
Version 0: The simplest version (everything hardwired).
Version 1: Obtains the endpoints and increment from the user.
Version 2: Uses MPI's collective communication primitives.
Version 3: Uses MPI's User defined type mechanism.
Version 4: Uses MPI's pack and unpack primitives.
Version 5: Uses the C sizeof() macro and user defined types.
Integral Estimation - The Trapezoidal rule
From Peter Pacheco's An Introduction to Parallel Programming.
Parallel Trapezoid Rule: version 0.
Estimate of the integral from a to b of f(x) using the trapezoid rule and n trapezoids. The algorithm:
Each process calculates its interval of integration.
Each process estimates the integral of f(x) over its interval using the trapezoid rule.
Each process whose rank is not 0 sends its integral to 0.
Process 0 sums the calculations received from the individual processes and prints the result.
Note: in this version f(x), a, b, and n are all hardwired.
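Below is a sketch of what such a version 0 might look like, following the four steps above. This is my own reconstruction in the spirit of Pacheco's listing, not his exact code; it assumes the f and Trap functions from earlier are available via trap.h and that n is evenly divisible by the number of processes.

#include <stdio.h>
#include "mpi.h"
#include "trap.h"   /* f() and Trap() from earlier */

int main(int argc, char** argv) {
    int my_rank, comm_sz, source;
    /* Everything hardwired in version 0; n is assumed divisible by comm_sz. */
    int n = 1024, local_n;
    float a = 0.0, b = 1.0, h, local_a, local_b, local_int, total_int;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    h = (b - a) / n;            /* h is the same for all processes      */
    local_n = n / comm_sz;      /* so is the number of local trapezoids */

    /* 1. Each process calculates its own interval of integration. */
    local_a = a + my_rank * local_n * h;
    local_b = local_a + local_n * h;

    /* 2. Each process estimates the integral over its interval. */
    local_int = Trap(local_a, local_b, local_n, h);

    if (my_rank != 0) {
        /* 3. Every process except 0 sends its estimate to process 0. */
        MPI_Send(&local_int, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    } else {
        /* 4. Process 0 adds up the estimates and prints the result. */
        total_int = local_int;
        for (source = 1; source < comm_sz; source++) {
            MPI_Recv(&local_int, 1, MPI_FLOAT, source, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total_int += local_int;
        }
        printf("With n = %d trapezoids, the integral from %f to %f "
               "is approx %f\n", n, a, b, total_int);
    }

    MPI_Finalize();
    return 0;
}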