Today we will start where we left off last lecture, continuing our look through the elementary MPI functions.
We will now look at communication between our processes.
Without IPC, we can't do anything useful with our nodes!
Recall that communication in MPI is based around the idea of message passing.
By message, we mean packets of data.
MPI IPC Functions
To send messages in MPI one needs to:
Construct the message: usually an array of data or a struct.
Specify:
the recipient.
the type of data being sent.
the amount of data being sent.
an identification tag.
A recipient is specified by two things:
The rank of the recipient in a Communicator.
The Communicator.
In elementary MPI we'll stick with one Communicator: MPI_COMM_WORLD
MPI IPC Functions
#include "mpi.h"
int MPI_Send(void *buf,
int count,
MPI_Datatype datatype,
int dest,
int tag,
MPI_Comm comm )
Performs a basic message send. This routine may block until the message is received.
* buf is the address of the send buffer.
* count is the number of elements in the send buffer.
* datatype is the datatype of each send buffer element.
* dest is the rank of the destination.
* tag is the message tag.
* comm is the communicator.
MPI IPC Functions
MPI_Send may block until the message is received.
Whether or not MPI_Send blocks depends on factors such as:
How large the message is
How many messages are pending to the specific destination
Whether the LAM daemon (lamd) or client-to-client (c2c) mode of communication is being used.
Read man mpirun for more details on this.
MPI IPC Functions
#include "mpi.h"
int MPI_Recv(void *buf,
int count,
MPI_Datatype datatype,
int source,
int tag,
MPI_Comm comm,
MPI_Status *status )
Performs a basic message receive.
The count argument indicates the maximum length of a message.
The actual number can be determined with MPI_Get_count.
MPI IPC Functions
buf is the address of the receive buffer.
count is the maximum number of elements the receive buffer can hold.
datatype is the datatype of each buffer element.
source is the rank of the source.
tag is the message tag.
comm is the communicator.
status is where the status object will be placed.
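Putting MPI_Send and MPI_Recv together, here is a small self-contained sketch (my own illustration, not from the lecture notes): process 0 sends three doubles to process 1, which receives them into a matching buffer. The values and the tag are arbitrary; note that the receive's source, tag and communicator match those of the send. Assuming a standard setup, compile with mpicc and run with something like mpirun -np 2.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int my_rank;
    double data[3] = {1.0, 2.0, 3.0};   /* the message: an array of doubles */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        /* Send 3 doubles to rank 1, tag 0, in MPI_COMM_WORLD. */
        MPI_Send(data, 3, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        /* Receive up to 3 doubles from rank 0, tag 0. */
        MPI_Recv(data, 3, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received %f %f %f\n", data[0], data[1], data[2]);
    }

    MPI_Finalize();
    return 0;
}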
MPI IPC Functions
Recall that in sending and receiving we need to specify the data types being sent.
For this purpose we have the following built-in datatypes.
MPI Type        Description
MPI_CHAR        corresponds to C's signed char
MPI_SHORT       corresponds to C's signed short
MPI_INT         corresponds to C's signed int
MPI_LONG        corresponds to C's signed long int
MPI_FLOAT       corresponds to C's float
MPI_DOUBLE      corresponds to C's double
There are also unsigned versions.
There are also two odd ones called MPI_BYTE and MPI_PACKED.
MPI IPC Functions
#include "mpi.h"
int MPI_Get_count(MPI_Status *status,
MPI_Datatype datatype,
int *count)
Gets the number of elements received.
The number of received elements is placed in count.
If the size of the datatype is zero, this routine will return a count of zero.
If the amount of data in status is not an exact multiple of the size of datatype (so that count would not be an integer), a count of MPI_UNDEFINED is returned instead.
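As a sketch of how this is typically used (my own example, not from the notes): the receiver posts a receive with a generous maximum count, then asks the status object how many elements actually arrived.

#include <stdio.h>
#include "mpi.h"

#define MAX_ELEMS 100

int main(int argc, char** argv) {
    int my_rank, received_count;
    int buffer[MAX_ELEMS];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        int data[5] = {1, 2, 3, 4, 5};
        /* Send only 5 ints, even though the receiver allows up to 100. */
        MPI_Send(data, 5, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        /* Here count is the maximum number of elements the buffer can hold. */
        MPI_Recv(buffer, MAX_ELEMS, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        /* Ask the status object how many MPI_INTs actually arrived. */
        MPI_Get_count(&status, MPI_INT, &received_count);
        printf("Process 1 actually received %d elements\n", received_count);
    }

    MPI_Finalize();
    return 0;
}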
Dealing with I/O
The MPI standard doesn't specify which processes have access to which I/O devices.
However, essentially all implementations of MPI allow all processes in MPI_COMM_WORLD to access stdout and stderr.
We have already seen this in "helloworld" examples so far.
Most MPI implementations don't provide any scheduling of the access to these devices.
The order of outputs is unpredictable - the processes are competing for access to the I/O device.
Dealing with I/O
Most implementations only allow process 0 to access stdin.
This makes sense as it is difficult to allocate lines of input to individual processes.
Input read from stdin by process 0 can then be distributed to the other processes via the IPC functions.
Dealing with I/O
A nice I/O example:
void get_input(int my_rank, int comm_sz, double* a_p, double* b_p, int* n_p) {
    int dest;
    if (my_rank == 0) {
        /* Process 0 reads the input and forwards it to every other process. */
        printf("Enter a, b and n\n");
        scanf("%lf %lf %d", a_p, b_p, n_p);
        for (dest = 1; dest < comm_sz; dest++) {
            MPI_Send(a_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            MPI_Send(b_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            MPI_Send(n_p, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
    } else { /* my_rank != 0: receive the values sent by process 0 */
        MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
}
Advanced MPI
We have finished Elementary MPI!
We'll continue our tour of MPI, looking at some more advanced functions, including:
The Integration example from the MPI user's guide.
Multiplying Matrices on a Torus in MPI.
Advanced MPI
Our coverage of advanced MPI will touch upon:
Simple synchronisation.
Complex patterns of communication.
User defined datatypes.
Non-blocking communication.
Communicators and topologies.
Advanced MPI
There is no guarantee about the order in which messages from different senders arrive.
There are wildcards for receiving.
There are no wildcards for sending.
MPI_ANY_SOURCE is the wildcard for the source of a receive.
Messages can be distinguished by tags and communicators - more on communicators later.
There is a wildcard for tags: MPI_ANY_TAG
There are no wildcards for communicators.
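A small illustrative sketch (mine, not from the lecture): every process with nonzero rank sends its rank to process 0; process 0 receives using both wildcards and then reads the actual source and tag back out of the status object.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int my_rank, comm_sz, value, i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank != 0) {
        /* Each non-root process sends its rank, tagged with its rank. */
        MPI_Send(&my_rank, 1, MPI_INT, 0, my_rank, MPI_COMM_WORLD);
    } else {
        for (i = 1; i < comm_sz; i++) {
            /* Accept a message from any source with any tag;
               messages may arrive in any order. */
            MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            printf("Got %d from process %d with tag %d\n",
                   value, status.MPI_SOURCE, status.MPI_TAG);
        }
    }

    MPI_Finalize();
    return 0;
}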
Advanced MPI
#include "mpi.h"
int MPI_Barrier(MPI_Comm comm)
A barrier is a simple synchronisation mechanism.
Blocks until all processes in the communicator have reached this routine.
In other words, it blocks the caller until all group members have called it.
The call returns at any process only after all group members have entered the call.
Easy to use with MPI_COMM_WORLD
If the underlying device cannot do better, a tree-like or combine algorithm is used to broadcast a message to all members of the communicator.
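A minimal sketch of MPI_Barrier on MPI_COMM_WORLD (my own example): every process announces itself, waits at the barrier, and only continues once all processes have arrived.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    printf("Process %d: before the barrier\n", my_rank);

    /* No process returns from this call until every process
       in MPI_COMM_WORLD has called it. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("Process %d: after the barrier\n", my_rank);

    MPI_Finalize();
    return 0;
}

As noted in the I/O discussion earlier, the printed lines may still appear in any order - the barrier synchronises the processes, not their access to stdout.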
Integral Estimation - The Trapezoidal rule
Suppose that we want to calculate \(\int_a^b f(x)\,dx\) for some fixed \(f\).
In these lectures we will fix \(f(x) = x^2\) , just to keep things simple.
Remember the integral computes the area under the curve f in the region from a to b
Integral Estimation - The Trapezoidal rule
One way to do this is to use the infamous Trapezoidal Rule!
Recall from those long math nights that a trapezoid is a quadrilateral having only two sides parallel.
Integral Estimation - The Trapezoidal rule
We partition the interval \([a,b]\) into \(n\) equal subintervals of width \(h\) , where: \(h = ( b - a ) / n\)
Integral Estimation - The Trapezoidal rule
We then compute the area \(A_i\) of each trapezoid: its parallel sides have lengths \(f(x_{i-1})\) and \(f(x_i)\), and its width is \(h\).
Thus: \(A_i = \frac{h}{2}\left[f(x_{i-1}) + f(x_i)\right]\)
Then sum these: \(\int_a^b f(x)\,dx \approx \sum_{i=1}^{n} A_i\)
Integral Estimation - The Trapezoidal rule
Giving the Trapezoidal Rule:
\(\int_a^b f(x)\,dx \approx h\left[\frac{f(a) + f(b)}{2} + f(x_1) + f(x_2) + \cdots + f(x_{n-1})\right]\)
where \(x_i = a + ih\) for \(0 < i < n\).
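As a quick worked instance (my own numbers, not from the notes): with \(f(x) = x^2\), \(a = 0\), \(b = 1\) and \(n = 4\) we get \(h = 0.25\), so the rule gives \(0.25\left[\frac{0 + 1}{2} + 0.0625 + 0.25 + 0.5625\right] = 0.25 \times 1.375 = 0.34375\), close to the exact value \(\int_0^1 x^2\,dx = \frac{1}{3}\).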
float f(float x){
    return x*x;            /* the integrand: f(x) = x^2 */
}

/* Estimate the integral of f over [a, b] using n trapezoids of width h. */
float Trap(float a, float b, int n, float h){
    float integral, x;
    int i;
    integral = (f(a) + f(b))/2.0;   /* the two endpoints are weighted 1/2 */
    x = a;
    for (i = 1; i <= n-1; i++) {
        x += h;
        integral += f(x);           /* interior points are weighted 1 */
    }
    return integral * h;
}
Integral Estimation - The Trapezoidal rule
Prior to parallelising we do a simple sequential analogue.
We pick an interval and the number of trapezoids n.
From this we compute our h.
We divide the interval into nproc subintervals.
On each subinterval we use Trap to compute its contribution to the integral.
The sum of these contributions is the result.
Integral Estimation - The Trapezoidal rule
#include <stdio.h>
#include "trap.h"
int main(int argc, char** argv) {
float h, a = 0.0, b = 1.0, integral = 0, local_a, local_b;
int i, n = 1000, nproc = 10, local_n = n/nproc;
h = (b-a)/n;
for(i = 0; i < nproc; i++){
local_a = a + i*local_n*h;
local_b = a + (i + 1)*local_n*h;
integral += Trap(local_a, local_b, local_n, h);
}
printf("With n = %d trapezoids\n", n);
printf("The integral from %f to %f is approx %f\n",
a, b, integral);
return 0;
}
Integral Estimation - The Trapezoidal rule
We'll look at six versions in the next few lectures.
These are:
Version 0: The simplest version (everything hardwired).
Version 1: Obtains the endpoints and increment from the user.
Version 2: Uses MPI's collective communication primitives.
Version 3: Uses MPI's User defined type mechanism.
Version 4: Uses MPI's pack and unpack primitives.
Version 5: Uses the C sizeof() macro and user defined types.
Integral Estimation - The Trapezoidal rule
From Peter Pacheco's An Introduction to Parallel Programming.
Parallel Trapezoid Rule: version 0.
Estimate of the integral from a to b of f(x) using the trapezoid rule and n trapezoids. The algorithm:
Each process calculates its interval of integration.
Each process estimates the integral of f(x) over its interval using the trapezoid rule.
Each process whose rank is not 0 sends its integral to 0.
Process 0 sums the calculations received from the individual processes and prints the result.
Note: in this version f(x), a, b, and n are all hardwired.
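Below is a sketch of what such a version 0 might look like, following the four steps above. This is my own reconstruction in the spirit of Pacheco's listing, not his exact code; it assumes the f and Trap functions from earlier are available via trap.h and that n is evenly divisible by the number of processes.

#include <stdio.h>
#include "mpi.h"
#include "trap.h"   /* f() and Trap() from earlier */

int main(int argc, char** argv) {
    int my_rank, comm_sz, source;
    /* Everything hardwired in version 0; n is assumed divisible by comm_sz. */
    int n = 1024, local_n;
    float a = 0.0, b = 1.0, h, local_a, local_b, local_int, total_int;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    h = (b - a) / n;            /* h is the same for all processes      */
    local_n = n / comm_sz;      /* so is the number of local trapezoids */

    /* 1. Each process calculates its own interval of integration. */
    local_a = a + my_rank * local_n * h;
    local_b = local_a + local_n * h;

    /* 2. Each process estimates the integral over its interval. */
    local_int = Trap(local_a, local_b, local_n, h);

    if (my_rank != 0) {
        /* 3. Every process except 0 sends its estimate to process 0. */
        MPI_Send(&local_int, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    } else {
        /* 4. Process 0 adds up the estimates and prints the result. */
        total_int = local_int;
        for (source = 1; source < comm_sz; source++) {
            MPI_Recv(&local_int, 1, MPI_FLOAT, source, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total_int += local_int;
        }
        printf("With n = %d trapezoids, the integral from %f to %f "
               "is approx %f\n", n, a, b, total_int);
    }

    MPI_Finalize();
    return 0;
}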