You are required to understand the function of each call, its input parameters (their types and what they represent) and its return values.
Focus on the main calls (e.g. fork(), the exec() family, exit(), atexit(), open(), read(), dup2() etc.)
Know your process states and the situations that result in zombies and orphans
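For example, a minimal sketch (assuming a POSIX system) that exercises fork(), the exec() family and waitpid() - note how the parent reaps the child so no zombie is left behind:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();              // returns 0 in the child, the child's pid in the parent
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {                  // child: replace our image with ls
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        perror("execl");             // only reached if the exec fails
        _exit(EXIT_FAILURE);
    }
    int status;
    waitpid(pid, &status, 0);        // parent reaps the child: no zombie
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}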
Exam Hints
Multi-Process Programming
Review the ring of processes - you should be familiar with this from the assignment
Understand the key differences/advantages/disadvantages of multithreaded programs vs multiprocess programs
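As a refresher, one common way to build such a ring of n processes connected by pipes (a sketch only - not necessarily the exact structure from the assignment, and error checks are omitted for brevity):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int i, n = 4;                    // ring size (a hypothetical default)
    int fd[2];
    pid_t pid = 0;

    pipe(fd);                        // start with stdout looped back to stdin
    dup2(fd[0], STDIN_FILENO);
    dup2(fd[1], STDOUT_FILENO);
    close(fd[0]);
    close(fd[1]);

    for (i = 1; i < n; i++) {        // each iteration splices in one more process
        pipe(fd);
        pid = fork();
        if (pid > 0)
            dup2(fd[1], STDOUT_FILENO);   // parent writes into the new pipe
        else
            dup2(fd[0], STDIN_FILENO);    // child reads from the new pipe
        close(fd[0]);
        close(fd[1]);
        if (pid > 0)
            break;                   // parent stops; the child forks the next process
    }
    // Each process's stdout now feeds its successor's stdin, wrapping around.
    fprintf(stderr, "ring member %d (pid %ld)\n", i, (long)getpid());
    return 0;
}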
Exam Hints
Multithreaded Programming
Review the producer-consumer and dining philosophers problems.
Understand how mutexes and semaphores can be applied to provide mutual exclusion and synchronisation in these examples.
You will be required to construct a C program that uses semaphores for synchronisation and mutual exclusion.
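As a warm-up, here is one possible shape for such a program - a bounded-buffer producer/consumer sketch using POSIX semaphores, where empty/full provide the synchronisation and a binary semaphore provides the mutual exclusion (the buffer size and item count are arbitrary choices):

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define N 8                          // buffer capacity (arbitrary)
#define ITEMS 32                     // items to produce (arbitrary)

int buf[N], in = 0, out = 0;
sem_t empty, full, mutex;            // two counting semaphores + one binary semaphore

void *producer(void *arg)
{
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&empty);            // synchronisation: wait for a free slot
        sem_wait(&mutex);            // mutual exclusion: enter the critical section
        buf[in] = i;
        in = (in + 1) % N;
        sem_post(&mutex);
        sem_post(&full);             // signal that an item is available
    }
    return NULL;
}

void *consumer(void *arg)
{
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&full);             // synchronisation: wait for an item
        sem_wait(&mutex);
        int item = buf[out];
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&empty);            // signal that a slot is free again
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    sem_init(&empty, 0, N);          // N free slots to begin with
    sem_init(&full, 0, 0);           // no items yet
    sem_init(&mutex, 0, 1);          // binary semaphore: one thread at a time
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}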
Exam Hints
Multithreaded Programming
Review all system calls (e.g. pthread_create, pthread_join) as you will need to use them.
There is a challenging question in the exam that requires you to understand dining philosophers and the producers/consumers problem.
I will be allocating marks for correct use of the function calls and the overall algorithm - so get something down for this question as there are lots of easy marks available even though it is a difficult question.
Basically you will need to be able to construct a multithreaded program using the correct system calls.
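The skeleton is always the same; a minimal sketch that creates a few threads, passes each an argument, and collects their return values:

#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4                   // a hypothetical thread count

void *worker(void *arg)
{
    long id = (long)arg;             // the argument passed at creation
    printf("thread %ld running\n", id);
    return (void *)(id * id);        // handed back through pthread_join
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        if (pthread_create(&tid[i], NULL, worker, (void *)i) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
    for (long i = 0; i < NTHREADS; i++) {
        void *res;
        pthread_join(tid[i], &res);  // wait for the thread and fetch its result
        printf("thread %ld returned %ld\n", i, (long)res);
    }
    return 0;
}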
Exam Hints
PVM Programming
Matrix multiplication!
We looked at a few examples and you produced a program with a simple algorithm in your assignments.
Understand how a basic algorithm (e.g. sending everything out from the root node to the non-root nodes for processing) works
Review the torus examples - I might ask for either approach.
Understand how to implement a ring structure in PVM (i.e. independent of the matrix multiplication)
Exam Hints
PVM Programming
Know your PVM system calls: their inputs, functionality and return values!
E.g. pvm_spawn, pvm_barrier etc.
You will need to understand how to create a set of master-slave processes that cooperate to solve a problem.
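For reference, the master's skeleton might look like the following sketch (the slave executable name, message tag, and message layout are assumptions):

#include <stdio.h>
#include <pvm3.h>

#define NSLAVES 4                    // hypothetical number of slaves
#define TAG 1                        // hypothetical message tag

int main(void)
{
    int tids[NSLAVES];
    int i, n, sum = 0;

    // Spawn the slave executable on any available hosts.
    n = pvm_spawn("slave", (char **)0, PvmTaskDefault, "", NSLAVES, tids);
    if (n < NSLAVES)
        fprintf(stderr, "only spawned %d of %d slaves\n", n, NSLAVES);

    // Send each slave its piece of work (here just its index).
    for (i = 0; i < n; i++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&i, 1, 1);
        pvm_send(tids[i], TAG);
    }

    // Collect one result from each slave.
    for (i = 0; i < n; i++) {
        int result;
        pvm_recv(-1, TAG);           // -1 means receive from any task
        pvm_upkint(&result, 1, 1);
        sum += result;
    }
    printf("sum of results: %d\n", sum);
    pvm_exit();
    return 0;
}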
Exam Hints
MPI
Constructing a ring for processing in MPI.
As a result you will need to understand the basic functions (e.g. MPI_Send and MPI_Recv).
The trapezoid rule and the simple implementations. You may have to describe a parallel implementation of this algorithm.
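One possible parallel implementation, as a sketch: split the n trapezoids evenly across the processes, have each integrate its own subinterval, then combine the partial sums with MPI_Reduce (the integrand, interval and n are arbitrary choices here):

#include <stdio.h>
#include <mpi.h>

double f(double x) { return x * x; } // the integrand (an arbitrary example)

// Serial trapezoid rule over [left, right] with count trapezoids of width h.
double trap(double left, double right, int count, double h)
{
    double sum = (f(left) + f(right)) / 2.0;
    for (int i = 1; i < count; i++)
        sum += f(left + i * h);
    return sum * h;
}

int main(int argc, char *argv[])
{
    int rank, size;
    double a = 0.0, b = 1.0;         // interval (arbitrary)
    int n = 1024;                    // total trapezoids (arbitrary)

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double h = (b - a) / n;          // width of one trapezoid
    int local_n = n / size;          // trapezoids per process (assumes size divides n)
    double local_a = a + rank * local_n * h;
    double local_b = local_a + local_n * h;

    double local = trap(local_a, local_b, local_n, h);

    double total;                    // sum the partial integrals onto rank 0
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("integral of f on [%g, %g] ~= %g\n", a, b, total);
    MPI_Finalize();
    return 0;
}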
Exam Hints
General Hints
Put something down for each question! I can't give you marks for a blank page.
Read the questions carefully - not all questions ask for a coded solution, some only ask for a description.
Some questions may have multiple parts - answer each part for full marks.
Exam Hints
Focus on the verbs in the questions.
The verbs are the 'doing' words.
Make sure you do what they refer to.
Common verbs include:
Identify - Recognise and name
Describe/Explain - Outline the key points of the subject
Compare - Explain key similarities/differences
Discuss - Points for and against
Assess - Discuss the subject and come to a conclusion
Exam Hints
Review the past exams!
Start with the most recent and work back - this subject has been on offer for many iterations so there are many past exams
There will be similar questions and a similar structure to the previous exams.
We'll finish off this unit implementing Fox's Matrix Multiplication in MPI.
Matrix Multiplication - Fox's Algorithm
We are going to have m x m tasks
Each task will manage the appropriate block entry in the matrices.
I.e. each task corresponds to an entry in the matrix.
The entries in the matrix are themselves matrices of size blk x blk.
Thus in reality we will be multiplying (m * blk) x (m * blk) square matrices.
Matrix Multiplication - Fox's Algorithm
Now this is how the algorithm works:
Each task starts out with an A, B, and C block.
In step one, the tasks on the diagonal (i.e. row i = column i) broadcast their A block along their row.
They each then multiply this A block with their B block and add the result to their C block.
Now we rotate the B blocks.
We pass our B block to the task above us (if we are in the top row we pass our B to the task in the bottom row) - if you think of the grid as a torus we are just passing it up.
Matrix Multiplication - Fox's Algorithm
The process now iterates:
We go to the next diagonal along.
I.e. the tasks on the next diagonal (i.e. row i, column i + 1, wrapping around) broadcast their A's, do the multiplication, add it to C, then pass the B's up.
After m iterations everything returns to its original place and we have our result C = AB.
Matrix Multiplication - Fox's Algorithm
To implement the algorithm in MPI we need to:
Be able to broadcast our A block along our row of sibling processes.
Have a simple way of referring to the source and destination in our B column of sibling processes.
To do both of these nicely and efficiently in MPI requires futzing with:
communicators.
topologies.
These will be the topic for today.
Matrix Multiplication - Fox's Algorithm
A communicator in MPI is:
A collection of processes that can communicate with one another, and
Participate in collective communication operations.
A communicator consists of:
A group: an ordered set of processes.
A context: a handle that uniquely identifies the communicator.
Matrix Multiplication - Fox's Algorithm
There are various methods for constructing communicators.
The simplest three are:
Subdividing an existing communicator using:
MPI_Comm_split.
Constructing a communicator from a subset of an existing communicator using
MPI_Comm_group, MPI_Group_incl, and
MPI_Comm_create.
Using topologies, this is done in two separate steps:
By imposing a grid-like topology on an already existing communicator using MPI_Cart_create
Then using this grid-like topology to form column and row communicators using MPI_Cart_sub
We will only look at this last method.
Matrix Multiplication - Fox's Algorithm
A topology on a communicator is a scheme for addressing processes in that communicator.
MPI offers two different schemes:
A Cartesian grid-based scheme.
A graph-like scheme.
We are only interested in the former.
To create a grid-like topology on a communicator one needs to specify:
The number of dimensions in the grid.
The size of each dimension.
Wraparound: whether the endpoints in a particular dimension should be considered to be neighbors.
Whether you want MPI to optimize the grid to the underlying cluster nodes.
Matrix Multiplication - Fox's Algorithm
In the torus for the Fox algorithm we have:
A two-dimensional grid - in fact a square grid.
Both dimensions have the same size: m, the dimension of our matrix of blocks.
For a torus we have wraparound in both dimensions.
Optimization? Sure, why not?
Matrix Multiplication - Fox's Algorithm
To create our grid communicator we execute:
MPI_Comm grid;            //The resulting Communicator
int dim = 2;              //The number of Dimensions
int dims[2] = { m, m };   //The size of each Dimension
int wrap[2] = { 1, 1 };   //Wraparound?
int optimize = 1;         //Optimize?
MPI_Cart_create(MPI_COMM_WORLD, dim, dims, wrap, optimize, &grid);
After executing this, grid will be a communicator with our desired grid topology.
Because we allowed optimization, a process's rank in grid will usually be different from its rank in MPI_COMM_WORLD.
In other words, optimizing may reorder the processes.
MPI_Cart_create is a collective operation!
Every process must perform it.
Matrix Multiplication - Fox's Algorithm
To find out one's rank in the new communicator:
int grid_rank;
MPI_Comm_rank(grid, &grid_rank);
To find out one's coordinates in the grid:
// dim == 2
int coords[dim];
MPI_Cart_coords(grid, grid_rank, dim, coords);
Note that we need our grid rank to find our coordinates.
Matrix Multiplication - Fox's Algorithm
To make a communicator corresponding to our row:
// dim == 2
MPI_Comm row;
int direction[2] = { 0, 1 };   // keep dimension 1: the processes in our row
MPI_Cart_sub(grid, direction, &row);
To make a communicator corresponding to our column:
// dim == 2
MPI_Comm col;
int direction[2] = { 1, 0 };   // keep dimension 0: the processes in our column
MPI_Cart_sub(grid, direction, &col);
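Putting the pieces together, the core Fox iteration might look like this sketch (the block buffers A, B, C, the scratch block tmpA, and the serial multiply local_mat_mult are assumptions based on the description above):

#include <string.h>
#include <mpi.h>

// Serial helper: C += A * B for blk x blk row-major blocks.
static void local_mat_mult(const double *A, const double *B, double *C, int blk)
{
    for (int i = 0; i < blk; i++)
        for (int j = 0; j < blk; j++)
            for (int k = 0; k < blk; k++)
                C[i * blk + j] += A[i * blk + k] * B[k * blk + j];
}

// One possible Fox loop, assuming row and col were built as above and each
// process holds blk x blk blocks A, B, C plus a scratch block tmpA.
void fox(int m, int blk, double *A, double *B, double *C, double *tmpA,
         MPI_Comm row, MPI_Comm col)
{
    int my_row, my_col;
    MPI_Comm_rank(col, &my_row);     // rank in col == our row coordinate
    MPI_Comm_rank(row, &my_col);     // rank in row == our column coordinate

    for (int step = 0; step < m; step++) {
        int root = (my_row + step) % m;          // the diagonal entry this step
        if (my_col == root)
            memcpy(tmpA, A, (size_t)blk * blk * sizeof(double));
        MPI_Bcast(tmpA, blk * blk, MPI_DOUBLE, root, row);  // A along our row
        local_mat_mult(tmpA, B, C, blk);                    // C += tmpA * B
        // Rotate B up the column; the torus wraps the top row to the bottom.
        MPI_Sendrecv_replace(B, blk * blk, MPI_DOUBLE,
                             (my_row - 1 + m) % m, 0,
                             (my_row + 1) % m, 0,
                             col, MPI_STATUS_IGNORE);
    }
}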
Play with test_mpi_run - it compares the PVM and MPI implementations.
View the notes version for this slide:
bourbaki Tue May 26 Examples $ test_mpi_run
Enter the number of processes (a perfect square): 64
Enter the size of the blocks: 500
You are going to multiply 4000 by 4000 matrices!
Do you want me to make the matrices? (y/n)
y
Finished writing R
-rwx------ 1 comp309 comp309 62M May 26 21:38 R
Finished writing I
-rwx------ 1 comp309 comp309 62M May 26 21:38 I
Finished writing IR
-rwx------ 1 comp309 comp309 62M May 26 21:38 IR
Finished writing RI
-rwx------ 1 comp309 comp309 62M May 26 21:38 RI
Creating Grid Communicator
Creating Column Communicator
Creating Row Communicator
Allocating Matrices
Initializing Matrices
Finished Allocating and Initializing Matrices
Looping: iteration 1 out of a total of 8
Looping: iteration 2 out of a total of 8
Looping: iteration 3 out of a total of 8
Looping: iteration 4 out of a total of 8
Looping: iteration 5 out of a total of 8
Looping: iteration 6 out of a total of 8
Looping: iteration 7 out of a total of 8
Looping: iteration 8 out of a total of 8
Finished computation
Writing C
Cleaning up
Exiting gracefully after 107 seconds
Creating Grid Communicator
Creating Column Communicator
Creating Row Communicator
Allocating Matrices
Initializing Matrices
Finished Allocating and Initializing Matrices
Looping: iteration 1 out of a total of 8
Looping: iteration 2 out of a total of 8
Looping: iteration 3 out of a total of 8
Looping: iteration 4 out of a total of 8
Looping: iteration 5 out of a total of 8
Looping: iteration 6 out of a total of 8
Looping: iteration 7 out of a total of 8
Looping: iteration 8 out of a total of 8
Finished computation
Writing C
Cleaning up
Exiting gracefully after 100 seconds
IR and RI agree
bourbaki Tue May 26 Examples $