Title: COMP309/509 - Lecture 24
class: middle, center, inverse
COMP309/509 - Parallel and Distributed Computing
Lecture 24 - Exam Review and Final MPI Examples
By Mitchell Welch
University of New England
Reading
- Past Exam Papers
- Assignment 4
- Project (COMP509)
Summary
- Exam Hints
- Matrix Multiplication - Fox’s Algorithm
Exam Hints
- This unit was broken down into 4 modules
- Multi-Process Programming
- Multithreaded Programming
- PVM Programming
- MPI Programming
Exam Hints
- Multi-Process Programming
- Review your system calls!
- You are required to understand the function of each call, the input parameters (types and what they are) and the return values.
- Focus on the main calls (e.g. fork(), the exec() family, exit(), atexit(), open(), read(), dup2(), etc.)
- Know your process states and the situations that result in zombies and orphans
Exam Hints
- Multithreaded Programming
- Review the producers-consumers and dining philosophers.
- Understand how mutexes and semaphores can be applied to provide mutual exclusion and synchronisation in these examples.
- You will be required to construct a C program that uses semaphores for synchronisation and mutual exclusion.
Exam Hints
- Multithreaded Programming
- Review all the pthread calls (e.g. pthread_create, pthread_join) as you will need to use them in your answers.
- There is a challenging question in the exam that requires you to understand dining philosophers and the producers/consumers problem.
- I will be allocating marks for correct use of the function calls and the overall algorithm, so get something down for this question: there are lots of easy marks available even though it is a difficult question.
- Basically you will need to be able to construct a multithreaded program using the correct system calls.
Exam Hints
- PVM Programming
- Matrix multiplication!
- We looked at a few examples and you produced a program with a simple algorithm in your assignments.
- Understand how a basic algorithm (e.g. sending everything out from the root node to the non-root nodes for processing) works
- Review the torus examples - I might ask for either approach.
- Understand how to implement a ring structure in PVM (i.e. independent of the matrix multiplication)
Exam Hints
- PVM Programming
- Know your PVM system calls (e.g. pvm_spawn, pvm_barrier): their inputs, functionality and return values!
- You will need to understand how to create a set of master-slave processes to cooperate to solve a problem.
Exam Hints
- MPI
- Constructing a ring for processing in MPI.
- As a result you will need to understand the basic send and receive functions.
- The trapezoid rule and the simple implementations. You may have to describe a parallel implementation of this algorithm.
Exam Hints
- General Hints
- Put something down for each question! I can't give you marks for a blank page.
- Read the questions carefully - not all questions ask for a coded solution, some only ask for a description.
- Some questions may have multiple parts - answer each part for full marks.
Exam Hints
- Focus on the verbs in the questions.
- The verbs are the ‘doing’ words.
- Make sure you do what they refer to.
- Common verbs include:
- Identify - Recognise and name
- Describe/Explain - Outline the key points of the subject
- Compare - Explain key similarities/differences
- Discuss - Points for and against
- Assess - Discuss the subject and come to a conclusion
Exam Hints
- Review the past exams!
- Start with the most recent and work back; this unit has been offered many times, so there are many past exams.
- There will be similar questions and a similar structure to the previous exams.
Matrix Multiplication - Fox’s Algorithm
http://www.netlib.org/pvm3/book/pvm-book.html
- A similar but not identical algorithm also makes an appearance with pictures on pp 36-37 of the PVM book.
- We’ll finish off this unit implementing Fox’s Matrix Multiplication in MPI.
Matrix Multiplication - Fox’s Algorithm
- We are going to have m x m tasks
- Each task will manage the appropriate block entry in the matrices.
- I.e. each task corresponds to an entry in the matrix.
- The entries in the matrix are square sub-matrices of size blk x blk.
- Thus in reality we will be multiplying square matrices of size (m x blk).
Matrix Multiplication - Fox’s Algorithm
- Now this is how the algorithm works:
- Each task starts out with an A , B , and C block.
- In step one, the tasks on the diagonal (i.e. row i = column i) broadcast their A block along their row.
- They each then multiply this A block with their B block and add it to their C block.
- Now we rotate the B blocks.
- We pass our B block to the task above us (if we are in the top row we pass our B to the task in the bottom row); if you think of the grid as a torus, we are just passing it up.
Matrix Multiplication - Fox’s Algorithm
- The process now iterates:
- We go to the next diagonal along.
- I.e. tasks in the diagonal (row i, column i + 1) broadcast their A's, do the multiplication, add it to C, then pass the B's up.
- After m iterations everything returns to its original place and we have our result C = AB.
Matrix Multiplication - Fox’s Algorithm
- To implement the algorithm in MPI we need to:
- Be able to broadcast our A block along our row of sibling processes.
- Have a simple way of referring to the source and destination in our B column of sibling processes.
- To do these both nicely and efficiently in MPI requires futzing with:
- communicators.
- topologies.
- These will be the topic for today.
Matrix Multiplication - Fox’s Algorithm
- A communicator in MPI is:
- A collection of processes that can communicate with one another, and
- Participate in collective communication operations.
- A communicator consists of:
- A group: an ordered set of processes.
- A context: a handle that uniquely identifies the communicator.
Matrix Multiplication - Fox’s Algorithm
- There are various methods for constructing communicators.
- The simplest three are:
- Subdividing an existing communicator using MPI_Comm_split.
- Constructing a communicator from a subset of an existing communicator using MPI_Comm_group, MPI_Group_incl, and MPI_Comm_create.
- Using topologies; this is done in two separate steps:
- By imposing a grid-like topology on an already existing communicator using MPI_Cart_create
- Then using this grid-like topology to form column and row communicators using MPI_Cart_sub
- We will only look at this last method.
Matrix Multiplication - Fox’s Algorithm
- A topology on a communicator is a scheme for addressing processes in that communicator.
- MPI offers two different schemes:
- A Cartesian Grid based scheme.
- A Graph-like scheme.
- We are only interested in the former.
- To create a grid-like topology on a communicator one needs to specify:
- The number of dimensions in the grid.
- The size of each dimension.
- Wraparound: whether the endpoints in a particular dimension should be considered to be neighbors.
- Whether you want MPI to optimize the grid to the underlying cluster nodes.
Matrix Multiplication - Fox’s Algorithm
- In the torus for the Fox algorithm we have:
- A two dimensional grid; in fact a square grid.
- The dimensions are the same: the dimension of our matrix of blocks, m.
- For a torus we have wraparound in both dimensions.
- Sure why not?
Matrix Multiplication - Fox’s Algorithm
To create our grid communicator we execute:
MPI_Comm grid; //The resulting Communicator
int dim = 2; //The number of Dimensions
int dims[2] = { m, m }; //The size of each Dimension
int wrap[2] = { 1, 1 }; //Wraparound?
int optimize = 1; //Optimize?
MPI_Cart_create(MPI_COMM_WORLD, dim, dims, wrap, optimize, &grid);
- After executing this, grid will be our desired grid topology.
- Because we allowed optimizing, a process's rank in grid will usually be different from its rank in MPI_COMM_WORLD.
- In other words, optimizing may reorder the ranks.
- MPI_Cart_create is a collective operation!
- Every process must perform it.
Matrix Multiplication - Fox’s Algorithm
- To find out one's rank in the new communicator:
int grid_rank;
MPI_Comm_rank(grid, &grid_rank);
- To find out one's coordinates in the grid:
// dim == 2
int coords[dim];
MPI_Cart_coords(grid, grid_rank, dim, coords);
- Note that we need our grid rank to find our coordinates.
Matrix Multiplication - Fox’s Algorithm
- To make a communicator corresponding to our row:
// dim == 2
MPI_Comm row;
int direction[2] = { 0, 1 };
MPI_Cart_sub(grid, direction, &row);
- To make a communicator corresponding to our column:
// dim == 2
MPI_Comm col;
int direction[2] = { 1, 0 };
MPI_Cart_sub(grid, direction, &col);
- MPI_Cart_sub is a collective operation!
- Every process must perform it.
Matrix Multiplication - Fox’s Algorithm
- Review torus.c in the examples directory.
- Play with test_mpi_run - this compares the PVM and MPI implementations.
- View the notes version of this slide:
bourbaki Tue May 26 Examples $ test_mpi_run
Enter the number of processes (a perfect square): 64
Enter the size of the blocks: 500
You are going to multiply 4000 by 4000 matrices!
Do you want me to make the matrices? (y/n)
y
Finished writing R
-rwx------ 1 comp309 comp309 62M May 26 21:38 R
Finished writing I
-rwx------ 1 comp309 comp309 62M May 26 21:38 I
Finished writing IR
-rwx------ 1 comp309 comp309 62M May 26 21:38 IR
Finished writing RI
-rwx------ 1 comp309 comp309 62M May 26 21:38 RI
Creating Grid Communicator
Creating Column Communicator
Creating Row Communicator
Allocating Matrices
Initializing Matrices
Finished Allocating and Initializing Matrices
Looping: iteration 1 out of a total of 8
Looping: iteration 2 out of a total of 8
Looping: iteration 3 out of a total of 8
Looping: iteration 4 out of a total of 8
Looping: iteration 5 out of a total of 8
Looping: iteration 6 out of a total of 8
Looping: iteration 7 out of a total of 8
Looping: iteration 8 out of a total of 8
Finished computation
Writing C
Cleaning up
Exiting gracefully after 107 seconds
Creating Grid Communicator
Creating Column Communicator
Creating Row Communicator
Allocating Matrices
Initializing Matrices
Finished Allocating and Initializing Matrices
Looping: iteration 1 out of a total of 8
Looping: iteration 2 out of a total of 8
Looping: iteration 3 out of a total of 8
Looping: iteration 4 out of a total of 8
Looping: iteration 5 out of a total of 8
Looping: iteration 6 out of a total of 8
Looping: iteration 7 out of a total of 8
Looping: iteration 8 out of a total of 8
Finished computation
Writing C
Cleaning up
Exiting gracefully after 100 seconds
IR and RI agree
bourbaki Tue May 26 Examples $
Summary
- Exam Hints
- Matrix Multiplication - Fox’s Algorithm
class: middle, center, inverse
Questions?
Reading
- Past Exam Papers
- Assignment 4
- Project (COMP509)
class: middle, center, inverse
Good Luck in Your Exams!!