
COSC330/530 Parallel and Distributed Computing

Lecture 13 - Distributed Memory Computing with MPI

Dr. Mitchell Welch


Reading


Summary


Flynn's Taxonomy


for (i = 0; i < n; i++) { x[i] += y[i]; }

Flynn's Taxonomy


// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
    ...
    // Kernel invocation with N threads
    VecAdd<<<1, N>>>(A, B, C);
    ...
}

Performance

\( S = \frac{T_{serial}}{T_{parallel}}\)

Here \(S\) is the speedup, \(T_{serial}\) is the serial runtime of the program and \(T_{parallel}\) is the parallel runtime.


Performance

\( E = \frac{S}{p} = \frac{\big(\frac{T_{serial}}{T_{parallel}}\big)}{p} = \frac{T_{serial}}{p \cdot T_{parallel}}\)

Here \(E\) is the efficiency, \(S\) is the speedup, \(p\) is the number of processes, \(T_{serial}\) is the serial runtime of the program and \(T_{parallel}\) is the parallel runtime.


Performance

\( T_{parallel} = \frac{T_{serial}}{p} + T_{overhead}\)

Performance

#include <time.h>
#include <stdio.h>
#include <stdlib.h>

int main(){
    clock_t begin = clock();
    /* here, do your time-consuming job */
    int *testing;
    for(int i = 0 ;  i< 1000000000; i++){
        /*In this situation, we are pointlessly allocating/deallocating memory*/
        testing = (int*) malloc(512);
        free(testing);
    }
    clock_t end = clock();
    double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Time spent processing:  %f\n", time_spent);
    return EXIT_SUCCESS;
}


Amdahl's Law

Suppose 90% of the program can be parallelized and \(T_{serial} = 20\) seconds:
\( T_{parallel} = (0.9 \times T_{serial} / p) + 0.1 \times T_{serial} = 18/p + 2 \)
\( S = T_{serial} / T_{parallel} = 20 / (18/p + 2) \)

Amdahl's Law

\( S \leq T_{serial} / ( 0.1 \times T_{serial}) = 20 / 2 = 10 \)

The Beowulf Cluster


Last login: Tue Aug  9 13:02:17 2023 from turing.une.edu.au
[cosc330@bourbaki ~] $ ssh b1
[cosc330@b1 ~] $ exit
logout
Connection to b1 closed.
[cosc330@bourbaki ~] $ ssh b2
[cosc330@b2 ~] $ exit
logout
Connection to b2 closed.
[cosc330@bourbaki ~] 


Message Passing Interface


#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    printf("\tHello World!\n");
    MPI_Finalize();
    return 0;
}

Message Passing Interface

[cosc330@bourbaki examples] $ mpicc hellompi.c -Wall -o hellompi
[cosc330@bourbaki examples] $ mpiexec -np 4 hellompi
    Hello World!
    Hello World!
    Hello World!
    Hello World!
[cosc330@bourbaki examples] $ mpiexec -np 2 hellompi
    Hello World!
    Hello World!
[cosc330@bourbaki examples] $ 


The MPI Paradigm


MPI Elementary Functions

#include "mpi.h"
int MPI_Init(int *argc, char ***argv)
MPI_Init(&argc, &argv);

MPI Elementary Functions

#include "mpi.h"
int MPI_Finalize(void)

MPI Elementary Functions


#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int me, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    printf("I'm clone %d out of %d total!\n", me, nproc);
    MPI_Finalize();
    return 0;
}

MPI Elementary Functions

[cosc330@bourbaki examples] $ mpiexec -np 20 whoAmI
I'm clone 8 out of 20 total!
I'm clone 10 out of 20 total!
I'm clone 11 out of 20 total!
I'm clone 12 out of 20 total!
I'm clone 13 out of 20 total!
I'm clone 17 out of 20 total!
I'm clone 0 out of 20 total!
I'm clone 1 out of 20 total!
I'm clone 2 out of 20 total!
I'm clone 3 out of 20 total!
I'm clone 6 out of 20 total!
I'm clone 7 out of 20 total!
I'm clone 16 out of 20 total!
I'm clone 19 out of 20 total!
I'm clone 4 out of 20 total!
I'm clone 5 out of 20 total!
I'm clone 15 out of 20 total!
I'm clone 18 out of 20 total!
I'm clone 9 out of 20 total!
I'm clone 14 out of 20 total!
[cosc330@bourbaki examples] $ 

MPI Elementary Functions


#include <stdio.h>
#include <unistd.h>
#include <limits.h>
#include "mpi.h"

int main(int argc, char** argv) {
    int me, nproc;
    char hostname[PATH_MAX];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    if (gethostname(hostname, PATH_MAX) == 0) {
        printf("I'm clone %d out of %d total located @ %s\n", me, nproc, hostname);
    } else {
        printf("I'm a lost clone %d out of %d total!\n", me, nproc);
    }
    MPI_Finalize();
    return 0;
}

MPI Elementary Functions

[cosc330@bourbaki examples] $ mpiexec -np 10 whereAmI
I'm clone 0 out of 10 total located @ bourbaki.une.edu.au
I'm clone 1 out of 10 total located @ bourbaki.une.edu.au
I'm clone 2 out of 10 total located @ bourbaki.une.edu.au
I'm clone 4 out of 10 total located @ bourbaki.une.edu.au
I'm clone 7 out of 10 total located @ bourbaki.une.edu.au
I'm clone 8 out of 10 total located @ bourbaki.une.edu.au
I'm clone 9 out of 10 total located @ bourbaki.une.edu.au
I'm clone 3 out of 10 total located @ bourbaki.une.edu.au
I'm clone 5 out of 10 total located @ bourbaki.une.edu.au
I'm clone 6 out of 10 total located @ bourbaki.une.edu.au
[cosc330@bourbaki examples] $ 


MPI Elementary Functions

[cosc330@bourbaki examples] $ cat b1tob4 
b1
b2
b3
b4
[cosc330@bourbaki examples] $


MPI Elementary Functions

[cosc330@bourbaki examples] $ mpirun -np 5 --map-by node --hostfile b1tob4 whereAmI
I'm clone 0 out of 5 total located @ b1
I'm clone 4 out of 5 total located @ b1
I'm clone 1 out of 5 total located @ b2
I'm clone 2 out of 5 total located @ b3
I'm clone 3 out of 5 total located @ b4
[cosc330@bourbaki examples] $ mpirun -np 10 --map-by node --hostfile b1tob4 whereAmI
I'm clone 3 out of 10 total located @ b4
I'm clone 7 out of 10 total located @ b4
I'm clone 9 out of 10 total located @ b2
I'm clone 5 out of 10 total located @ b2
I'm clone 0 out of 10 total located @ b1
I'm clone 4 out of 10 total located @ b1
I'm clone 1 out of 10 total located @ b2
I'm clone 2 out of 10 total located @ b3
I'm clone 6 out of 10 total located @ b3
I'm clone 8 out of 10 total located @ b1
[cosc330@bourbaki examples] $ 


MPI Elementary Functions

[cosc330@bourbaki examples] $ mpirun -np 10 --map-by core  --hostfile b1tob4 whereAmI
I'm clone 8 out of 10 total located @ b4
I'm clone 9 out of 10 total located @ b4
I'm clone 1 out of 10 total located @ b1
I'm clone 2 out of 10 total located @ b1
I'm clone 5 out of 10 total located @ b2
I'm clone 3 out of 10 total located @ b2
I'm clone 7 out of 10 total located @ b3
I'm clone 0 out of 10 total located @ b1
I'm clone 6 out of 10 total located @ b3
I'm clone 4 out of 10 total located @ b2
[cosc330@bourbaki examples] $ 

MPI Elementary Functions

#include "mpi.h"
int MPI_Comm_rank ( MPI_Comm comm, int *rank )


MPI Elementary Functions

MPI_Comm_rank(MPI_COMM_WORLD,&me);

MPI Elementary Functions


#include "mpi.h"
int MPI_Comm_size ( MPI_Comm comm, int *size )
MPI_Comm_size(MPI_COMM_WORLD,&nproc);
