
COSC330/530 Parallel and Distributed Computing

Lecture 18 - Introduction to NVIDIA CUDA

Dr. Mitchell Welch


Reading


Summary


Welcome to GPU Programming with NVIDIA CUDA




The CUDA Programming Model



Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
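The grid limits above are the kind of values the CUDA runtime reports for a device. As a minimal sketch (assuming the first visible GPU, device 0, is the one of interest; the helper name `queryMaxGrid` is hypothetical), they can be read at run time with `cudaGetDeviceProperties`:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: fills grid[0..2] with the maximum grid size of `device`.
// Returns true on success, false if the CUDA runtime reports an error.
bool queryMaxGrid(int device, int grid[3]) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return false;
    }
    for (int d = 0; d < 3; ++d) {
        grid[d] = prop.maxGridSize[d];
    }
    return true;
}

int main() {
    int grid[3];
    // Assumption: device 0 is the GPU we want to inspect.
    if (!queryMaxGrid(0, grid)) {
        return 1;
    }
    printf("Max dimension size of a grid size (x,y,z): (%d, %d, %d)\n",
           grid[0], grid[1], grid[2]);
    return 0;
}
```

The printed values depend on the installed GPU, so they may differ from the line quoted above.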



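// Kernel definition: each thread adds one pair of elements
__global__ void VecAdd(float* A, float* B, float* C) {
    // With a single block, threadIdx.x is also the element index
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main() {
    ...
    // Kernel invocation with 1 block of N threads
    VecAdd<<<1, N>>>(A, B, C);
    ...
}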
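The listing above elides the host-side setup. One way the elided parts might be filled in is sketched below; this is an illustration under stated assumptions, not the lecture's own program. `N` is fixed at 256 so that it fits in a single block, and the helper name `runVecAdd` and the test data are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256  // assumption: small enough to fit in one thread block

// Same kernel as above (const added to the inputs).
__global__ void VecAdd(const float* A, const float* B, float* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

// Hypothetical helper: runs VecAdd on the GPU and returns the number of
// wrong results, or -1 if any CUDA call fails.
int runVecAdd() {
    float hA[N], hB[N], hC[N];
    for (int i = 0; i < N; ++i) {
        hA[i] = (float)i;
        hB[i] = 2.0f * (float)i;
    }

    const size_t bytes = N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc((void**)&dA, bytes) == cudaSuccess &&
              cudaMalloc((void**)&dB, bytes) == cudaSuccess &&
              cudaMalloc((void**)&dC, bytes) == cudaSuccess;

    // Copy the inputs to the device.
    ok = ok && cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        // Kernel invocation with 1 block of N threads.
        VecAdd<<<1, N>>>(dA, dB, dC);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // This blocking copy also waits for the kernel to finish.
    ok = ok && cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // cudaFree accepts a null pointer, so this is safe on every path.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    if (!ok) return -1;

    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (hC[i] != hA[i] + hB[i]) ++mismatches;
    }
    return mismatches;
}

int main() {
    int mismatches = runVecAdd();
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Because only one block is launched, this sketch is limited to vectors no longer than the maximum number of threads per block; longer vectors would need several blocks and a global index built from `blockIdx.x`, `blockDim.x` and `threadIdx.x`.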



/* Creates a 16 x 16 x 1 dim3 named threadsPerBlock (unspecified dimensions default to 1) */
dim3 threadsPerBlock(16, 16);
/* Creates a 4 x 4 x 2 dim3 named numBlocks */
dim3 numBlocks(4, 4, 2);

...

/* These can then be used in the execution configuration for a kernel. */
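As a minimal sketch of passing these `dim3` values in an execution configuration, the program below launches a kernel with `numBlocks` blocks of `threadsPerBlock` threads and counts how many threads actually ran. The kernel `countThreads` and the helper `launchAndCount` are hypothetical names introduced only for this illustration. With the sizes above, the launch creates 16 x 16 x 1 threads in each of 4 x 4 x 2 blocks, which is 256 x 32 = 8192 threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every launched thread adds 1 to a counter in global memory.
__global__ void countThreads(unsigned int* counter) {
    atomicAdd(counter, 1u);
}

// Hypothetical helper: launches countThreads with the given configuration
// and returns how many threads ran (0 if any CUDA call fails).
unsigned int launchAndCount(dim3 numBlocks, dim3 threadsPerBlock) {
    unsigned int* dCounter = nullptr;
    unsigned int hCounter = 0;
    if (cudaMalloc((void**)&dCounter, sizeof(unsigned int)) != cudaSuccess) {
        return 0;
    }
    cudaMemset(dCounter, 0, sizeof(unsigned int));

    // The dim3 values go straight into the <<<grid, block>>> configuration.
    countThreads<<<numBlocks, threadsPerBlock>>>(dCounter);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&hCounter, dCounter, sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dCounter);
    return err == cudaSuccess ? hCounter : 0;
}

int main() {
    dim3 threadsPerBlock(16, 16);  // 16 x 16 x 1 threads per block
    dim3 numBlocks(4, 4, 2);       // 4 x 4 x 2 blocks in the grid
    unsigned int total = launchAndCount(numBlocks, threadsPerBlock);
    printf("threads launched: %u\n", total);
    return total == 8192u ? 0 : 1;
}
```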



__host__ __device__ cudaError_t cudaMalloc ( void** devPtr, size_t size )

__host__ __device__ cudaError_t cudaFree ( void* devPtr )
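Both functions return a `cudaError_t`, so a caller can check each call. The sketch below pairs an allocation with its matching free and reports any error with `cudaGetErrorString`. The helper name `allocAndFree` and the buffer size are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocates `bytes` of device memory, then frees it.
// Returns true only if both calls report cudaSuccess.
bool allocAndFree(size_t bytes) {
    float* dBuffer = nullptr;

    // cudaMalloc writes the device address into dBuffer.
    cudaError_t err = cudaMalloc((void**)&dBuffer, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    // ... kernels would use dBuffer here ...

    err = cudaFree(dBuffer);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFree failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    // Assumption: room for 1024 floats is enough for a demonstration.
    bool ok = allocAndFree(1024 * sizeof(float));
    printf("%s\n", ok ? "allocation and free succeeded" : "CUDA error");
    return ok ? 0 : 1;
}
```

The pointer returned through `devPtr` refers to device memory, so host code should pass it to kernels or to copy functions rather than dereference it directly.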



__host__ cudaError_t cudaMemcpy ( void* dst, const void* src, size_t count, cudaMemcpyKind kind )
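The `kind` argument states the direction of the copy. As a minimal sketch, the program below sends an array from host to device with `cudaMemcpyHostToDevice`, brings it back with `cudaMemcpyDeviceToHost`, and checks that the data survived the round trip. The helper name `roundTrip` and the array length are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: copies n ints host -> device -> host.
// Returns true if every call succeeds and the data comes back unchanged.
bool roundTrip(int n) {
    const size_t bytes = (size_t)n * sizeof(int);
    int* hSrc = (int*)malloc(bytes);
    int* hDst = (int*)malloc(bytes);
    int* dBuf = nullptr;

    bool ok = hSrc != nullptr && hDst != nullptr &&
              cudaMalloc((void**)&dBuf, bytes) == cudaSuccess;

    if (ok) {
        for (int i = 0; i < n; ++i) hSrc[i] = i * i;
        memset(hDst, 0, bytes);

        // count is given in bytes, not elements.
        ok = cudaMemcpy(dBuf, hSrc, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(hDst, dBuf, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        ok = memcmp(hSrc, hDst, bytes) == 0;
    }

    cudaFree(dBuf);  // safe even if dBuf is still null
    free(hSrc);
    free(hDst);
    return ok;
}

int main() {
    bool ok = roundTrip(1000);  // assumption: 1000 elements for the demo
    printf("round trip %s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```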




Hello World in CUDA
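A minimal sketch of what a CUDA hello-world program could look like, assuming device-side `printf` is used for output. The kernel name `hello`, the helper `sayHello`, and the launch sizes are hypothetical choices for illustration and are not taken from the lecture's own example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread prints its own block and thread index.
__global__ void hello() {
    printf("Hello World from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

// Hypothetical helper: launches the kernel and waits for it to finish.
cudaError_t sayHello(int blocks, int threadsPerBlock) {
    hello<<<blocks, threadsPerBlock>>>();
    cudaError_t err = cudaGetLastError();  // catches launch errors
    if (err != cudaSuccess) return err;
    // Kernel launches are asynchronous; synchronizing also flushes
    // the device-side printf buffer.
    return cudaDeviceSynchronize();
}

int main() {
    // Assumption: 2 blocks of 4 threads keeps the output short.
    cudaError_t err = sayHello(2, 4);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The order in which lines from different blocks appear is not guaranteed, since blocks may be scheduled in any order.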




The NVIDIA A100 GPU




Makefiles in CUDA

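# Build everything with the nvcc compiler driver
COMPILER = nvcc
# Header search path for the CUDA samples' common helper headers
CFLAGS = -I /home/cosc330/public_html/lectures/cuda-samples/Common
EXES = vectorAdd deviceQuery

# all and clean name commands, not files
.PHONY: all clean

all: ${EXES}

vectorAdd: vectorAdd.cu
	${COMPILER} ${CFLAGS} vectorAdd.cu -o vectorAdd

deviceQuery: deviceQuery.cpp
	${COMPILER} ${CFLAGS} deviceQuery.cpp -o deviceQuery

%.o: %.c %.h makefile
	${COMPILER} ${CFLAGS} $< -c

# Recipe lines must begin with a tab character
clean:
	rm -f *.o *~ ${EXES}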




