Let's look at how to program GPUs using CUDA, with a simple example: vector addition.
CUDA lets you write C code that runs on the GPU, in the form of functions known as kernels. Kernels are launched from the CPU (host) side much like regular function calls, but they execute on the GPU (device).
The idea:
- __global__ marks a function as a kernel.
- cudaMalloc and cudaMemcpy handle GPU memory allocation and data transfer between host and device.
- A kernel is launched with a special triple-angle-bracket syntax that specifies the launch configuration: myKernel<<<numBlocks, numThreadsPerBlock>>>(args);
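Putting those pieces together, here is a minimal sketch of vector addition in CUDA. The kernel and variable names (vecAdd, da, db, dc) are illustrative choices, not fixed API names; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Kernel: each thread adds exactly one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)               // guard: the grid may have more threads than elements
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host buffers.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    // Device buffers and host-to-device copies.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch: enough blocks to cover all n elements.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(da, db, dc, n);

    // Copy the result back and spot-check one element.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[1] = %f\n", hc[1]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The rounding-up in the blocks computation means the last block may contain extra threads, which is why the kernel checks i < n before writing.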
Threads are grouped into blocks. The total number of threads is:
numBlocks × numThreadsPerBlock
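Inside a kernel, each thread uses its block and thread coordinates to compute a unique global index. For a one-dimensional launch (an assumption here; grids and blocks can also be 2-D or 3-D), the usual pattern is:

```cuda
// blockIdx.x  : which block this thread belongs to
// blockDim.x  : threads per block (numThreadsPerBlock)
// threadIdx.x : this thread's position within its block
int i = blockIdx.x * blockDim.x + threadIdx.x;
```

This index runs from 0 up to numBlocks × numThreadsPerBlock − 1, so each thread can pick out its own element of the input arrays.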
Other notes: