
猪吃小虎's Note


CUDA

Grid, Block, and Thread

[Figure: CUDA grid]

Each block has shared memory that is visible to all threads within that block.

Each thread has its own private memory (registers and local memory).
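A minimal sketch of the two memory scopes, assuming a fixed block size: every block declares one `__shared__` tile that all of its threads write and read, while a local variable such as `gid` stays private to each thread. The names `kBlockSize`, `reverseWithinBlock`, and `checkReverseWithinBlock` are hypothetical, chosen for illustration only.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size for this sketch; launches must use exactly this many threads per block.
constexpr int kBlockSize = 256;

// Reverses every block-sized segment of `in` into `out`.
// Assumes the array length is numBlocks * kBlockSize.
__global__ void reverseWithinBlock(const int* in, int* out) {
    // Shared memory: one `tile` per block, visible to every thread of that block.
    __shared__ int tile[kBlockSize];

    // `gid` is an ordinary local variable, private to each thread (normally held in a register).
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = in[gid];
    __syncthreads();  // wait until all threads of the block have written their element

    // Each thread reads an element that a different thread of the same block wrote.
    out[gid] = tile[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper: runs the kernel and checks the result on the CPU.
bool checkReverseWithinBlock(int numBlocks) {
    const int n = numBlocks * kBlockSize;
    std::vector<int> hIn(n), hOut(n);
    for (int i = 0; i < n; ++i) hIn[i] = i;

    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    reverseWithinBlock<<<numBlocks, kBlockSize>>>(dIn, dOut);
    cudaMemcpy(hOut.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    // Element t of block b must equal element (kBlockSize - 1 - t) of the same block's input.
    for (int b = 0; b < numBlocks; ++b)
        for (int t = 0; t < kBlockSize; ++t)
            if (hOut[b * kBlockSize + t] != hIn[b * kBlockSize + (kBlockSize - 1 - t)])
                return false;
    return true;
}
```

The `__syncthreads()` barrier is what makes the shared tile safe to read: without it, a thread could read a slot before the thread responsible for it has written it.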

CUDA Thread Block Scheduling

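A thread block is assigned to one streaming multiprocessor (SM, called SMM on Maxwell GPUs) and runs there until all of its threads finish; it does not migrate to another SM. The mapping is not one-to-one, though: an SM can hold several blocks at once as long as it has enough registers, shared memory, and thread slots, and blocks may be scheduled in any order.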
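To see that one SM usually holds more than one block, the runtime can report how many blocks of a given kernel fit on an SM at the same time. A minimal sketch, assuming device 0; `scaleKernel`, `smCount`, and `residentBlocksPerSM` are hypothetical names, while `cudaGetDeviceProperties` and `cudaOccupancyMaxActiveBlocksPerMultiprocessor` are CUDA runtime calls.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only as the subject of the occupancy query.
__global__ void scaleKernel(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

// Number of streaming multiprocessors on device 0.
int smCount() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    return prop.multiProcessorCount;
}

// How many blocks of `blockSize` threads running scaleKernel can be resident
// on a single SM at once (limited by registers, shared memory, and thread slots).
int residentBlocksPerSM(int blockSize) {
    int numBlocks = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&numBlocks, scaleKernel, blockSize, 0);
    return numBlocks;
}
```

`smCount() * residentBlocksPerSM(blockSize)` then gives the number of blocks that can run concurrently on the whole GPU; any additional blocks in the grid wait until an SM frees up.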


Warp

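A warp is a group of 32 threads that an SM schedules as a unit. The SM keeps the execution context (registers, program counter) of every resident warp on chip, so it can switch between warps cheaply to hide memory latency. The 32 threads of a warp execute the same instruction at the same time (SIMT), each on its own data; when threads of one warp take different branches, the branches run one after another (warp divergence).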
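A minimal sketch of warp-level execution, assuming `blockDim.x` is a multiple of 32: each thread computes its lane and warp index, and the 32 lanes of a warp add their values together by exchanging registers with `__shfl_down_sync`. The kernel `warpSums` and helper `checkWarpSums` are hypothetical names for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each warp adds up the values held by its 32 lanes using register shuffles.
// Assumes blockDim.x is a multiple of 32 and the grid covers the input exactly.
__global__ void warpSums(const int* in, int* out) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % warpSize;  // position of this thread inside its warp (0..31)
    int warpId = gid / warpSize;        // global index of this thread's warp
    int val = in[gid];
    // Tree sum inside the warp: at each step lane i adds the value held by lane i + offset.
    // The full mask 0xffffffff means all 32 lanes take part in every shuffle.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    if (lane == 0) out[warpId] = val;   // lane 0 ends up holding the warp's total
}

// Hypothetical host helper; assumes a warp size of 32 on the host side.
bool checkWarpSums(int numBlocks, int threadsPerBlock) {
    const int n = numBlocks * threadsPerBlock;
    const int numWarps = n / 32;
    std::vector<int> hIn(n), hOut(numWarps);
    for (int i = 0; i < n; ++i) hIn[i] = i;

    int *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, numWarps * sizeof(int));
    cudaMemcpy(dIn, hIn.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    warpSums<<<numBlocks, threadsPerBlock>>>(dIn, dOut);
    cudaMemcpy(hOut.data(), dOut, numWarps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    for (int w = 0; w < numWarps; ++w) {
        int expected = 0;
        for (int l = 0; l < 32; ++l) expected += hIn[w * 32 + l];
        if (hOut[w] != expected) return false;
    }
    return true;
}
```

The shuffle works only because the lanes of a warp execute the instruction together; no shared memory or `__syncthreads()` is needed inside a single warp.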

Matrix Multiplication in CUDA

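A common way to multiply matrices in CUDA is tiling with shared memory: each block computes one TILE × TILE tile of C, loading matching tiles of A and B into shared memory so each loaded value is reused TILE times instead of being fetched from global memory again. A minimal sketch, assuming square, row-major `float` matrices; `TILE = 16`, `matmulTiled`, and `checkMatmulTiled` are illustrative choices, not necessarily the scheme in the original notes.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

constexpr int TILE = 16;  // hypothetical tile width; blocks are TILE x TILE threads

// C = A * B for N x N row-major matrices; N need not be a multiple of TILE.
__global__ void matmulTiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        // Each thread loads one element of the A tile and one of the B tile (0 outside the matrix).
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before anyone uses them

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish using the tiles before they are overwritten
    }
    if (row < N && col < N) C[row * N + col] = acc;
}

// Hypothetical host helper: compares the GPU result with a CPU triple loop.
bool checkMatmulTiled(int N) {
    std::vector<float> hA(N * N), hB(N * N), hC(N * N);
    for (int i = 0; i < N * N; ++i) {
        hA[i] = (i % 7) * 0.5f;
        hB[i] = (i % 5) - 2.0f;
    }
    size_t bytes = N * N * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, N);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c) {
            float ref = 0.0f;
            for (int k = 0; k < N; ++k) ref += hA[r * N + k] * hB[k * N + c];
            if (std::fabs(ref - hC[r * N + c]) > 1e-3f) return false;
        }
    return true;
}
```

Zero-padding the partial tiles at the matrix edge lets the inner loop always run TILE steps, which keeps the code simple at the cost of a few wasted multiplications.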

Parallel Reduction in CUDA


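Sequential addressing is faster than interleaved addressing. With interleaved addressing, the distance between the two elements being added doubles each step (1, 2, 4, ...), so either the active threads are scattered across warps (warp divergence) or, if the threads are remapped to be contiguous, their shared-memory accesses become strided and cause bank conflicts. With sequential addressing, the distance halves each step (blockDim.x/2, ..., 1): the active threads are contiguous and consecutive threads access consecutive shared-memory words, so there are no bank conflicts.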
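The two addressing schemes can be compared side by side. A minimal sketch, assuming a power-of-two block size and an input length that is a multiple of it; the interleaved kernel here is the strided-index variant (contiguous threads, strided shared-memory indices), and the per-block partial sums are finished on the CPU. All names (`kBlock`, `reduceInterleaved`, `reduceSequential`, `gpuReduce`, `reductionsAgree`) are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 256;  // threads per block; must be a power of two for both kernels

// Interleaved addressing: the distance between paired elements grows 1, 2, 4, ...
__global__ void reduceInterleaved(const int* in, int* blockSums) {
    __shared__ int s[kBlock];
    unsigned tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();
    for (unsigned stride = 1; stride < blockDim.x; stride *= 2) {
        unsigned idx = 2 * stride * tid;  // neighbouring threads touch words 2*stride apart -> bank conflicts
        if (idx < blockDim.x) s[idx] += s[idx + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = s[0];
}

// Sequential addressing: the distance shrinks blockDim.x/2, ..., 1.
__global__ void reduceSequential(const int* in, int* blockSums) {
    __shared__ int s[kBlock];
    unsigned tid = threadIdx.x;
    s[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();
    for (unsigned stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];  // consecutive threads read consecutive words
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = s[0];
}

// Reduces `h` (length a multiple of kBlock) with one kernel; block sums are added on the CPU.
long long gpuReduce(const std::vector<int>& h, bool sequential) {
    const int n = static_cast<int>(h.size());
    const int numBlocks = n / kBlock;
    int *dIn = nullptr, *dSums = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dSums, numBlocks * sizeof(int));
    cudaMemcpy(dIn, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    if (sequential)
        reduceSequential<<<numBlocks, kBlock>>>(dIn, dSums);
    else
        reduceInterleaved<<<numBlocks, kBlock>>>(dIn, dSums);
    std::vector<int> sums(numBlocks);
    cudaMemcpy(sums.data(), dSums, numBlocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dSums);
    long long total = 0;
    for (int v : sums) total += v;
    return total;
}

// Both kernels must produce the same (CPU-checked) total.
bool reductionsAgree(int numBlocks) {
    std::vector<int> h(numBlocks * kBlock);
    long long expected = 0;
    for (size_t i = 0; i < h.size(); ++i) {
        h[i] = static_cast<int>(i % 100);
        expected += h[i];
    }
    return gpuReduce(h, false) == expected && gpuReduce(h, true) == expected;
}
```

Both kernels compute the same result; the difference is only in which shared-memory words a warp touches per step, which is what makes the sequential version faster in practice.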