
CUDA

Grid, Block, and Thread

[Figure: CUDA grid]

Each block has its own shared memory, which is visible to all threads within that block.

Each thread has its own private memory (registers and local memory) that no other thread can access.
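To make the two memory scopes concrete, here is a minimal sketch in which every block reverses its own chunk of an array. The kernel name `reverse_chunks`, the helper `reverse_chunks_on_gpu`, and the block size of 128 are all assumptions chosen for illustration, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 128;  // threads per block (assumed value)

// Each block reverses its own kBlockSize-element chunk of `data` in place.
__global__ void reverse_chunks(int* data) {
    __shared__ int tile[kBlockSize];   // one copy per block, shared by its threads
    int local = threadIdx.x;           // private to this thread (held in a register)
    int global = blockIdx.x * blockDim.x + local;

    tile[local] = data[global];
    __syncthreads();                   // wait until the whole block has filled the tile
    data[global] = tile[blockDim.x - 1 - local];
}

// Hypothetical host helper; host.size() must be a positive multiple of kBlockSize.
std::vector<int> reverse_chunks_on_gpu(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    int* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reverse_chunks<<<n / kBlockSize, kBlockSize>>>(dev);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return out;
}
```

A thread can read values written by other threads of its block through `tile`, but it can never see the `tile` of a different block; that is why the reversal stays inside each chunk.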

CUDA Thread Block Scheduling

  • Each block is mapped to one SMM (streaming multiprocessor): all of its threads run on that multiprocessor, and one multiprocessor can hold several blocks at once.
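Since a block lives on a single multiprocessor while a multiprocessor may host several blocks, it can be useful to ask the runtime how many blocks fit at once. The sketch below uses `cudaGetDeviceProperties` and `cudaOccupancyMaxActiveBlocksPerMultiprocessor` from the CUDA runtime; `dummy_kernel` and `max_resident_blocks` are hypothetical names, and device 0 is assumed.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder kernel; occupancy depends on its register and shared-memory use.
__global__ void dummy_kernel(float* x) {
    x[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
}

// Hypothetical helper: how many blocks of dummy_kernel can be resident
// on the whole device at the same time for a given block size.
int max_resident_blocks(int block_size) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // device 0 is an assumption

    int blocks_per_sm = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks_per_sm, dummy_kernel, block_size, 0 /* dynamic shared memory */);

    printf("%d multiprocessors, up to %d blocks each\n",
           prop.multiProcessorCount, blocks_per_sm);
    return blocks_per_sm * prop.multiProcessorCount;
}
```

If a grid has more blocks than this number, the remaining blocks wait and are scheduled as resident blocks finish.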


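Warp

  • A warp is the unit the multiprocessor schedules; the hardware keeps an execution context (program counter and registers) for every resident warp.
  • A warp consists of 32 threads that execute the same instruction at the same time, each on its own data (SIMT).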
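Because the 32 threads of a warp execute together, they can exchange register values directly. The following sketch sums the inputs of each warp with `__shfl_down_sync` (available since CUDA 9). The names `warp_sums` and `warp_sums_on_gpu` and the block size of 128 are assumptions, and the input length must be a multiple of the block size.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kWarpBlock = 128;  // threads per block, a multiple of 32 (assumed)

// Writes one sum per warp: out[w] = sum of the 32 inputs handled by warp w.
__global__ void warp_sums(const int* in, int* out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % warpSize;   // position of this thread inside its warp
    int value = in[tid];

    // Tree reduction inside the warp: lanes pull values from higher lanes.
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        value += __shfl_down_sync(0xffffffffu, value, offset);

    if (lane == 0) out[tid / warpSize] = value;  // lane 0 holds the warp total
}

// Hypothetical host helper; host.size() must be a positive multiple of kWarpBlock.
std::vector<int> warp_sums_on_gpu(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    int warps = n / 32;
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, warps * sizeof(int));
    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    warp_sums<<<n / kWarpBlock, kWarpBlock>>>(d_in, d_out);

    std::vector<int> out(warps);
    cudaMemcpy(out.data(), d_out, warps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

No shared memory or `__syncthreads()` is needed here, since all communication stays within one warp.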

Matrix Multiplication in CUDA

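The original figures are not available here, so the following is only one common formulation: a shared-memory tiled kernel for square matrices. The tile width of 16, the names `matmul_tiled` and `matmul_on_gpu`, and the requirement that `n` be a multiple of the tile width are all assumptions made to keep the sketch short.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kTile = 16;  // tile width (assumed); n must be a multiple of it

// C = A * B for row-major n x n matrices.
__global__ void matmul_tiled(const float* A, const float* B, float* C, int n) {
    __shared__ float As[kTile][kTile];
    __shared__ float Bs[kTile][kTile];

    int row = blockIdx.y * kTile + threadIdx.y;
    int col = blockIdx.x * kTile + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < n / kTile; ++t) {
        // Each thread loads one element of the current A tile and B tile.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * kTile + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * kTile + threadIdx.y) * n + col];
        __syncthreads();               // tiles are complete

        for (int k = 0; k < kTile; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();               // done reading before the next load
    }
    C[row * n + col] = acc;
}

// Hypothetical host helper for n x n inputs stored as flat vectors.
std::vector<float> matmul_on_gpu(const std::vector<float>& A,
                                 const std::vector<float>& B, int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(kTile, kTile);
    dim3 grid(n / kTile, n / kTile);
    matmul_tiled<<<grid, block>>>(dA, dB, dC, n);

    std::vector<float> C(static_cast<size_t>(n) * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Each element of A and B is read from global memory once per tile rather than once per multiply, which is the main reason to stage the data in shared memory.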

Parallel Reduction in CUDA


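Sequential addressing performs better than interleaved addressing: the active threads access consecutive shared-memory words, which avoids the bank conflicts that strided (interleaved) indexing causes.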
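A minimal sketch of a block-level sum with sequential addressing follows; the interleaved variant is shown only in a comment for comparison. The block size of 256, the names `reduce_sequential` and `sum_on_gpu`, and the final summation of per-block results on the host are assumptions made for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // threads per block, a power of two (assumed)

// Each block writes the sum of its kThreads inputs to block_sums[blockIdx.x].
__global__ void reduce_sequential(const int* in, int* block_sums) {
    __shared__ int sdata[kThreads];
    unsigned tid = threadIdx.x;
    sdata[tid] = in[blockIdx.x * blockDim.x + tid];
    __syncthreads();

    // Sequential addressing: the stride halves each step and the active
    // threads 0..s-1 touch consecutive words sdata[0..s-1] and sdata[s..2s-1].
    // Interleaved addressing would instead double the stride and use
    //   index = 2 * s * tid;  sdata[index] += sdata[index + s];
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) block_sums[blockIdx.x] = sdata[0];
}

// Hypothetical host helper; host.size() must be a positive multiple of kThreads.
int sum_on_gpu(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    int blocks = n / kThreads;
    int *d_in = nullptr, *d_sums = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_sums, blocks * sizeof(int));
    cudaMemcpy(d_in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    reduce_sequential<<<blocks, kThreads>>>(d_in, d_sums);

    std::vector<int> sums(blocks);
    cudaMemcpy(sums.data(), d_sums, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_sums);

    int total = 0;
    for (int s : sums) total += s;  // finish the reduction on the host
    return total;
}
```

For large inputs the per-block sums would normally be reduced by a second kernel launch rather than on the host.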