The core of NVIDIA's GPU architecture. Introduced in 2006 with Tesla (GeForce 8 series), capable of general-purpose compute. CUDA was launched alongside.

Goal: Make GPU programming accessible to all developers. The SM has evolved to turn GPUs into powerful general-purpose processors.

Fermi SM key components:

4 execution blocks — warps run here, up to 4 warps simultaneously
Dispatching & scheduling units — manage up to 24 warps in flight
Register file & on-chip memory — warp scheduling locks required resources until completion

⚠️ Multiple warps compete for resources → occupancy problem. Maximize warps in flight to utilize SM while waiting for memory operations.