Built on TSMC's 16nm FinFET process (vs. 28nm for Maxwell), Pascal could fit more SMs in the same die area: 56 on the flagship datacenter chip (GP100, as shipped in the Tesla P100) vs. 24 on Maxwell's GM200, delivering a massive boost in FLOPs.
Deep learning was exploding at the time. ML has a "simple" two-step flow: training (the network learns from data) and inference (the trained network processes new inputs). Neural nets tolerate small numerical errors, so lower precision (FP16) halves memory use relative to FP32, speeds up the math, and lets you train and deploy larger models. Pascal's native FP16 support was the first step toward Tensor Cores, which arrived in the next architecture, Volta.
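To make Pascal-era FP16 concrete, here is a minimal CUDA sketch of a half-precision AXPY (y = a*x + y). It uses the `__half2` type from `cuda_fp16.h`, which packs two FP16 values into 32 bits so a single `__hfma2` instruction performs two fused multiply-adds; this is where the "speeds up math, halves memory" claim comes from. It assumes an sm_60+ GPU and a recent CUDA toolkit where the `cuda_fp16.h` conversion helpers are host-callable; the kernel name `haxpy` is just illustrative.

```cuda
// Minimal FP16 AXPY sketch for Pascal (sm_60+). Assumes a recent CUDA
// toolkit where cuda_fp16.h conversion helpers are callable from host code.
#include <cuda_fp16.h>
#include <cstdio>

// y = a*x + y over packed half2 pairs: one __hfma2 does two FP16 FMAs.
__global__ void haxpy(int n2, __half2 a, const __half2 *x, __half2 *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        y[i] = __hfma2(a, x[i], y[i]);  // fused multiply-add, two lanes at once
    }
}

int main() {
    const int n  = 1 << 20;  // 1M FP16 elements...
    const int n2 = n / 2;    // ...stored as 512K half2 pairs

    __half2 *x, *y;
    // FP16 arrays use half the bytes of the equivalent FP32 arrays.
    cudaMallocManaged(&x, n2 * sizeof(__half2));
    cudaMallocManaged(&y, n2 * sizeof(__half2));

    for (int i = 0; i < n2; ++i) {
        x[i] = __floats2half2_rn(1.0f, 1.0f);  // round-to-nearest conversion
        y[i] = __floats2half2_rn(2.0f, 2.0f);
    }
    __half2 a = __floats2half2_rn(3.0f, 3.0f);

    haxpy<<<(n2 + 255) / 256, 256>>>(n2, a, x, y);
    cudaDeviceSynchronize();

    // Expect 3*1 + 2 = 5 in every lane.
    printf("y[0] = %f (expect 5.0)\n", __half2float(__low2half(y[0])));

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compiled with `nvcc -arch=sm_60`, the inner `__hfma2` maps to Pascal's packed FP16 path, so the arithmetic throughput per instruction doubles relative to scalar FP32, on top of the bandwidth savings from the smaller arrays.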