When Gradients Go Nuclear
💥 The Opposite Extreme
While vanishing gradients make learning too slow, exploding gradients make training unstable and can overflow to inf or NaN. The culprit is the chain rule: backpropagation multiplies the gradient by every layer's Jacobian, so when those factors are consistently larger than 1, the gradient grows exponentially with depth!
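This compounding is easy to reproduce in isolation: repeatedly multiply a gradient vector by random weight matrices whose average gain exceeds 1 and its norm blows up within a few dozen layers. A minimal NumPy sketch (depth, width, and gain are illustrative values, not taken from the text above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each backward step multiplies the gradient by a layer's weight matrix.
# With an average per-layer gain above 1, the norm compounds exponentially.
depth, width, gain = 30, 64, 1.5   # illustrative values

grad = rng.standard_normal(width)
grad /= np.linalg.norm(grad)       # start from a unit-norm gradient

for layer in range(1, depth + 1):
    # Random weights scaled so that, on average, ||W.T @ v|| ≈ gain * ||v||
    W = gain * rng.standard_normal((width, width)) / np.sqrt(width)
    grad = W.T @ grad
    if layer % 5 == 0:
        print(f"layer {layer:2d}: ||grad|| = {np.linalg.norm(grad):.3e}")
```

With a gain of 1.5, thirty layers are enough to scale the gradient by roughly 1.5^30 ≈ 2 × 10^5.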
Common Causes:
- Poor weight initialization: overly large weights amplify the gradient at every layer (see the sketch after this list)
- Deep networks with many layers: each extra layer adds another multiplication for the growth to compound
- Recurrent neural networks (RNNs): the same weight matrix is applied at every time step, so long sequences behave like very deep networks
- High learning rates: oversized updates can push weights into regions where gradients are even steeper
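Here is the first cause in action: a sketch of a deep ReLU network in PyTorch whose weights are drawn with a deliberately oversized standard deviation (depth, width, and std=0.5 are illustrative assumptions), printing how the gradient norm grows as backprop walks from the last layer back to the first:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sizes and init scale are illustrative: std=0.5 on a 64-wide layer gives
# an average per-layer gain well above 1, so the backward signal compounds.
depth, width = 12, 64
layers = [nn.Linear(width, width) for _ in range(depth)]
for lin in layers:
    nn.init.normal_(lin.weight, std=0.5)
    nn.init.zeros_(lin.bias)

h = torch.randn(1, width)
acts = []
for lin in layers:
    h = torch.relu(lin(h))
    h.retain_grad()              # keep gradients on intermediate activations
    acts.append(h)

h.sum().backward()

# Gradient norms grow as backprop moves from the last layer toward the first
for i, a in enumerate(acts):
    print(f"layer {i:2d}: ||dL/dh|| = {a.grad.norm():.3e}")
```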
Typical gradient magnitudes:
- Normal training: ∇W ≈ 0.01
- Exploding gradients: ∇W > 1000
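In practice the difference is caught by logging the global gradient norm after each backward pass: a healthy run stays in a small, stable range, while an exploding run shoots past any fixed threshold. A minimal sketch, assuming a hypothetical global_grad_norm helper and an illustrative threshold of 1000:

```python
import torch
import torch.nn as nn

def global_grad_norm(model: nn.Module) -> float:
    # Total L2 norm across all parameter gradients; this is the number
    # worth logging every step. (Hypothetical helper, not a library API.)
    sq = sum(p.grad.norm() ** 2 for p in model.parameters() if p.grad is not None)
    return float(sq) ** 0.5

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 1))
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()

norm = global_grad_norm(model)
print(f"||grad|| = {norm:.4f}")
if norm > 1000:   # threshold in the spirit of the comparison above
    print("warning: exploding gradients detected")
```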
[Gradient Explosion Visualization: interactive demo showing how gradient values explode through the layers]