When Gradients Go Nuclear

💥 The Opposite Extreme
While vanishing gradients make learning too slow, exploding gradients make training unstable and can cause numerical overflow. Gradient values grow exponentially during backpropagation!
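
Why exponential? By the chain rule, the gradient at an early layer is a product of one Jacobian per later layer, so its size scales like a per-layer factor raised to the depth. A sketch of the standard bound (the notation is assumed here, not from the original: h_l is the layer-l activation, J_k the Jacobian of layer k):

```latex
% Backprop through layers l+1..L multiplies the gradient by one Jacobian per layer:
\[
\frac{\partial \mathcal{L}}{\partial h_l}
  = J_{l+1}^{\top} J_{l+2}^{\top} \cdots J_{L}^{\top}
    \, \frac{\partial \mathcal{L}}{\partial h_L},
\qquad
\left\lVert \frac{\partial \mathcal{L}}{\partial h_l} \right\rVert
  \le \lVert J_{l+1} \rVert \cdots \lVert J_{L} \rVert
      \left\lVert \frac{\partial \mathcal{L}}{\partial h_L} \right\rVert .
\]
% If every layer contributes a factor of roughly c, the bound behaves like
% c^(L - l): c > 1 means explosion, c < 1 means vanishing.
```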

Common Causes:

  • Poor weight initialization (weights that are too large)
  • Very deep networks (many layers, so many Jacobian factors in the gradient product)
  • Recurrent neural networks (RNNs), where the same weight matrix is applied at every time step
  • High learning rates, which amplify each already-large update

Normal training:     ∇W ≈ 0.01
Exploding gradients: ∇W > 1000!

Gradient Explosion Visualization

[Interactive figure: watch how gradient values can explode through the layers!]
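
The same effect is easy to reproduce numerically. Below is a minimal NumPy sketch (not the page's actual demo; the depth, width, and weight scales are illustrative assumptions) that backpropagates a unit-norm gradient through a deep ReLU stack, once with a well-scaled init and once with an oversized one:

```python
import numpy as np

def backprop_grad_norm(weight_scale, depth=50, width=64, seed=0):
    """Toy deep ReLU stack: run a forward pass, then backpropagate a
    unit-norm gradient and return its norm at the input layer."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(0.0, weight_scale, (width, width)) for _ in range(depth)]

    # Forward pass, caching pre-activations for the backward pass.
    h, pre_acts = rng.normal(0.0, 1.0, width), []
    for W in Ws:
        z = W @ h
        pre_acts.append(z)
        h = np.maximum(z, 0.0)  # ReLU

    # Backward pass: each layer applies the ReLU mask, then multiplies by W^T.
    g = np.ones(width) / np.sqrt(width)  # unit-norm upstream gradient
    for W, z in zip(reversed(Ws), reversed(pre_acts)):
        g = W.T @ ((z > 0.0) * g)
    return np.linalg.norm(g)

# He-style scale for width 64 vs. an oversized scale (both illustrative).
for label, scale in [("normal", np.sqrt(2.0 / 64)), ("too large", 0.5)]:
    print(f"{label:>9} init: gradient norm at input = {backprop_grad_norm(scale):.3e}")
```

With the oversized init, each backward step multiplies the gradient norm by a factor greater than 1, so fifty layers are enough to push it far past the ∇W > 1000 regime shown above.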