When Gradients Go Nuclear

💥 The Opposite Extreme
While vanishing gradients make learning too slow, exploding gradients make training unstable and can cause numerical overflow. Gradient values grow exponentially during backpropagation!
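
Why exponential? By the chain rule, the gradient at an early layer is a product of one Jacobian per later layer, so its size scales like a per-layer factor raised to the depth. A sketch of the standard bound (the notation is assumed here, not from the original: h_l is the layer-l activation, J_k the Jacobian of layer k):

```latex
% Backprop through layers l+1..L multiplies the gradient by one Jacobian per layer:
\[
\frac{\partial \mathcal{L}}{\partial h_l}
  = J_{l+1}^{\top} J_{l+2}^{\top} \cdots J_{L}^{\top}
    \, \frac{\partial \mathcal{L}}{\partial h_L},
\qquad
\left\lVert \frac{\partial \mathcal{L}}{\partial h_l} \right\rVert
  \le \lVert J_{l+1} \rVert \cdots \lVert J_{L} \rVert
      \left\lVert \frac{\partial \mathcal{L}}{\partial h_L} \right\rVert .
\]
% If every layer contributes a factor of roughly c, the bound behaves like
% c^(L - l): c > 1 means explosion, c < 1 means vanishing.
```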

Common Causes:

  • Poor weight initialization (weights that are too large)
  • Very deep networks (many layers, so many Jacobian factors in the gradient product)
  • Recurrent neural networks (RNNs), where the same weight matrix is applied at every time step
  • High learning rates, which amplify each already-large update

Normal training:     ∇W ≈ 0.01
Exploding gradients: ∇W > 1000!

Gradient Explosion Visualization

[Interactive figure: watch how gradient values can explode through the layers!]
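
The same effect is easy to reproduce numerically. Below is a minimal NumPy sketch (not the page's actual demo; the depth, width, and weight scales are illustrative assumptions) that backpropagates a unit-norm gradient through a deep ReLU stack, once with a well-scaled init and once with an oversized one:

```python
import numpy as np

def backprop_grad_norm(weight_scale, depth=50, width=64, seed=0):
    """Toy deep ReLU stack: run a forward pass, then backpropagate a
    unit-norm gradient and return its norm at the input layer."""
    rng = np.random.default_rng(seed)
    Ws = [rng.normal(0.0, weight_scale, (width, width)) for _ in range(depth)]

    # Forward pass, caching pre-activations for the backward pass.
    h, pre_acts = rng.normal(0.0, 1.0, width), []
    for W in Ws:
        z = W @ h
        pre_acts.append(z)
        h = np.maximum(z, 0.0)  # ReLU

    # Backward pass: each layer applies the ReLU mask, then multiplies by W^T.
    g = np.ones(width) / np.sqrt(width)  # unit-norm upstream gradient
    for W, z in zip(reversed(Ws), reversed(pre_acts)):
        g = W.T @ ((z > 0.0) * g)
    return np.linalg.norm(g)

# He-style scale for width 64 vs. an oversized scale (both illustrative).
for label, scale in [("normal", np.sqrt(2.0 / 64)), ("too large", 0.5)]:
    print(f"{label:>9} init: gradient norm at input = {backprop_grad_norm(scale):.3e}")
```

With the oversized init, each backward step multiplies the gradient norm by a factor greater than 1, so fifty layers are enough to push it far past the ∇W > 1000 regime shown above.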