CS5720 - Week 7
Slide 121 of 140

LSTM: Solving the Vanishing Gradient Problem

The Problem with Standard RNNs

🔥 Vanishing Gradients
Gradients become exponentially smaller as they propagate back through time, making it extremely difficult to learn long-term dependencies (illustrated in the sketch after this list).
🧠 Short Memory Span
Standard RNNs struggle to remember information from more than a few time steps ago, limiting their practical applications.
⚡ Training Instability
Either gradients vanish (no learning) or explode (unstable training), making standard RNNs difficult to train effectively.
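To see why, consider backpropagating through a tanh recurrence: each step multiplies the gradient by the recurrent weight times the tanh derivative, a factor typically smaller than 1. A minimal scalar sketch (illustrative values only, not the lecture's code):

```python
import numpy as np

np.random.seed(0)
w = 0.9          # stand-in for the recurrent weight (|w| < 1)
grad = 1.0       # gradient arriving at the last time step
for t in range(1, 51):
    h = np.tanh(w * np.random.randn())   # a hidden activation at step t
    grad *= w * (1 - h ** 2)             # chain rule: one backprop step through tanh
    if t % 10 == 0:
        print(f"after {t:2d} steps: gradient ~ {grad:.2e}")
```

After 50 steps the surviving gradient is vanishingly small, which is exactly the failure mode described above.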

LSTM: The Solution

🛤️ Cell State Highway
A separate cell state acts as an information highway, letting gradients flow across many time steps with little degradation as long as the forget gate stays open.
🚪 Smart Gates
Three specialized gates (forget, input, output) control what information to keep, update, and output at each time step (see the code sketch after this list).
🌊 Stable Gradient Flow
LSTMs maintain stable gradients across hundreds of time steps, enabling learning of complex long-term patterns.
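The gating mechanism is easiest to see in code. Below is a minimal NumPy sketch of a single LSTM step; the variable names, shapes, and stacked weight layout are illustrative assumptions, not the exact formulation used in the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W stacks the four gate weight blocks: shape (4H, D + H)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * H:1 * H])   # forget gate: how much of the old cell state to keep
    i = sigmoid(z[1 * H:2 * H])   # input gate: how much new information to write
    g = np.tanh(z[2 * H:3 * H])   # candidate values to write into the cell
    o = sigmoid(z[3 * H:4 * H])   # output gate: how much of the cell to expose
    c = f * c_prev + i * g        # cell-state "highway": a mostly additive update
    h = o * np.tanh(c)            # hidden state passed to the next step
    return h, c

# Tiny usage example with random weights (D = input size, H = hidden size).
D, H = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):     # five time steps
    h, c = lstm_step(x, h, c, W, b)
```

Because the cell update is f * c_prev + i * g rather than a repeated matrix-multiply-and-squash, the gradient along the cell state is scaled only by the forget gate, which the network can learn to keep close to 1.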

Gradient Flow Comparison

Standard RNN: gradients fade toward zero as they flow back through time.
LSTM: gradients remain stable across many time steps.
Key Insight: LSTMs solve the vanishing gradient problem through an architectural change (the gated, additive cell-state path), not just mathematical tricks.
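A quick way to observe the difference empirically is to measure how much gradient from the last time step reaches the first input. The sketch below uses PyTorch's built-in nn.RNN and nn.LSTM; the positive forget-gate bias initialization is a common trick assumed here, not something prescribed by the slide:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, D, H = 100, 8, 16                         # sequence length, input size, hidden size

def grad_at_first_step(cell):
    x = torch.randn(T, 1, D, requires_grad=True)   # (time, batch=1, features)
    out, _ = cell(x)
    out[-1].sum().backward()                 # loss depends only on the LAST time step
    return x.grad[0].norm().item()           # gradient reaching the FIRST time step

rnn, lstm = nn.RNN(D, H), nn.LSTM(D, H)

# Bias the LSTM forget gate positive so it starts nearly open;
# PyTorch packs gate biases in the order [input, forget, cell, output].
with torch.no_grad():
    for name, p in lstm.named_parameters():
        if "bias" in name:
            p[H:2 * H].fill_(2.0)

print("RNN  gradient at t=0:", grad_at_first_step(rnn))
print("LSTM gradient at t=0:", grad_at_first_step(lstm))
```

Typically the RNN gradient at t=0 is effectively zero while the LSTM gradient remains measurable, matching the comparison above (exact numbers depend on the random seed and initialization).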
Prepared by Dr. Gorkem Kar