CS5720 - Week 7
Slide 121 of 140

LSTM: Solving the Vanishing Gradient Problem

The Problem with Standard RNNs

🔥 Vanishing Gradients
Gradients become exponentially smaller as they propagate back through time, making it extremely difficult to learn long-term dependencies (illustrated in the sketch after this list).
🧠 Short Memory Span
Standard RNNs struggle to remember information from more than a few time steps ago, limiting their practical applications.
⚡ Training Instability
Either gradients vanish (no learning) or explode (unstable training), making standard RNNs difficult to train effectively.
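To see why, consider backpropagating through a tanh recurrence: each step multiplies the gradient by the recurrent weight times the tanh derivative, a factor typically smaller than 1. A minimal scalar sketch (illustrative values only, not the lecture's code):

```python
import numpy as np

np.random.seed(0)
w = 0.9          # stand-in for the recurrent weight (|w| < 1)
grad = 1.0       # gradient arriving at the last time step
for t in range(1, 51):
    h = np.tanh(w * np.random.randn())   # a hidden activation at step t
    grad *= w * (1 - h ** 2)             # chain rule: one backprop step through tanh
    if t % 10 == 0:
        print(f"after {t:2d} steps: gradient ~ {grad:.2e}")
```

After 50 steps the surviving gradient is vanishingly small, which is exactly the failure mode described above.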

LSTM: The Solution

🛤️ Cell State Highway
A separate cell state acts as an information highway, letting gradients flow across many time steps with little degradation as long as the forget gate stays open.
🚪 Smart Gates
Three specialized gates (forget, input, output) control what information to keep, update, and output at each time step (see the code sketch after this list).
🌊 Stable Gradient Flow
LSTMs maintain stable gradients across hundreds of time steps, enabling learning of complex long-term patterns.
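The gating mechanism is easiest to see in code. Below is a minimal NumPy sketch of a single LSTM step; the variable names, shapes, and stacked weight layout are illustrative assumptions, not the exact formulation used in the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W stacks the four gate weight blocks: shape (4H, D + H)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * H:1 * H])   # forget gate: how much of the old cell state to keep
    i = sigmoid(z[1 * H:2 * H])   # input gate: how much new information to write
    g = np.tanh(z[2 * H:3 * H])   # candidate values to write into the cell
    o = sigmoid(z[3 * H:4 * H])   # output gate: how much of the cell to expose
    c = f * c_prev + i * g        # cell-state "highway": a mostly additive update
    h = o * np.tanh(c)            # hidden state passed to the next step
    return h, c

# Tiny usage example with random weights (D = input size, H = hidden size).
D, H = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):     # five time steps
    h, c = lstm_step(x, h, c, W, b)
```

Because the cell update is f * c_prev + i * g rather than a repeated matrix-multiply-and-squash, the gradient along the cell state is scaled only by the forget gate, which the network can learn to keep close to 1.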

Gradient Flow Comparison

Standard RNN: gradients fade toward zero as they flow back through time.
LSTM: gradients remain stable across many time steps.
Key Insight: LSTMs solve the vanishing gradient problem through an architectural change (the gated, additive cell-state path), not just mathematical tricks.
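A quick way to observe the difference empirically is to measure how much gradient from the last time step reaches the first input. The sketch below uses PyTorch's built-in nn.RNN and nn.LSTM; the positive forget-gate bias initialization is a common trick assumed here, not something prescribed by the slide:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, D, H = 100, 8, 16                         # sequence length, input size, hidden size

def grad_at_first_step(cell):
    x = torch.randn(T, 1, D, requires_grad=True)   # (time, batch=1, features)
    out, _ = cell(x)
    out[-1].sum().backward()                 # loss depends only on the LAST time step
    return x.grad[0].norm().item()           # gradient reaching the FIRST time step

rnn, lstm = nn.RNN(D, H), nn.LSTM(D, H)

# Bias the LSTM forget gate positive so it starts nearly open;
# PyTorch packs gate biases in the order [input, forget, cell, output].
with torch.no_grad():
    for name, p in lstm.named_parameters():
        if "bias" in name:
            p[H:2 * H].fill_(2.0)

print("RNN  gradient at t=0:", grad_at_first_step(rnn))
print("LSTM gradient at t=0:", grad_at_first_step(lstm))
```

Typically the RNN gradient at t=0 is effectively zero while the LSTM gradient remains measurable, matching the comparison above (exact numbers depend on the random seed and initialization).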
Prepared by Dr. Gorkem Kar