CS5720 - Stacked/Deep RNNs

What are Stacked RNNs?

Stacked RNNs (also called Deep RNNs) consist of multiple RNN layers stacked vertically, where the output of one layer becomes the input to the next layer.

Key Characteristics:

• Multiple hidden layers at each time step
• Each layer processes the output of the previous layer
• Creates hierarchical representations
• Increased modeling capacity
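
The characteristics above can be sketched as a forward pass. This is a minimal illustration, not a reference implementation: it assumes vanilla (Elman) RNN cells with a tanh activation, and the names make_layer, rnn_step, and stacked_rnn_forward, the layer sizes, and the random weights are all hypothetical choices made for the example.

```python
import math
import random

def make_layer(input_size, hidden_size, rng):
    """Create small random weights for one vanilla RNN layer (illustrative only)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W_x": mat(hidden_size, input_size),
            "W_h": mat(hidden_size, hidden_size),
            "b": [0.0] * hidden_size}

def rnn_step(layer, x, h_prev):
    """One time step of one layer: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h = []
    for i in range(len(layer["b"])):
        total = layer["b"][i]
        total += sum(w * v for w, v in zip(layer["W_x"][i], x))
        total += sum(w * v for w, v in zip(layer["W_h"][i], h_prev))
        h.append(math.tanh(total))
    return h

def stacked_rnn_forward(layers, sequence):
    """Run a sequence through all layers; each layer reads the layer below."""
    # One hidden state per layer, initialised to zeros.
    hidden = [[0.0] * len(layer["b"]) for layer in layers]
    top_outputs = []
    for x_t in sequence:
        layer_input = x_t
        for k, layer in enumerate(layers):
            # Layer k combines the output of layer k-1 at this time step
            # with its own hidden state from the previous time step.
            hidden[k] = rnn_step(layer, layer_input, hidden[k])
            layer_input = hidden[k]
        top_outputs.append(layer_input)  # output of the top layer at time t
    return top_outputs, hidden

rng = random.Random(0)
input_size, hidden_size, num_layers = 4, 8, 3  # assumed sizes for the demo
layers = [make_layer(input_size if k == 0 else hidden_size, hidden_size, rng)
          for k in range(num_layers)]
sequence = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(5)]
top_outputs, final_hidden = stacked_rnn_forward(layers, sequence)
```

Here top_outputs holds one vector per time step from the top layer, and final_hidden holds the last hidden state of every layer, so there are multiple hidden states at each time step, one per layer.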

🏗️ Architecture Insight:

Think of it like a multi-story building where each floor processes information from the floor below, adding more complexity and abstraction.

Why Stack RNNs?

• Enhanced Representation Power: multiple layers can learn hierarchical features, from simple to complex patterns
• Better Nonlinear Modeling: additional layers increase the network's ability to model complex relationships
• Automatic Feature Extraction: lower layers extract basic features, and higher layers combine them into complex patterns

⚖️ Trade-off:

More layers add modeling capacity, but they also add parameters, lengthen training, and raise the risk of overfitting.
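
To make the parameter cost concrete, here is a rough count for a stack of vanilla RNN layers. It is a sketch under stated assumptions: each layer has an input weight matrix, a recurrent weight matrix, and a single bias vector, and every layer uses the same hidden size. Some libraries keep two bias vectors per layer, which would change the totals. The function names and the sizes (input 10, hidden 20) are hypothetical.

```python
def rnn_layer_params(input_size, hidden_size):
    """Parameters of one vanilla RNN layer: W_x, W_h, and one bias vector (assumed)."""
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size

def stacked_rnn_params(input_size, hidden_size, num_layers):
    """Layer 1 reads the input; every higher layer reads the layer below."""
    total = 0
    for k in range(num_layers):
        layer_input = input_size if k == 0 else hidden_size
        total += rnn_layer_params(layer_input, hidden_size)
    return total

single = stacked_rnn_params(input_size=10, hidden_size=20, num_layers=1)   # 620
stacked = stacked_rnn_params(input_size=10, hidden_size=20, num_layers=3)  # 2260
```

Under these assumptions, going from one layer to three raises the count from 620 to 2260 parameters, and each further layer adds another 820.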

Single vs Stacked RNN Architecture

Single RNN (simple but limited capacity):

Input Layer → RNN Layer → Output Layer

Stacked RNN (complex but powerful modeling):

Input Layer → RNN Layer 1 → RNN Layer 2 → RNN Layer 3 → Output Layer

Key Insight: Each additional layer adds computational depth and representational complexity to the model.
