CS5720 - Week 7

Stacked/Deep RNNs

What are Stacked RNNs?

Stacked RNNs (also called Deep RNNs) consist of multiple RNN layers stacked vertically: at each time step, the hidden state produced by one layer becomes the input to the layer above it.
Key Characteristics:

• Multiple hidden layers at each time step
• Each layer processes the output of the previous layer
• Creates hierarchical representations
• Increased modeling capacity
🏗️ Architecture Insight:
Think of it like a multi-story building where each floor processes information from the floor below, adding more complexity and abstraction.
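
To make the layer-to-layer flow concrete, here is a minimal sketch of a stacked RNN forward pass in plain Python. It assumes a simple Elman-style cell, h = tanh(W_x x + W_h h_prev + b), which the slide does not specify; the helper names (make_layer, rnn_step, stacked_rnn_forward) and the sizes used are hypothetical and chosen only for illustration.

```python
import math
import random

def make_layer(input_size, hidden_size, rng):
    """Create small random weights for one Elman RNN layer (hypothetical helper)."""
    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_x": rand_matrix(hidden_size, input_size),   # input-to-hidden weights
        "W_h": rand_matrix(hidden_size, hidden_size),  # hidden-to-hidden weights
        "b": [0.0] * hidden_size,                      # bias
    }

def rnn_step(layer, x, h_prev):
    """One time step of one layer: h = tanh(W_x x + W_h h_prev + b)."""
    h = []
    for i, bias in enumerate(layer["b"]):
        total = bias
        total += sum(w * v for w, v in zip(layer["W_x"][i], x))
        total += sum(w * v for w, v in zip(layer["W_h"][i], h_prev))
        h.append(math.tanh(total))
    return h

def stacked_rnn_forward(layers, inputs):
    """Run a sequence through all layers; return the top layer's state at each step."""
    # Each layer keeps its own hidden state, initialised to zeros.
    states = [[0.0] * len(layer["b"]) for layer in layers]
    outputs = []
    for x in inputs:                          # loop over time steps
        layer_input = x
        for idx, layer in enumerate(layers):  # loop from bottom layer to top layer
            states[idx] = rnn_step(layer, layer_input, states[idx])
            layer_input = states[idx]         # this layer's output feeds the next layer
        outputs.append(layer_input)
    return outputs

# Assumed sizes for illustration: 4 input features, 3 hidden units, 3 layers, 5 time steps.
rng = random.Random(0)
input_size, hidden_size, num_layers = 4, 3, 3
layers = [make_layer(input_size if i == 0 else hidden_size, hidden_size, rng)
          for i in range(num_layers)]
sequence = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(5)]
top_states = stacked_rnn_forward(layers, sequence)
```

Only the first layer sees the raw input; every layer above it receives a hidden-state vector of size hidden_size, which is why its input weight matrix has a different shape.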

Why Stack RNNs?

• 🧠 Enhanced Representation Power: multiple layers can learn hierarchical features, from simple to complex patterns
• 📈 Better Nonlinear Modeling: additional layers increase the network's ability to model complex relationships
• 🔍 Automatic Feature Extraction: lower layers extract basic features, higher layers combine them into complex patterns
⚖️ Trade-off:
More layers give the model more capacity, but they also add parameters, lengthen training, and raise the risk of overfitting.
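
The parameter side of this trade-off can be estimated with a short calculation. The sketch below assumes a simple Elman-style layer with one input weight matrix, one recurrent weight matrix, and a single bias vector; gated cells such as LSTM or GRU, and libraries that keep two bias vectors per layer, will give larger counts. The function name count_rnn_params and the sizes are hypothetical.

```python
def count_rnn_params(input_size, hidden_size, num_layers):
    """Count trainable parameters in a stack of simple (Elman) RNN layers."""
    total = 0
    for layer in range(num_layers):
        # Only the first layer reads the raw input; the rest read hidden states.
        in_size = input_size if layer == 0 else hidden_size
        total += in_size * hidden_size       # input-to-hidden weights
        total += hidden_size * hidden_size   # hidden-to-hidden weights
        total += hidden_size                 # bias
    return total

# Assumed sizes for illustration: 10 input features, 20 hidden units per layer.
for n in (1, 2, 3):
    print(n, count_rnn_params(10, 20, n))
# prints:
# 1 620
# 2 1440
# 3 2260
```

Under these assumptions, each layer above the first adds a fixed 820 parameters, so the count grows linearly with depth when the hidden size is held constant.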

Single vs Stacked RNN Architecture

Single RNN (simple but limited capacity)
Output Layer
↑
RNN Layer
↑
Input Layer

Stacked RNN (complex but powerful modeling)
Output Layer
↑
RNN Layer 3
↑
RNN Layer 2
↑
RNN Layer 1
↑
Input Layer

Key Insight: Each additional layer adds computational depth and representational complexity to the model.
Prepared by Dr. Gorkem Kar