CS5720 - Week 6
Slide 108 of 120

Backpropagation Through Time (BPTT)

What is BPTT?

BPTT is the algorithm used to train RNNs by propagating error gradients backward through time steps to update the network weights.
How it Works:

1. Unroll the RNN through all time steps
2. Calculate the error at each output
3. Propagate gradients backward from step T to step 1
4. Accumulate gradients across time steps
5. Update the weights using the total gradients
🔄 Key Insight:
BPTT treats the unrolled RNN as a very deep feedforward network, but with shared weights across time steps.
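The five steps above can be sketched in a few lines of NumPy. This is a minimal toy example, not a production implementation: the network sizes, initialization, and final-step-only loss are illustrative assumptions.

```python
import numpy as np

# Minimal BPTT sketch for a vanilla RNN (sizes and data are made up):
# h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t), squared error on the final state.
rng = np.random.default_rng(0)
H, X, T = 4, 3, 5                        # hidden size, input size, steps
W_hh = rng.normal(0, 0.5, (H, H))        # shared across all time steps
W_xh = rng.normal(0, 0.5, (H, X))
xs = rng.normal(size=(T, X))
target = rng.normal(size=H)

# Steps 1-2: unroll forward through all time steps, storing every state.
hs = [np.zeros(H)]
for t in range(T):
    hs.append(np.tanh(W_hh @ hs[-1] + W_xh @ xs[t]))
loss = 0.5 * np.sum((hs[-1] - target) ** 2)

# Steps 3-4: propagate dL/dh backward from T to 1, accumulating the
# gradient of the *shared* weights at every step.
dW_hh, dW_xh = np.zeros_like(W_hh), np.zeros_like(W_xh)
dh = hs[-1] - target                     # dL/dh_T
for t in reversed(range(T)):
    dz = dh * (1 - hs[t + 1] ** 2)       # back through tanh
    dW_hh += np.outer(dz, hs[t])         # accumulate across time
    dW_xh += np.outer(dz, xs[t])
    dh = W_hh.T @ dz                     # dL/dh_{t-1}

# Step 5: a gradient-descent update using the total gradients.
lr = 0.01
W_hh_new = W_hh - lr * dW_hh
W_xh_new = W_xh - lr * dW_xh
```

Because the weights are shared, a single matrix `W_hh` collects gradient contributions from every time step, which is exactly the "accumulate" step.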

BPTT Challenges

  • 📉 Vanishing Gradients: gradients become exponentially small in long sequences
  • 💥 Exploding Gradients: gradients grow exponentially large
  • ⏱️ Computational Cost: memory and time complexity grow with sequence length
  • 🧠 Long-term Dependencies: difficulty learning connections across many time steps
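The vanishing and exploding cases can be seen directly: the gradient reaching step 0 involves a product of T Jacobians, so its norm scales like (largest singular value)^T. The sketch below uses a linear recurrence for clarity; the matrices and the clipping threshold are illustrative choices, and norm clipping is one common mitigation for the exploding case.

```python
import numpy as np

# Toy illustration: repeatedly applying the backward Jacobian W^T
# shrinks or blows up the gradient depending on W's spectral radius.
def gradient_norm_after(W, T):
    g = np.ones(W.shape[0])            # stand-in for dL/dh_T
    for _ in range(T):
        g = W.T @ g                    # one step of the backward pass
    return np.linalg.norm(g)

small = 0.5 * np.eye(4)                # spectral radius 0.5 -> vanishing
large = 1.5 * np.eye(4)                # spectral radius 1.5 -> exploding
print(gradient_norm_after(small, 50))  # effectively zero
print(gradient_norm_after(large, 50))  # astronomically large

# A common fix for exploding gradients: rescale when the norm is too big.
def clip_by_norm(g, max_norm=5.0):
    n = np.linalg.norm(g)
    return g if n <= max_norm else g * (max_norm / n)
```

Clipping bounds the exploding case but does nothing for vanishing gradients, which is one motivation for gated architectures such as LSTMs.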

BPTT Process Visualization

[Diagram: four panels]
1. Forward Pass Through Time: h₀ → h₁ → h₂ → h₃ → y₃
2. Calculate Error at Output: Target − y₃ = Error → Loss
3. Backward Pass (Gradient Flow): ∂L/∂h₃ → ∂L/∂h₂ → ∂L/∂h₁ → ∂L/∂h₀
4. Weight Updates (Accumulated Gradients): ∂L/∂W = Σₜ ∂L/∂Wₜ → W_new
Key Formula: ∂L/∂h_t = ∂L/∂h_{t+1} × ∂h_{t+1}/∂h_t + ∂L/∂y_t × ∂y_t/∂h_t
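The key formula can be verified numerically on a scalar RNN. All weights and data below are made-up toy values; the setup assumes h_t = tanh(w·h_{t−1} + x_t), y_t = v·h_t, and a per-step squared-error loss.

```python
import numpy as np

# Scalar RNN with a loss at every step:
# h_t = tanh(w*h_{t-1} + x_t),  y_t = v*h_t,  L = sum_t 0.5*(y_t - d_t)^2
w, v = 0.8, 1.3
xs = [0.5, -0.2, 0.9]                  # inputs  x_1..x_3 (toy values)
ds = [0.1, 0.4, -0.3]                  # targets d_1..d_3 (toy values)

def forward(h0=0.0):
    hs, L = [h0], 0.0
    for x, d in zip(xs, ds):
        hs.append(np.tanh(w * hs[-1] + x))
        L += 0.5 * (v * hs[-1] - d) ** 2
    return hs, L

hs, L = forward()

# Backward recursion, exactly the slide's key formula:
# dL/dh_t = dL/dh_{t+1} * dh_{t+1}/dh_t + dL/dy_t * dy_t/dh_t
T = len(xs)
dh, grads = 0.0, {}
for t in range(T, 0, -1):
    grad = (v * hs[t] - ds[t - 1]) * v         # dL/dy_t * dy_t/dh_t
    if t < T:
        grad += dh * w * (1 - hs[t + 1] ** 2)  # dL/dh_{t+1} * dh_{t+1}/dh_t
    grads[t] = grad
    dh = grad
```

Each ∂L/∂h_t combines the gradient flowing back from the next hidden state with the gradient from that step's own output, which is the two-term structure of the formula above.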
Prepared by Dr. Gorkem Kar