Backpropagation Through Time (BPTT) is the algorithm used to train recurrent neural networks (RNNs): it propagates error gradients backward through the unrolled time steps and uses them to update the network weights.
How it Works:
• Unroll the RNN through all time steps
• Calculate error at each output
• Propagate gradients backward from T to 1
• Accumulate gradients across time steps
• Update the weights using the total accumulated gradients, as in the sketch below
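The following is a minimal numpy sketch of those five steps for a vanilla (tanh) RNN with a linear output and squared-error loss. The names (Wx, Wh, Wy), shapes, and toy data are illustrative assumptions, not a reference implementation.

```python
import numpy as np

# Toy dimensions and data (illustrative assumptions)
T, n_in, n_h = 5, 3, 4
rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.1, size=(n_h, n_in))  # input -> hidden (shared)
Wh = rng.normal(scale=0.1, size=(n_h, n_h))   # hidden -> hidden (shared)
Wy = rng.normal(scale=0.1, size=(1, n_h))     # hidden -> output (shared)
xs = rng.normal(size=(T, n_in))               # toy input sequence
ys = rng.normal(size=(T, 1))                  # toy targets

# 1. Unroll the RNN through all time steps (forward pass)
hs = [np.zeros(n_h)]                          # h_0
errs = []
for t in range(T):
    hs.append(np.tanh(Wx @ xs[t] + Wh @ hs[-1]))
    # 2. Calculate the error at each output (squared-error derivative)
    errs.append(Wy @ hs[-1] - ys[t])

# 3-4. Propagate gradients backward from T to 1, accumulating as we go
dWx, dWh, dWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
dh_next = np.zeros(n_h)                       # gradient arriving from step t+1
for t in reversed(range(T)):
    dWy += np.outer(errs[t], hs[t + 1])
    dh = Wy.T @ errs[t] + dh_next             # local error + error from the future
    dz = (1.0 - hs[t + 1] ** 2) * dh          # back through tanh
    dWx += np.outer(dz, xs[t])                # shared weights: gradients sum
    dWh += np.outer(dz, hs[t])
    dh_next = Wh.T @ dz                       # hand the gradient to step t-1

# 5. Update the weights using the total accumulated gradients
lr = 0.01
Wx -= lr * dWx; Wh -= lr * dWh; Wy -= lr * dWy
```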
🔄 Key Insight:
BPTT treats the unrolled RNN as a very deep feedforward network, but with shared weights across time steps.
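Because the same weights appear in every unrolled copy, the total gradient is just the sum of the per-step contributions. In a common notation (h_t the hidden state, L_t the loss at step t, W a shared weight matrix), this can be written as:

```latex
\frac{\partial L}{\partial W}
  = \sum_{t=1}^{T} \frac{\partial L_t}{\partial W}
  = \sum_{t=1}^{T} \sum_{k=1}^{t}
      \frac{\partial L_t}{\partial h_t}
      \Biggl( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \Biggr)
      \frac{\partial h_k}{\partial W}
```

The product of hidden-to-hidden Jacobians in the inner sum is also what drives the challenges below.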
BPTT Challenges
• 📉 Vanishing Gradients: gradients become exponentially small in long sequences
• 💥 Exploding Gradients: gradients grow exponentially large (both effects are demonstrated in the sketch after this list)
• ⏱️ Computational Cost: memory and time complexity grow with sequence length
• 🧠 Long-term Dependencies: difficulty learning connections across many time steps
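The first two challenges follow directly from the repeated Jacobian product in the gradient formula above: when its typical scale is below 1 the product shrinks exponentially with the gap between time steps, and when it is above 1 it blows up. A tiny illustrative sketch (the matrix size and scale factors are arbitrary assumptions):

```python
import numpy as np

# Repeatedly multiply a gradient by the (transposed) recurrent Jacobian,
# as BPTT's backward pass does once per time step. An orthogonal matrix
# scaled below/above 1 makes the effect easy to see.
rng = np.random.default_rng(0)
for scale, label in ((0.5, "vanishing"), (1.5, "exploding")):
    Wh = scale * np.linalg.qr(rng.normal(size=(8, 8)))[0]  # scaled orthogonal matrix
    grad = np.ones(8)
    for t in range(50):                    # 50 time steps backward
        grad = Wh.T @ grad
        if t in (9, 49):
            print(f"{label:9s} after {t + 1:2d} steps: "
                  f"|grad| = {np.linalg.norm(grad):.3e}")
```

The printed norms collapse toward zero in the first case and grow without bound in the second, which is exactly the behavior the first two challenges describe and a key reason long-term dependencies are hard to learn.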