Backpropagation Through Time (BPTT) is the algorithm used to train recurrent neural networks (RNNs): it propagates error gradients backward through the unrolled time steps and uses them to update the network weights.
How it Works:
• Unroll the RNN through all time steps
• Calculate error at each output
• Propagate gradients backward from T to 1
• Accumulate gradients across time steps
• Update the weights using the total accumulated gradients, as in the sketch below
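The following is a minimal numpy sketch of those five steps for a vanilla (tanh) RNN with a linear output and squared-error loss. The names (Wx, Wh, Wy), shapes, and toy data are illustrative assumptions, not a reference implementation.

```python
import numpy as np

# Toy dimensions and data (illustrative assumptions)
T, n_in, n_h = 5, 3, 4
rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.1, size=(n_h, n_in))  # input -> hidden (shared)
Wh = rng.normal(scale=0.1, size=(n_h, n_h))   # hidden -> hidden (shared)
Wy = rng.normal(scale=0.1, size=(1, n_h))     # hidden -> output (shared)
xs = rng.normal(size=(T, n_in))               # toy input sequence
ys = rng.normal(size=(T, 1))                  # toy targets

# 1. Unroll the RNN through all time steps (forward pass)
hs = [np.zeros(n_h)]                          # h_0
errs = []
for t in range(T):
    hs.append(np.tanh(Wx @ xs[t] + Wh @ hs[-1]))
    # 2. Calculate the error at each output (squared-error derivative)
    errs.append(Wy @ hs[-1] - ys[t])

# 3-4. Propagate gradients backward from T to 1, accumulating as we go
dWx, dWh, dWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
dh_next = np.zeros(n_h)                       # gradient arriving from step t+1
for t in reversed(range(T)):
    dWy += np.outer(errs[t], hs[t + 1])
    dh = Wy.T @ errs[t] + dh_next             # local error + error from the future
    dz = (1.0 - hs[t + 1] ** 2) * dh          # back through tanh
    dWx += np.outer(dz, xs[t])                # shared weights: gradients sum
    dWh += np.outer(dz, hs[t])
    dh_next = Wh.T @ dz                       # hand the gradient to step t-1

# 5. Update the weights using the total accumulated gradients
lr = 0.01
Wx -= lr * dWx; Wh -= lr * dWh; Wy -= lr * dWy
```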
🔄 Key Insight:
BPTT treats the unrolled RNN as a very deep feedforward network, but with shared weights across time steps.
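Because the same weights appear in every unrolled copy, the total gradient is just the sum of the per-step contributions. In a common notation (h_t the hidden state, L_t the loss at step t, W a shared weight matrix), this can be written as:

```latex
\frac{\partial L}{\partial W}
  = \sum_{t=1}^{T} \frac{\partial L_t}{\partial W}
  = \sum_{t=1}^{T} \sum_{k=1}^{t}
      \frac{\partial L_t}{\partial h_t}
      \Biggl( \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} \Biggr)
      \frac{\partial h_k}{\partial W}
```

The product of hidden-to-hidden Jacobians in the inner sum is also what drives the challenges below.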
BPTT Challenges
• 📉 Vanishing Gradients: gradients become exponentially small in long sequences
• 💥 Exploding Gradients: gradients grow exponentially large (both effects are demonstrated in the sketch after this list)
• ⏱️ Computational Cost: memory and time complexity grow with sequence length
• 🧠 Long-term Dependencies: difficulty learning connections across many time steps
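The first two challenges follow directly from the repeated Jacobian product in the gradient formula above: when its typical scale is below 1 the product shrinks exponentially with the gap between time steps, and when it is above 1 it blows up. A tiny illustrative sketch (the matrix size and scale factors are arbitrary assumptions):

```python
import numpy as np

# Repeatedly multiply a gradient by the (transposed) recurrent Jacobian,
# as BPTT's backward pass does once per time step. An orthogonal matrix
# scaled below/above 1 makes the effect easy to see.
rng = np.random.default_rng(0)
for scale, label in ((0.5, "vanishing"), (1.5, "exploding")):
    Wh = scale * np.linalg.qr(rng.normal(size=(8, 8)))[0]  # scaled orthogonal matrix
    grad = np.ones(8)
    for t in range(50):                    # 50 time steps backward
        grad = Wh.T @ grad
        if t in (9, 49):
            print(f"{label:9s} after {t + 1:2d} steps: "
                  f"|grad| = {np.linalg.norm(grad):.3e}")
```

The printed norms collapse toward zero in the first case and grow without bound in the second, which is exactly the behavior the first two challenges describe and a key reason long-term dependencies are hard to learn.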