The Core Idea
Backpropagation efficiently calculates how much each weight contributed to the error by propagating error signals backward through the network.
Why Backpropagation?
Without it, we'd need to:
• Nudge one weight slightly
• Re-run the network and see how the loss changes
• Repeat for millions of weights!
Backprop calculates ALL gradients in one backward pass!
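The contrast can be sketched numerically. Below, a hypothetical linear model with a squared-error loss: the brute-force (finite-difference) route needs one extra forward pass per weight, while the analytic gradient, which is what backprop computes, comes from a single backward pass. All names (`w`, `x`, `target`) are illustrative.

```python
import numpy as np

# Hypothetical toy setup: linear model pred = w @ x, squared-error loss.
rng = np.random.default_rng(0)
w = rng.normal(size=5)          # 5 weights
x = rng.normal(size=5)          # one input example
target = 1.0

def loss(w):
    pred = w @ x
    return (pred - target) ** 2

# Brute force: perturb each weight and re-run -- one forward pass PER weight.
eps = 1e-6
grad_fd = np.array([
    (loss(w + eps * np.eye(len(w))[i]) - loss(w)) / eps
    for i in range(len(w))
])

# Analytic gradient (what backprop computes in one backward pass):
grad_bp = 2 * (w @ x - target) * x

print(np.allclose(grad_fd, grad_bp, atol=1e-4))  # → True
```

For a model with millions of weights, the finite-difference loop means millions of forward passes per update; the backward pass costs roughly the same as one forward pass.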
The Chain Rule: ∂Loss/∂weight = ∂Loss/∂output × ∂output/∂weight
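The chain rule can be checked with concrete numbers. This minimal example assumes a single weight with output = w·x and Loss = (output − y)²; the values are made up for illustration.

```python
# One-weight example: output = w * x, Loss = (output - y)**2.
x, y, w = 2.0, 0.0, 0.5

output = w * x                          # forward pass: 1.0
dloss_doutput = 2 * (output - y)        # ∂Loss/∂output = 2.0
doutput_dw = x                          # ∂output/∂weight = 2.0
dloss_dw = dloss_doutput * doutput_dw   # chain rule: 2.0 * 2.0 = 4.0

# Finite-difference check of the same derivative:
eps = 1e-6
loss = lambda w: (w * x - y) ** 2
approx = (loss(w + eps) - loss(w)) / eps
print(dloss_dw, round(approx, 3))  # → 4.0 4.0
```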
The Algorithm Steps
1. Forward Pass
Feed input through network, save all intermediate values (activations)
2. Calculate Loss
Compare prediction with target using loss function
3. Backward Pass
Starting from loss, calculate gradients layer by layer going backward
4. Update Weights
Use gradients to update all weights via gradient descent
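The four steps above can be sketched end to end for a tiny one-hidden-layer network. This is a minimal illustration, not a reference implementation; the names (`W1`, `W2`, `lr`) and the choice of a tanh hidden layer with squared-error loss are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 1))           # one input example
y = np.array([[1.0]])                 # target
W1 = rng.normal(size=(4, 3)) * 0.1    # hidden-layer weights
W2 = rng.normal(size=(1, 4)) * 0.1    # output-layer weights
lr = 0.1                              # learning rate

for step in range(500):
    # 1. Forward pass: save intermediate activations.
    z1 = W1 @ x
    a1 = np.tanh(z1)                  # hidden activation (saved for backward)
    pred = W2 @ a1

    # 2. Calculate loss.
    loss = float((pred - y) ** 2)

    # 3. Backward pass: gradients layer by layer, starting from the loss.
    dpred = 2 * (pred - y)            # ∂Loss/∂pred
    dW2 = dpred @ a1.T                # ∂Loss/∂W2
    da1 = W2.T @ dpred                # error signal sent back to hidden layer
    dz1 = da1 * (1 - a1 ** 2)         # tanh'(z1) = 1 - tanh(z1)^2
    dW1 = dz1 @ x.T                   # ∂Loss/∂W1

    # 4. Update weights via gradient descent.
    W2 -= lr * dW2
    W1 -= lr * dW1

print(round(loss, 6))                 # loss shrinks toward 0
```

Note that step 3 reuses `a1` saved in step 1, which is why the forward pass must store intermediate activations.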
Key Insight: Each neuron only needs local information - its inputs, weights, and the error signal from the next layer!
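That locality can be made concrete for a single (linear) neuron; the activation derivative is omitted for brevity, and the function name and variables here are illustrative. Everything the neuron needs arrives as arguments: its inputs, its weights, and the error signal `delta` from the next layer.

```python
import numpy as np

def neuron_backward(a_in, w, delta):
    """Backward step for one linear neuron using only local information."""
    dw = delta * a_in    # gradient for this neuron's own weights
    d_in = delta * w     # error signal passed on to the previous layer
    return dw, d_in

a_in = np.array([0.5, -1.0])   # inputs to the neuron
w = np.array([2.0, 3.0])       # the neuron's weights
dw, d_in = neuron_backward(a_in, w, delta=0.1)
print(dw, d_in)
```

No neuron ever needs the whole network's weights: gradients for its own parameters come from its inputs, and the signal it hands backward comes from its weights.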