CS5720 - Week 5
Slide 86 of 100

ResNet: Skip Connections and Very Deep Networks

The Degradation Problem

Shocking Discovery:

Adding more layers made networks perform WORSE - not just on test data (overfitting), but on training data too!
The Evidence:
• A 56-layer network had higher training error than a 20-layer network
• Not overfitting - the training error itself was worse
• The cause is optimization difficulty, not a lack of model capacity
• Deeper networks couldn't even learn identity mappings
The Paradox:
If we take a 20-layer network and stack 36 identity layers on top of it, the resulting 56-layer network should perform at least as well. But optimization couldn't find this solution!
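The paradox can be checked numerically: a deeper network that ends in identity-weight layers computes exactly the same function as the shallow one, so the solution exists even though SGD fails to find it. A minimal sketch (all shapes and weights here are illustrative assumptions, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W):
    return np.maximum(W @ x, 0.0)  # linear + ReLU

x = rng.standard_normal(4)
W1, W2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))

shallow = layer(layer(x, W1), W2)  # a small "20-layer" stand-in

# Deepen by stacking identity-weight layers on top.
deep = shallow
for _ in range(3):
    # ReLU(I @ h) = h, because h is already non-negative after ReLU
    deep = layer(deep, np.eye(4))

assert np.allclose(shallow, deep)  # the deeper network can match the shallow one
```

The deeper network matches the shallow one exactly; the degradation problem is that plain SGD cannot discover these identity weights on its own.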

The Skip Connection Solution

Revolutionary Idea:

Instead of learning H(x), learn the residual F(x) = H(x) - x
Then H(x) = F(x) + x
Why This Works:
• Easier to learn F(x) = 0 than H(x) = x
• Gradients flow directly through shortcuts
• Creates ensemble of shallow networks
• Identity mapping is the default
The Result:
ResNet won ILSVRC 2015 with 3.57% top-5 error - better than the estimated human error rate (5.1%)!
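Why "gradients flow directly through shortcuts": for y = F(x) + x, the derivative is dy/dx = F'(x) + 1, so even when the residual branch saturates (F'(x) ≈ 0) the block still passes a gradient of about 1. A small numeric check (the saturated F below is a hypothetical stand-in):

```python
# Sketch: gradient survival through a skip connection.
def F(x):
    return 0.0 * x          # hypothetical fully saturated residual branch: F'(x) = 0

def residual_block(x):
    return F(x) + x         # H(x) = F(x) + x

# Central-difference derivative at an arbitrary point
x0, eps = 1.7, 1e-6
grad = (residual_block(x0 + eps) - residual_block(x0 - eps)) / (2 * eps)

assert abs(grad - 1.0) < 1e-4   # shortcut keeps the gradient near 1 even when F'(x) = 0
```

In a plain (no-skip) network the same saturation would multiply the upstream gradient by F'(x) ≈ 0, which is exactly the vanishing-gradient failure mode the shortcut avoids.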

Residual Block Architecture

Input x
  ↓
Conv → BN → ReLU
  ↓
Conv → BN        ← this path computes F(x)
  ↓
F(x) + x         ← shortcut adds x back in
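The block above can be sketched numerically, modeling the convolutions as matrix multiplies and omitting BatchNorm for clarity (names, shapes, and the final ReLU placement are illustrative assumptions, not from the slide):

```python
import numpy as np

def residual_block(x, W1, W2):
    h = np.maximum(W1 @ x, 0.0)    # Conv -> (BN) -> ReLU
    f = W2 @ h                     # Conv -> (BN): the residual F(x)
    return np.maximum(f + x, 0.0)  # F(x) + x, then ReLU

x = np.array([1.0, 2.0, 3.0])
zero = np.zeros((3, 3))

# With both weight layers at zero, F(x) = 0 and the block is the identity:
assert np.allclose(residual_block(x, zero, zero), x)
```

This makes "identity mapping is the default" concrete: driving the residual weights toward zero leaves the block computing x unchanged, which is far easier for the optimizer than learning the identity from scratch.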
Plain Network (No Skip):
• 20-layer: 27.94% error
• 56-layer: 28.54% error
• Deeper is worse!
ResNet (With Skip):
• 20-layer: 27.88% error
• 56-layer: 25.53% error
• 152-layer: 21.43% error
• Deeper is better!
Prepared by Dr. Gorkem Kar