The Degradation Problem
Shocking Discovery:
Adding more layers made plain networks perform WORSE - not just on test data (which would suggest overfitting), but on the training data too!
The Evidence:
• A 56-layer plain network had higher training error than a 20-layer one
• Not overfitting - the training error itself was worse!
• The cause is optimization difficulty, not a lack of model capacity
• Deeper plain networks couldn't even learn identity mappings
The Paradox:
If we take a trained 20-layer network and stack 36 identity layers on top, the resulting 56-layer network computes the same function, so it should perform at least as well. But optimization couldn't find this solution!
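The construction behind the paradox is easy to verify mechanically. A toy sketch (illustrative names, pure NumPy): stack 36 exact identity layers on a fixed feature vector standing in for a 20-layer network's output, and confirm the function is unchanged - so a 56-layer solution at least as good as the 20-layer one provably exists.

```python
import numpy as np

def identity_layer(x):
    # A layer whose weights happen to realize H(x) = x exactly
    return x

# Stand-in for the output of a trained 20-layer network
features_20 = np.array([0.3, -1.2, 0.8, 2.5])

# Stack 36 identity layers on top to reach 56 layers
out = features_20
for _ in range(36):
    out = identity_layer(out)

# The deeper network computes exactly the same function
print(np.array_equal(out, features_20))  # True
```

The point of the paradox is that plain gradient descent fails to discover even this trivial solution.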
The Skip Connection Solution
Revolutionary Idea:
Instead of forcing layers to learn the desired mapping H(x) directly, let them learn the residual F(x) = H(x) - x
The original mapping is then recovered as H(x) = F(x) + x
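A minimal sketch of this formulation in NumPy (the function name, the two-layer form of F, and the 4-dimensional sizes are illustrative assumptions; real ResNet blocks use convolutions and batch normalization):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Compute H(x) = F(x) + x, with F a small two-layer transform."""
    f = relu(x @ W1) @ W2  # the residual F(x)
    return f + x           # skip connection adds the input back

# With all-zero weights, F(x) = 0 and the block is exactly the identity:
x = np.array([[1.0, -2.0, 3.0, 0.5]])
W_zero = np.zeros((4, 4))
print(np.allclose(residual_block(x, W_zero, W_zero), x))  # True
```

Note that doing nothing (zero weights) already yields an identity mapping, which is exactly the solution the plain deep networks failed to find.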
Why This Works:
• It is easier to learn F(x) = 0 than to learn H(x) = x
• Gradients flow directly through the shortcut connections
• The network behaves like an ensemble of many shallower paths
• The identity mapping becomes the default behavior
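The gradient-flow bullet can be made concrete with a scalar toy model (the specific weight value and depth are assumptions chosen only to make the contrast visible):

```python
# Scalar toy: compare d(out)/d(in) through 50 weak layers.
w, depth = 0.01, 50

# Plain chain: each layer computes w*x, so the per-layer derivative is w
plain_grad = w ** depth          # 0.01**50 = 1e-100: gradient vanishes

# Residual chain: each layer computes x + w*x, so the derivative is 1 + w
resid_grad = (1.0 + w) ** depth  # ~1.64: the identity path keeps it alive

print(plain_grad, resid_grad)
```

Even when each layer's learned transform contributes almost nothing, the +1 from the shortcut keeps the product of derivatives from collapsing to zero.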
The Result:
An ensemble of ResNets (up to 152 layers deep) won ILSVRC 2015 with 3.57% top-5 error - surpassing the estimated human error rate of 5.1%!