CS5720 - Week 2
Slide 36 of 40

Weight Decay (L2 Regularization)

The Weight Penalty Concept

Weight decay adds a "simplicity tax" to the loss function. Large weights are expensive, so the model only uses them when absolutely necessary.
Why it works:

Prevents any single weight from becoming too large
Encourages the network to use many small weights
Creates smoother decision boundaries
Reduces model sensitivity to input noise
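The "simplicity tax" idea can be sketched in a few lines of NumPy (function and variable names here are illustrative, not from the lecture). Because the penalty is quadratic, doubling a weight quadruples its cost, which is exactly why the network prefers many small weights over a few large ones:

```python
import numpy as np

def l2_penalty(weights, lam):
    """The 'simplicity tax': lambda times the sum of all squared weights."""
    return lam * sum(np.sum(w ** 2) for w in weights)

# Small weights are cheap, large weights are expensive (quadratically so):
w_small = [np.array([0.1, 0.2])]
w_large = [np.array([1.0, 2.0])]
print(l2_penalty(w_small, 0.01))  # 0.0005
print(l2_penalty(w_large, 0.01))  # 0.05
```

Scaling every weight up by 10x here made the penalty 100x larger, even though both networks have the same number of parameters.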
🎛️ Interactive λ Control (in-class demo)
Slider sets the regularization strength; current value λ = 0.010

The Mathematical Formula

Regularized Loss Function:
L_total = L_original + λ × Σ(w²)
where λ (lambda) controls regularization strength
Breaking it down:

L_original: Your normal loss (MSE, cross-entropy, etc.)
λ: Regularization strength (hyperparameter)
Σ(w²): Sum of all squared weights
L_total: What we actually minimize
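The four pieces above map directly onto code. A minimal sketch, assuming MSE as L_original (the formula works the same with cross-entropy or any other loss):

```python
import numpy as np

def total_loss(y_pred, y_true, weights, lam):
    """L_total = L_original + lam * sum(w^2), with MSE as L_original."""
    l_original = np.mean((y_pred - y_true) ** 2)          # L_original
    penalty = lam * sum(np.sum(w ** 2) for w in weights)  # λ × Σ(w²)
    return l_original + penalty                           # L_total

# Example: an MSE of 0.5 plus a 0.1 penalty gives L_total = 0.6
print(total_loss(np.array([1.0, 2.0]), np.array([1.0, 1.0]),
                 [np.array([1.0])], lam=0.1))  # 0.6
```

Note that the penalty is added to the loss we minimize, not to the metric we report: test accuracy is still evaluated on L_original alone.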
🔍 Key Insight
The gradient now has two parts: one from L_original pushing toward lower loss, and one from the penalty (2λw per weight) pulling every weight toward zero!
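This two-part gradient is where the name "weight decay" comes from. Since the derivative of λw² is 2λw, each SGD step shrinks every weight by a constant factor on top of following the data gradient. A hypothetical update step (names are illustrative):

```python
import numpy as np

def sgd_step_with_decay(w, grad_original, lr, lam):
    """One SGD step on L_total: data gradient plus the 2*lam*w pull to zero."""
    grad_total = grad_original + 2 * lam * w  # d/dw of lam * w^2 is 2*lam*w
    return w - lr * grad_total

# With a zero data gradient, the weights simply decay each step:
w = np.array([1.0, -2.0])
w = sgd_step_with_decay(w, np.zeros_like(w), lr=0.1, lam=0.05)
print(w)  # ≈ [0.99, -1.98]: each weight scaled by 1 - 2*lr*lam = 0.99
```

In practice, libraries fold this into the optimizer (e.g. the `weight_decay` argument of `torch.optim.SGD`) rather than adding the penalty to the loss explicitly.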

Live Weight Decay Visualization
Network weights (bar height represents weight magnitude)
Loss components at λ = 0.010:
Original loss: 2.45
+ Weight penalty: 0.23
Total loss: 2.45 + 0.23 = 2.68
🎯 Effect of Current λ = 0.010
Moderate regularization: Balances original loss with weight control
Prepared by Dr. Gorkem Kar