Weight decay adds a "simplicity tax" to the loss function. Large weights are expensive, so the model only uses them when absolutely necessary.
Why it works:
• Prevents any single weight from becoming too large
• Encourages the network to use many small weights
• Creates smoother decision boundaries
• Reduces model sensitivity to input noise
🎛️ Interactive Lambda (λ) Control — in the live demo, a slider sets λ (here 0.010); adjust it to see the effect on regularization strength.
The Mathematical Formula
Regularized Loss Function:
L_total = L_original + λ × Σ(w²)
where λ (lambda) controls regularization strength
Breaking it down:
• L_original: Your normal loss (MSE, cross-entropy, etc.)
• λ: Regularization strength (hyperparameter)
• Σ(w²): Sum of all squared weights
• L_total: What we actually minimize
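The formula can be computed directly. Below is a minimal sketch with made-up weights and a made-up base loss (the values are illustrative, not from a real model):

```python
import numpy as np

# Hypothetical weights and base loss, purely for illustration.
weights = np.array([0.8, -1.2, 0.3, 2.1])
l_original = 2.45   # e.g. MSE or cross-entropy on a batch
lam = 0.010         # λ, the regularization strength

weight_penalty = lam * np.sum(weights ** 2)   # λ × Σ(w²)
l_total = l_original + weight_penalty         # what we actually minimize

print(f"penalty = {weight_penalty:.4f}, total loss = {l_total:.4f}")
```

Note that the penalty depends only on weight magnitudes, not signs: squaring treats +1.2 and −1.2 identically.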
🔍 Key Insight
The gradient now has two parts: the original data gradient pushing toward lower loss, plus a penalty gradient of 2λw pushing every weight toward zero!
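The two-part gradient can be seen in a single update step. In this sketch the data gradients are made-up numbers; the point is that even a weight whose data gradient is zero still shrinks, because the penalty term 2λw never vanishes for nonzero weights:

```python
import numpy as np

lam = 0.010
lr = 0.1
w = np.array([0.8, -1.2, 2.1])
grad_original = np.array([0.5, -0.1, 0.0])  # hypothetical data-loss gradients

# Total gradient = data gradient + penalty gradient (derivative of λ·w² is 2λw)
grad_total = grad_original + 2 * lam * w

# One gradient-descent step: the last weight has zero data gradient,
# yet it still moves toward zero.
w_new = w - lr * grad_total
print(w_new)
```

This is why weight decay is often implemented as a direct multiplicative shrink, w ← (1 − 2·lr·λ)·w, applied alongside the ordinary gradient step.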
Live Weight Decay Visualization (interactive; bar height represents weight magnitude). Loss components shown: original loss 2.45 + weight penalty 0.23 = total loss 2.68.
🎯 Effect of Current λ = 0.010
Moderate regularization: balances minimizing the original loss against keeping weights small.
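The effect of λ is easiest to see by sweeping it. A small sketch, reusing the same hypothetical weights and base loss as above: λ = 0 recovers the unregularized loss, and larger λ makes large weights increasingly expensive.

```python
import numpy as np

weights = np.array([0.8, -1.2, 0.3, 2.1])
l_original = 2.45   # hypothetical base loss

totals = {}
for lam in [0.0, 0.001, 0.01, 0.1]:
    penalty = lam * np.sum(weights ** 2)   # λ × Σ(w²)
    totals[lam] = l_original + penalty
    print(f"λ = {lam:<6} penalty = {penalty:.4f}  total = {totals[lam]:.4f}")
```

In practice λ is tuned on a validation set: too small and the penalty does nothing, too large and it dominates the data loss, driving all weights toward zero and underfitting.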