CS5720 - Week 3
Slide 54 of 60

Learning Rate Scheduling

Why Schedule Learning Rates?

Learning rate scheduling dynamically adjusts the learning rate during training, allowing for faster initial progress and finer convergence.
🎯 The Goldilocks Problem
Early training: Want large LR for fast progress
Mid training: Moderate LR to explore
Late training: Small LR for fine-tuning
One size doesn't fit all!
Benefits of scheduling:
• 🚀 Faster convergence in early epochs
• 🎯 Better final accuracy
• 🛡️ Escape from plateaus
• 🔧 Automatic adaptation to training phase

Popular Schedules

📊 Step Decay
Drop the learning rate by a factor γ every fixed number of epochs
lr = lr₀ × γ^⌊epoch / step_size⌋
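A minimal sketch of step decay in plain Python (the argument names `lr0`, `gamma`, `step_size` are illustrative; PyTorch's `torch.optim.lr_scheduler.StepLR` implements the same rule):

```python
def step_decay(lr0, gamma, step_size, epoch):
    """Drop the base rate lr0 by a factor gamma every step_size epochs."""
    # Integer division gives the number of drops that have occurred so far.
    return lr0 * gamma ** (epoch // step_size)
```

For example, with `lr0=0.1`, `gamma=0.5`, `step_size=10`, the rate stays at 0.1 for epochs 0-9, then halves to 0.05 at epoch 10, and so on.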
📉 Exponential Decay
Smooth exponential decrease over time
lr = lr₀ × e^(-λt)
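The exponential formula above, as a one-liner (parameter names are illustrative; PyTorch's `ExponentialLR` uses the equivalent per-step form lr₀ × γ^t):

```python
import math

def exp_decay(lr0, lam, t):
    """Smooth exponential decrease: lr0 * e^(-lam * t)."""
    return lr0 * math.exp(-lam * t)
```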
🌊 Cosine Annealing
Follows a cosine curve from max to min
lr = lr_min + 0.5(lr_max - lr_min)(1 + cos(πt/T))
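A direct translation of the cosine formula (argument names are illustrative; `CosineAnnealingLR` in PyTorch implements this schedule):

```python
import math

def cosine_anneal(lr_min, lr_max, t, T):
    """Cosine curve from lr_max at t=0 down to lr_min at t=T."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))
```

Note the rate starts at lr_max, passes through the midpoint (lr_min + lr_max)/2 at t = T/2, and reaches lr_min at t = T.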
🏔️ Reduce on Plateau
Reduce when validation loss stops improving
if no improvement: lr = lr × factor
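The plateau rule needs a little state (the best loss seen and a patience counter). A minimal sketch, with illustrative parameter names modeled on PyTorch's `ReduceLROnPlateau`:

```python
class ReduceOnPlateau:
    """Multiply lr by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.1, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")   # best validation loss seen so far
        self.bad_epochs = 0        # consecutive epochs without improvement

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

Called once per epoch with the current validation loss, it leaves the rate alone while the loss keeps improving and cuts it only after `patience` stagnant epochs.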

Learning Rate Schedule Visualization

Prepared by Dr. Gorkem Kar