CS5720 - Week 3
Slide 54 of 60

Learning Rate Scheduling

Why Schedule Learning Rates?

Learning rate scheduling dynamically adjusts the learning rate during training, allowing for faster initial progress and finer convergence.
🎯 The Goldilocks Problem
Early training: Want large LR for fast progress
Mid training: Moderate LR to explore
Late training: Small LR for fine-tuning
One size doesn't fit all!
Benefits of scheduling:
• 🚀 Faster convergence in early epochs
• 🎯 Better final accuracy
• 🛡️ Escape from plateaus
• 🔧 Automatic adaptation to training phase

Popular Schedules

📊 Step Decay
Drop the learning rate by a factor γ every fixed number of epochs
lr = lr₀ × γ^⌊epoch / step_size⌋
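A minimal sketch of step decay in plain Python (the argument names `lr0`, `gamma`, `step_size` are illustrative; PyTorch's `torch.optim.lr_scheduler.StepLR` implements the same rule):

```python
def step_decay(lr0, gamma, step_size, epoch):
    """Drop the base rate lr0 by a factor gamma every step_size epochs."""
    # Integer division gives the number of drops that have occurred so far.
    return lr0 * gamma ** (epoch // step_size)
```

For example, with `lr0=0.1`, `gamma=0.5`, `step_size=10`, the rate stays at 0.1 for epochs 0-9, then halves to 0.05 at epoch 10, and so on.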
📉 Exponential Decay
Smooth exponential decrease over time
lr = lr₀ × e^(-λt)
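The exponential formula above, as a one-liner (parameter names are illustrative; PyTorch's `ExponentialLR` uses the equivalent per-step form lr₀ × γ^t):

```python
import math

def exp_decay(lr0, lam, t):
    """Smooth exponential decrease: lr0 * e^(-lam * t)."""
    return lr0 * math.exp(-lam * t)
```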
🌊 Cosine Annealing
Follows a cosine curve from max to min
lr = lr_min + 0.5(lr_max - lr_min)(1 + cos(πt/T))
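A direct translation of the cosine formula (argument names are illustrative; `CosineAnnealingLR` in PyTorch implements this schedule):

```python
import math

def cosine_anneal(lr_min, lr_max, t, T):
    """Cosine curve from lr_max at t=0 down to lr_min at t=T."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))
```

Note the rate starts at lr_max, passes through the midpoint (lr_min + lr_max)/2 at t = T/2, and reaches lr_min at t = T.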
🏔️ Reduce on Plateau
Reduce when validation loss stops improving
if no improvement: lr = lr × factor
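The plateau rule needs a little state (the best loss seen and a patience counter). A minimal sketch, with illustrative parameter names modeled on PyTorch's `ReduceLROnPlateau`:

```python
class ReduceOnPlateau:
    """Multiply lr by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.1, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")   # best validation loss seen so far
        self.bad_epochs = 0        # consecutive epochs without improvement

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

Called once per epoch with the current validation loss, it leaves the rate alone while the loss keeps improving and cuts it only after `patience` stagnant epochs.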

Learning Rate Schedule Visualization

Prepared by Dr. Gorkem Kar