CS5720 - Week 2
Slide 27 of 40
Learning Rate - Finding the Sweet Spot
The Goldilocks Principle
The learning rate must be just right: not too small, not too large, but somewhere in between for optimal training.
What is Learning Rate?
The learning rate (α) controls how much we adjust weights based on the gradient:
new_weight = old_weight - α × gradient
Think of it as the "step size" when walking down a hill.
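The update rule above can be sketched as a one-line function. Everything here (the function name, the example quadratic `f(w) = w**2`) is illustrative, not from the slide:

```python
# Gradient descent update for a single weight:
# new_weight = old_weight - alpha * gradient
def sgd_step(weight, gradient, alpha=0.01):
    return weight - alpha * gradient

# One step down the slope of f(w) = w**2, whose gradient is 2*w:
w = 3.0
w = sgd_step(w, gradient=2 * w, alpha=0.1)  # 3.0 - 0.1 * 6.0 = 2.4
```

A smaller `alpha` would take a smaller step toward the minimum at w = 0; a larger one a bigger step.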
🎯 Finding the Right Learning Rate
Start with common values: 0.001, 0.01, 0.1
Use learning rate schedules
Monitor loss curves during training
Consider adaptive methods (Adam, RMSprop)
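A learning rate schedule, as mentioned above, lowers α as training progresses. Here is a minimal step-decay sketch; the constants (initial rate 0.01, halving every 10 epochs) are illustrative assumptions, not values from the slide:

```python
# Step decay: multiply the initial rate by `drop` every `step` epochs.
# The specific constants below are illustrative, not prescriptive.
def step_decay(initial_alpha, epoch, drop=0.5, step=10):
    return initial_alpha * (drop ** (epoch // step))

for epoch in (0, 10, 20):
    print(epoch, step_decay(0.01, epoch))  # 0.01, then 0.005, then 0.0025
```

Adaptive methods such as Adam and RMSprop go further by scaling the step per parameter from gradient statistics, which is why they are often less sensitive to the initial choice of α.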
Learning Rate Effects
α = 0.0001 (Too Small)
• Extremely slow convergence
• May get stuck before reaching minimum
• Wastes computational resources
• Risk of stopping too early
α = 0.01 (Just Right)
• Steady convergence
• Reaches good minimum
• Stable training
• Efficient use of resources
α = 1.0 (Too Large)
• Overshoots minimum
• Loss may explode
• Unstable, erratic behavior
• May never converge
Learning Rate Comparison
[Interactive animation: three panels, each tracking iteration count and loss (starting at 1.000), showing how different learning rates affect convergence — Too Small (α = 0.001), Just Right (α = 0.05), Too Large (α = 0.5).]
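The three animated panels can be reproduced numerically. This sketch minimizes an assumed quadratic, f(w) = 3w² (gradient 6w), chosen so the slide's three α values land in the slow / stable / divergent regimes; the function itself is not from the slide:

```python
# Run gradient descent on f(w) = 3*w**2 (gradient 6*w) from w = 1.0
# and report the final loss, mirroring the three comparison panels.
def run(alpha, iters=50, w=1.0):
    for _ in range(iters):
        w -= alpha * 6 * w   # gradient descent step
    return 3 * w * w         # final loss

for label, alpha in [("Too small", 0.001), ("Just right", 0.05), ("Too large", 0.5)]:
    print(f"{label} (alpha={alpha}): loss after 50 iterations = {run(alpha):.3g}")
```

With α = 0.001 the loss barely moves, with α = 0.05 it shrinks to near zero, and with α = 0.5 each step overshoots so badly that the loss explodes, matching the three behaviors listed above.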
Prepared by Dr. Gorkem Kar