CS5720 - Week 2
Slide 27 of 40

Learning Rate - Finding the Sweet Spot

The Goldilocks Principle

The learning rate must be just right: large enough to make real progress each step, yet small enough to keep training stable.

What is the Learning Rate?

The learning rate (α) controls how much we adjust weights based on the gradient:

new_weight = old_weight - α × gradient

Think of it as the "step size" when walking down a hill.
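The update rule above can be sketched in a few lines. This is a minimal illustration, not from the slide: the toy loss (w - 3)² and its gradient 2(w - 3), the starting point, and the iteration count are all illustrative choices.

```python
# Illustrative toy loss: L(w) = (w - 3)^2, with gradient dL/dw = 2*(w - 3).
alpha = 0.1   # learning rate (step size)
w = 0.0       # initial weight

for _ in range(100):
    gradient = 2 * (w - 3)     # dL/dw at the current weight
    w = w - alpha * gradient   # new_weight = old_weight - α × gradient

print(round(w, 4))  # w walks down the "hill" toward the minimum at w = 3
```

Each iteration takes a step of size α × gradient downhill; with a sensible α the weight settles at the minimum.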
🎯 Finding the Right Learning Rate
  • Start with common values: 0.001, 0.01, 0.1
  • Use learning rate schedules
  • Monitor loss curves during training
  • Consider adaptive methods (Adam, RMSprop)
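One common form of the "learning rate schedules" bullet is step decay. The sketch below is a hedged example; the function name, decay factor, and interval are illustrative choices, not part of the slide.

```python
def step_decay(initial_lr, epoch, decay_rate=0.5, decay_every=10):
    """Halve the learning rate every `decay_every` epochs (step decay)."""
    return initial_lr * (decay_rate ** (epoch // decay_every))

for epoch in (0, 10, 20, 30):
    print(epoch, step_decay(0.1, epoch))  # 0.1 → 0.05 → 0.025 → 0.0125
```

Schedules like this start with a larger step size for fast early progress, then shrink it so training can settle into a minimum.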

Learning Rate Effects

α = 0.0001 (Too Small)
• Extremely slow convergence
• May get stuck before reaching the minimum
• Wastes computational resources
• Risk of stopping too early

α = 0.01 (Just Right)
• Steady convergence
• Reaches a good minimum
• Stable training
• Efficient use of resources

α = 1.0 (Too Large)
• Overshoots the minimum
• Loss may explode
• Unstable, erratic behavior
• May never converge

Learning Rate Comparison

Watch how different learning rates affect convergence!

[Interactive demo: three panels animate gradient descent with α = 0.001 (Too Small), α = 0.05 (Just Right), and α = 0.5 (Too Large), each tracking iterations and loss from a starting loss of 1.000.]
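The behavior of the three panels can be reproduced offline with a simple simulation. This is a hedged sketch, not the slide's demo: the toy loss L(w) = 2w² (gradient 4w), the starting weight, and the step count are illustrative choices; with this curvature the α = 0.5 run oscillates between ±1 and never converges.

```python
def run(alpha, w=1.0, steps=50):
    """Run gradient descent on the toy loss L(w) = 2*w**2 (gradient 4*w)."""
    for _ in range(steps):
        w = w - alpha * 4 * w   # new_weight = old_weight - α × gradient
    return 2 * w ** 2           # final loss

print(f"Too small  (alpha=0.001): loss = {run(0.001):.6f}")  # barely moves
print(f"Just right (alpha=0.05):  loss = {run(0.05):.6f}")   # near zero
print(f"Too large  (alpha=0.5):   loss = {run(0.5):.6f}")    # stuck oscillating
```

The small rate leaves the loss almost unchanged after 50 steps, the middle rate drives it essentially to zero, and the large rate bounces across the minimum without ever descending.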
Prepared by Dr. Gorkem Kar