CS5720 - Week 2
Slide 27 of 40

Learning Rate - Finding the Sweet Spot

The Goldilocks Principle

The learning rate must be just right: large enough to make real progress each step, yet small enough to keep training stable.

What is the Learning Rate?

The learning rate (α) controls how much we adjust weights based on the gradient:

new_weight = old_weight - α × gradient

Think of it as the "step size" when walking down a hill.
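The update rule above can be sketched in a few lines. This is a minimal illustration, not from the slide: the toy loss (w - 3)² and its gradient 2(w - 3), the starting point, and the iteration count are all illustrative choices.

```python
# Illustrative toy loss: L(w) = (w - 3)^2, with gradient dL/dw = 2*(w - 3).
alpha = 0.1   # learning rate (step size)
w = 0.0       # initial weight

for _ in range(100):
    gradient = 2 * (w - 3)     # dL/dw at the current weight
    w = w - alpha * gradient   # new_weight = old_weight - α × gradient

print(round(w, 4))  # w walks down the "hill" toward the minimum at w = 3
```

Each iteration takes a step of size α × gradient downhill; with a sensible α the weight settles at the minimum.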
🎯 Finding the Right Learning Rate
  • Start with common values: 0.001, 0.01, 0.1
  • Use learning rate schedules
  • Monitor loss curves during training
  • Consider adaptive methods (Adam, RMSprop)
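One common form of the "learning rate schedules" bullet is step decay. The sketch below is a hedged example; the function name, decay factor, and interval are illustrative choices, not part of the slide.

```python
def step_decay(initial_lr, epoch, decay_rate=0.5, decay_every=10):
    """Halve the learning rate every `decay_every` epochs (step decay)."""
    return initial_lr * (decay_rate ** (epoch // decay_every))

for epoch in (0, 10, 20, 30):
    print(epoch, step_decay(0.1, epoch))  # 0.1 → 0.05 → 0.025 → 0.0125
```

Schedules like this start with a larger step size for fast early progress, then shrink it so training can settle into a minimum.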

Learning Rate Effects

α = 0.0001 (Too Small)
• Extremely slow convergence
• May get stuck before reaching the minimum
• Wastes computational resources
• Risk of stopping too early

α = 0.01 (Just Right)
• Steady convergence
• Reaches a good minimum
• Stable training
• Efficient use of resources

α = 1.0 (Too Large)
• Overshoots the minimum
• Loss may explode
• Unstable, erratic behavior
• May never converge

Learning Rate Comparison

Watch how different learning rates affect convergence!

[Interactive demo: three panels animate gradient descent with α = 0.001 (Too Small), α = 0.05 (Just Right), and α = 0.5 (Too Large), each tracking iterations and loss from a starting loss of 1.000.]
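The behavior of the three panels can be reproduced offline with a simple simulation. This is a hedged sketch, not the slide's demo: the toy loss L(w) = 2w² (gradient 4w), the starting weight, and the step count are illustrative choices; with this curvature the α = 0.5 run oscillates between ±1 and never converges.

```python
def run(alpha, w=1.0, steps=50):
    """Run gradient descent on the toy loss L(w) = 2*w**2 (gradient 4*w)."""
    for _ in range(steps):
        w = w - alpha * 4 * w   # new_weight = old_weight - α × gradient
    return 2 * w ** 2           # final loss

print(f"Too small  (alpha=0.001): loss = {run(0.001):.6f}")  # barely moves
print(f"Just right (alpha=0.05):  loss = {run(0.05):.6f}")   # near zero
print(f"Too large  (alpha=0.5):   loss = {run(0.5):.6f}")    # stuck oscillating
```

The small rate leaves the loss almost unchanged after 50 steps, the middle rate drives it essentially to zero, and the large rate bounces across the minimum without ever descending.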
Prepared by Dr. Gorkem Kar