CS5720 - Week 3
Slide 53 of 60
Adam Optimizer: Adaptive Learning
The Adam Algorithm
Adam (Adaptive Moment Estimation) combines the best of momentum and RMSprop, providing each parameter with its own adaptive learning rate plus momentum.
First moment: mₜ = β₁mₜ₋₁ + (1−β₁)gₜ
Second moment: vₜ = β₂vₜ₋₁ + (1−β₂)gₜ²
Bias correction: m̂ₜ = mₜ/(1−β₁ᵗ), v̂ₜ = vₜ/(1−β₂ᵗ)
Update rule: θₜ₊₁ = θₜ − η·m̂ₜ/(√v̂ₜ + ε)
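The four formulas translate directly into code. Below is a minimal NumPy sketch of one Adam step; the function name adam_step, its array-based interface, and the toy demo are illustrative, not part of the slide:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters `theta` given gradient `grad`.

    m, v are the running moment estimates; t is the 1-indexed time step
    (needed for bias correction, so (1 - beta**t) is never zero).
    """
    m = beta1 * m + (1 - beta1) * grad            # first moment m_t
    v = beta2 * v + (1 - beta2) * grad**2         # second moment v_t
    m_hat = m / (1 - beta1**t)                    # bias-corrected m̂_t
    v_hat = v / (1 - beta2**t)                    # bias-corrected v̂_t
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # update rule
    return theta, m, v

# Toy demo: minimize f(θ) = θ², whose gradient is 2θ
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.1)
print(theta)  # settles near 0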
Default hyperparameters (work well for most problems):
• Learning rate (η): 0.001
• β₁ (momentum): 0.9
• β₂ (RMSprop): 0.999
• ε (stability): 10⁻⁸
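These are also the defaults in common frameworks; PyTorch's torch.optim.Adam, for instance, uses exactly these values. A minimal usage sketch (the toy linear model and random batch are placeholders):

```python
import torch

model = torch.nn.Linear(10, 1)                   # toy model (placeholder)
opt = torch.optim.Adam(model.parameters(),
                       lr=0.001,                 # η
                       betas=(0.9, 0.999),       # (β₁, β₂)
                       eps=1e-8)                 # ε

x, y = torch.randn(32, 10), torch.randn(32, 1)   # dummy batch
loss = torch.nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()                                       # one Adam update
```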
Key Components
📊 First Moment (Momentum)
Exponential moving average of gradients. Provides velocity and helps escape saddle points.
📈 Second Moment (Adaptive LR)
Exponential moving average of squared gradients. Adapts learning rate per parameter.
⚖️ Bias Correction
Corrects for initialization bias in early training steps. Critical for stability.
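To see why this matters early on: both moments start at zero, so the raw estimates are biased toward zero for small t. With β₂ = 0.999 and a first gradient of 2, the uncorrected v₁ underestimates g² by a factor of 1000; dividing by (1 − β₂¹) = 0.001 restores the true scale. A quick numeric check:

```python
beta2, g1 = 0.999, 2.0
v1 = (1 - beta2) * g1**2       # ≈ 0.004 — far below g1² = 4
v1_hat = v1 / (1 - beta2**1)   # bias-corrected: ≈ 4.0
print(v1, v1_hat)
```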
Adam in Action
[Interactive demo: sliders for learning rate (η = 0.001), β₁ (0.9), β₂ (0.999), and time step; readouts show the effective learning rate, momentum strength, and gradient adaptation.]
Prepared by Dr. Gorkem Kar