CS5720 - Week 3

Adam Optimizer: Adaptive Learning

The Adam Algorithm

Adam (Adaptive Moment Estimation) combines momentum and RMSprop: it maintains a momentum term and an adaptive learning rate for each parameter.
First moment: mₜ = β₁·mₜ₋₁ + (1 − β₁)·gₜ
Second moment: vₜ = β₂·vₜ₋₁ + (1 − β₂)·gₜ²
Bias correction: m̂ₜ = mₜ/(1 − β₁ᵗ), v̂ₜ = vₜ/(1 − β₂ᵗ)
Update rule: θₜ₊₁ = θₜ − η·m̂ₜ/(√v̂ₜ + ε)
Default hyperparameters (work well for most problems):
• Learning rate (η): 0.001
• β₁ (momentum): 0.9
• β₂ (RMSprop): 0.999
• ε (stability): 10⁻⁸
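
The four equations above translate directly into code. Below is a minimal NumPy sketch of a single Adam step using the default hyperparameters; the function name adam_step and its signature are illustrative, not taken from any particular library.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of gradients
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: exponential moving average of squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: compensates for m and v starting at zero (t starts at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Update: per-parameter adaptive step, scaled by the corrected momentum
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v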

Key Components

📊 First Moment (Momentum)
Exponential moving average of gradients. Provides velocity and helps escape saddle points.
📈 Second Moment (Adaptive LR)
Exponential moving average of squared gradients. Adapts learning rate per parameter.
⚖️ Bias Correction
Corrects for initialization bias in early training steps. Critical for stability.
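To see why this matters with concrete numbers: with β₁ = 0.9 and m₀ = 0, the first step gives m₁ = 0.1·g₁, an estimate ten times smaller than the true gradient. Dividing by (1 − β₁¹) = 0.1 restores m̂₁ = g₁, and as t grows, (1 − β₁ᵗ) → 1, so the correction fades away on its own.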

Adam in Action

[Interactive demo: sliders for η (0.001), β₁ (0.9), β₂ (0.999), and step t, with live readouts of the effective learning rate, momentum strength, and gradient adaptation.]
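
As a small stand-in for the demo, the toy loop below reuses the adam_step sketch from earlier to minimize f(θ) = (θ − 3)², whose gradient is 2(θ − 3); the setup is illustrative only.

theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 10001):
    grad = 2 * (theta - 3)              # gradient of (theta - 3)^2
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)                            # theta approaches 3.0

Early on, m̂ₜ/√v̂ₜ is close to ±1, so each step moves roughly η = 0.001, which is why Adam's initial progress is nearly independent of the raw gradient scale.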
Prepared by Dr. Gorkem Kar