CS5720 - Week 2
Slide 38 of 40

Gradient Descent Variants

📚
Batch Gradient Descent
Uses the entire dataset to compute gradients for each update step (a code sketch follows this block).
  • Data per update: All training examples
  • Update frequency: Once per epoch
  • Memory usage: High
  • Convergence: Smooth but slow
✅ Pros
Stable convergence, exact gradient of the training loss
❌ Cons
Slow updates, high memory
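
A minimal NumPy sketch of full-batch gradient descent, assuming a mean-squared-error linear model (the function name, data shapes, and hyperparameters are illustrative choices, not from the slide):

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, epochs=100):
    """Full-batch gradient descent for MSE linear regression."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # One gradient computed from ALL n examples -> exactly one update per epoch
        grad = (2.0 / n) * X.T @ (X @ w - y)
        w -= lr * grad
    return w
```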
🎲
Stochastic Gradient Descent
Uses one example at a time to compute gradients and update weights (sketched in code after this block).
  • Data per update: Single example
  • Update frequency: After each example
  • Memory usage: Very low
  • Convergence: Noisy but fast
✅ Pros
Fast updates, low memory; gradient noise can help escape shallow local minima
❌ Cons
Noisy convergence, unstable
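
A comparable sketch of stochastic gradient descent under the same assumed MSE linear model; the per-example step is what produces the noisy but fast updates described above:

```python
import numpy as np

def stochastic_gradient_descent(X, y, lr=0.01, epochs=100, seed=0):
    """SGD for MSE linear regression: one example per weight update."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):            # visit examples in random order
            xi, yi = X[i], y[i]
            grad = 2.0 * xi * (xi @ w - yi)     # gradient from a single example
            w -= lr * grad                      # n updates per epoch
    return w
```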
⚖️
Mini-Batch Gradient Descent
Uses small batches (typically 32-256 examples) for each update; see the code sketch after this block.
  • Data per update: 32-256 examples
  • Update frequency: Multiple per epoch
  • Memory usage: Moderate
  • Convergence: Balanced
✅ Pros
Balances speed and stability; batched computation is GPU efficient
❌ Cons
Batch size tuning needed
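
A mini-batch sketch under the same assumptions; the batch_size parameter is the tuning knob noted in the cons above:

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.01, epochs=100, batch_size=64, seed=0):
    """Mini-batch gradient descent for MSE linear regression."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            Xb, yb = X[b], y[b]
            # Gradient averaged over one small batch -> ceil(n / batch_size) updates per epoch
            grad = (2.0 / len(b)) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad
    return w
```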

Interactive Gradient Descent Comparison

[Interactive demo: loss-landscape navigation with per-variant performance metrics (updates per epoch, noise level, memory usage, convergence behavior); a rough offline comparison follows below.]
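
For readers without the live demo, a rough offline comparison using the three sketch functions above; the synthetic data, learning rate, and epoch count are arbitrary illustrative choices:

```python
import numpy as np

# Synthetic linear-regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)

for name, fn, kwargs, updates in [
    ("Batch",      batch_gradient_descent,      {},                 1),
    ("Stochastic", stochastic_gradient_descent, {},                 1000),
    ("Mini-batch", minibatch_gradient_descent,  {"batch_size": 64}, int(np.ceil(1000 / 64))),
]:
    w = fn(X, y, lr=0.01, epochs=50, **kwargs)
    mse = np.mean((X @ w - y) ** 2)
    print(f"{name:11s} updates/epoch = {updates:4d}, final MSE = {mse:.4f}")
```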
Prepared by Dr. Gorkem Kar