The Foundation of Training

🎯 The Goldilocks Principle
Weight initialization must be "just right": too small and signals shrink layer by layer (vanishing gradients), too large and they grow without bound (exploding gradients). The goal is a scale that keeps signal variance stable as it flows through the network.
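
To make the principle concrete, here is a minimal sketch (the 50-layer, 512-unit linear stack and the specific weight scales are assumed for illustration, not taken from the text). It propagates unit-variance inputs through the stack and prints the final activation variance for a too-small, a too-large, and a Xavier-scaled initialization:

```python
# Minimal sketch (assumed setup: 50 linear layers of width 512;
# nonlinearities omitted to isolate the variance dynamics).
# With square layers, the Xavier scale sqrt(2/(fan_in+fan_out))
# reduces to sqrt(1/fan_in).
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 50
x = rng.standard_normal((1000, width))  # inputs with unit variance

for name, std in [("too small (std=0.01)", 0.01),
                  ("too large (std=0.10)", 0.10),
                  ("Xavier (std=1/sqrt(512))", 1.0 / np.sqrt(width))]:
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, std, size=(width, width))
        h = h @ W  # each layer rescales the variance by fan_in * std**2
    # variance collapses toward 0, blows up, or stays near 1
    print(f"{name}: Var[h] after {depth} layers = {h.var():.3e}")
```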

Why Initialization Matters:

  • Determines where on the loss landscape training begins
  • Shapes gradient flow from the very first step
  • Can make or break the training of deep networks
  • Strongly influences convergence speed
Figure: balanced signal flow, with forward activation variance Var[a] = 1.0 and backward gradient variance Var[∇] = 1.0 preserved at every layer.

Figure: weight distribution under Xavier initialization, chosen for balanced variance.
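
Xavier (Glorot) initialization reconciles the forward condition fan_in · Var[W] = 1 with the backward condition fan_out · Var[W] = 1 via the compromise Var[W] = 2 / (fan_in + fan_out). A minimal sketch of both standard variants follows; the helper names and the 784×256 layer shape are illustrative assumptions:

```python
# Sketch of Xavier/Glorot initialization (Glorot & Bengio, 2010).
# Both variants target Var[W] = 2 / (fan_in + fan_out), balancing
# forward activation variance against backward gradient variance.
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng()):
    # Uniform variant: U(-limit, +limit), where Var = limit**2 / 3,
    # so limit = sqrt(6 / (fan_in + fan_out)) hits the target variance.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def xavier_normal(fan_in, fan_out, rng=np.random.default_rng()):
    # Normal variant: N(0, 2 / (fan_in + fan_out)).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W = xavier_uniform(784, 256)
print(W.var(), 2.0 / (784 + 256))  # empirical variance vs. target
```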