The Foundation of Training
🎯 The Goldilocks Principle
Weight initialization must be "just right": weights that are too small lead to vanishing gradients, weights that are too large lead to exploding gradients, and only a balanced scale maintains stable signal flow through the network.
Why Initialization Matters:
- Determines initial loss landscape position
- Affects gradient flow from the start
- Can make or break deep network training
- Influences convergence speed dramatically
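Concretely, Xavier (Glorot) initialization achieves this balance by drawing weights with variance 2 / (n_in + n_out), where n_in and n_out are the layer's fan-in and fan-out. A minimal NumPy sketch of this rule (the function name and layer sizes are illustrative, not from the original):

```python
import numpy as np

def xavier_init(n_in, n_out, rng=None):
    """Glorot/Xavier normal init: Var[W] = 2 / (n_in + n_out)."""
    rng = rng if rng is not None else np.random.default_rng()
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(loc=0.0, scale=std, size=(n_in, n_out))

W = xavier_init(256, 128)
print(W.std())  # close to sqrt(2 / 384) ≈ 0.072
```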
[Visualization: weight distribution under Xavier initialization, balancing forward signal variance (Var[a] = 1.0) against backward gradient variance (Var[∇] = 1.0).]
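A short sketch of the effect the visualization conveys: pushing a unit-variance batch through a stack of Xavier-initialized linear layers keeps Var[a] near 1, while a naive fixed-scale init lets it explode. The width, depth, and batch size below are arbitrary illustrative choices, and nonlinearities are omitted for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 512, 20
a = rng.normal(size=(1000, width))  # unit-variance input batch

# Xavier-scaled layers: Var[W] = 2 / (n_in + n_out) = 1 / width here
x = a
for _ in range(depth):
    W = rng.normal(0.0, np.sqrt(1.0 / width), size=(width, width))
    x = x @ W  # linear layer only, no activation
print(f"Xavier after {depth} layers: Var[a] = {x.var():.3f}")  # stays near 1

# Naive fixed std: variance multiplies by width * 0.1**2 ≈ 5.1 per layer
x = a
for _ in range(depth):
    W = rng.normal(0.0, 0.1, size=(width, width))
    x = x @ W
print(f"Naive std=0.1 after {depth} layers: Var[a] = {x.var():.3e}")  # explodes
```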