Vanishing gradients occur when gradients shrink exponentially as they propagate backward through time during backpropagation through time (BPTT), making it difficult or effectively impossible to learn long-term dependencies.
Why It Happens:
• Chain rule multiplication across many time steps
• Activation function derivatives are bounded and often small (the sigmoid's derivative is at most 0.25; tanh's is at most 1)
• Recurrent weight matrix has a spectral radius (largest eigenvalue magnitude) less than 1, so repeated multiplication shrinks the signal
• Exponential decay over long sequences