Cross-Entropy Loss measures the difference between two probability distributions - your predicted probabilities and the true distribution.
Why Cross-Entropy for Classification?
• Probability-based: Works with class probabilities
• Logarithmic penalty: Heavily penalizes confident wrong predictions
• Smooth gradients: Provides a smoother optimization target than accuracy, which is flat almost everywhere
• Information theory: Measures "surprise" of predictions
Loss = -Σ(y_true × log(y_pred))
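The formula above can be sketched directly in code. This is a minimal illustration, not a production implementation: the function name `cross_entropy` is hypothetical, labels are assumed to be one-hot, and predictions are assumed to be valid probabilities (e.g., softmax output), with no clipping to guard against log(0).

```python
import math

def cross_entropy(y_true, y_pred):
    """Loss = -sum(y_true * log(y_pred)) over classes (one-hot y_true)."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# One-hot target: the true class is index 0.
loss = cross_entropy([1, 0, 0], [0.7, 0.2, 0.1])
# With a one-hot target, only the true class term survives: loss = -log(0.7)
```

Because the target is one-hot, every term except the true class's drops out, so the loss reduces to the negative log of the probability assigned to the correct class.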
💡 Key Insight
Cross-entropy's penalty, -log(p), grows without bound as the probability assigned to the true class approaches zero. Being 90% confident in the wrong class costs far more than being 60% confident!
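The key insight can be checked with two lines of arithmetic. Assuming a binary problem where all remaining probability goes to the true class, being 90% confident in the wrong class leaves the true class with 0.10, while 60% confidence leaves it with 0.40:

```python
import math

# -log(p_true): the cross-entropy penalty when the model assigns
# probability p_true to the correct class.
confident_wrong = -math.log(0.10)  # 90% on the wrong class -> true class gets 0.10
mildly_wrong    = -math.log(0.40)  # 60% on the wrong class -> true class gets 0.40
```

Here `confident_wrong` is roughly 2.30 while `mildly_wrong` is roughly 0.92, so the confident mistake is penalized about 2.5 times as heavily.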
MSE vs Cross-Entropy

| Aspect       | MSE                | Cross-Entropy          |
|--------------|--------------------|------------------------|
| Problem Type | Regression         | Classification         |
| Output       | Continuous values  | Probabilities          |
| Gradients    | Can saturate       | Well-behaved           |
| Penalty      | Quadratic          | Logarithmic            |
| Use When     | Predicting amounts | Predicting categories  |
Quick Rule: If your output is a probability distribution over classes, use cross-entropy. If it's a continuous value, use MSE.
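The penalty row of the table can be made concrete by running both losses on the same one-hot target. Both helper functions below are illustrative sketches (the names are not from any library): as the probability on the true class approaches zero, MSE plateaus near 1 while cross-entropy keeps growing.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average quadratic penalty per output."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):
    """Logarithmic penalty; unbounded as the true-class probability -> 0."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

target = [1.0, 0.0]       # one-hot: class 0 is correct
very_wrong = [0.01, 0.99]  # nearly all mass on the wrong class

mse_loss = mse(target, very_wrong)            # stays below 1
ce_loss = cross_entropy(target, very_wrong)   # -log(0.01), already above 4
```

This bounded-vs-unbounded behavior is one reason cross-entropy gives stronger gradients on confidently wrong classification outputs than MSE does.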
Interactive Cross-Entropy Visualization
Adjust the predicted probabilities and see how the cross-entropy loss changes. True class: Cat (Class 0)