CS5720 - Week 2
Slide 24 of 40

Cross-Entropy for Classification

Understanding Cross-Entropy

Cross-Entropy Loss measures the difference between two probability distributions: your predicted class probabilities and the true (typically one-hot) distribution.
Why Cross-Entropy for Classification?

Probability-based: Works with class probabilities
Logarithmic penalty: Heavily penalizes confident wrong predictions
Smooth gradients: Differentiable everywhere, unlike accuracy, so it can be optimized with gradient descent
Information theory: Measures "surprise" of predictions
Loss = -Σ(y_true × log(y_pred))
💡 Key Insight
Cross-entropy heavily penalizes confident wrong predictions: the loss is -log of the probability assigned to the true class, so it grows without bound as that probability approaches zero. Being 90% confident about the wrong class is much worse than being 60% confident!
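A minimal sketch of this penalty in plain Python (no framework assumed): with a one-hot label, the sum in the loss formula collapses to -log of the probability given to the true class, so we can compare a confidently wrong prediction against a mildly wrong one.

```python
import math

def cross_entropy(y_true, y_pred):
    # Loss = -sum(y_true * log(y_pred)); with a one-hot label this
    # reduces to -log(probability assigned to the true class).
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

# True class is class 0 (one-hot label [1, 0]).
label = [1.0, 0.0]

# 90% confident in the WRONG class -> only 0.10 on the true class.
confident_wrong = cross_entropy(label, [0.10, 0.90])  # -ln(0.10) ~ 2.303

# 60% confident in the wrong class -> 0.40 on the true class.
mildly_wrong = cross_entropy(label, [0.40, 0.60])     # -ln(0.40) ~ 0.916

print(confident_wrong, mildly_wrong)
```

The confidently wrong prediction costs roughly 2.5x more loss, even though both predictions pick the same wrong class.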

MSE vs Cross-Entropy

| Aspect | MSE | Cross-Entropy |
|---|---|---|
| Problem Type | Regression | Classification |
| Output | Continuous values | Probabilities |
| Gradients | Can saturate | Well-behaved |
| Penalty | Quadratic | Logarithmic |
| Use When | Predicting amounts | Predicting categories |
Quick Rule: If your output is a probability distribution over classes, use cross-entropy. If it's a continuous value, use MSE.
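The "gradients can saturate" row can be sketched numerically. For a single sigmoid output with target y = 1, the gradient of MSE with respect to the pre-activation z carries an extra sigmoid-derivative factor, while for cross-entropy that factor cancels (this is the standard binary-case derivation; the example below is an illustration, not from the slide):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Binary case, target y = 1, pre-activation z:
#   d(MSE)/dz = (p - y) * p * (1 - p)   <- extra sigmoid' factor can vanish
#   d(CE)/dz  = (p - y)                 <- sigmoid' cancels; stays useful
for z in (-6.0, -2.0, 0.0):
    p = sigmoid(z)
    mse_grad = (p - 1.0) * p * (1.0 - p)
    ce_grad = p - 1.0
    print(f"z={z:5.1f}  p={p:.4f}  dMSE/dz={mse_grad:+.4f}  dCE/dz={ce_grad:+.4f}")
```

At z = -6 the prediction is badly wrong (p ≈ 0.0025), yet the MSE gradient is nearly zero, so learning stalls; the cross-entropy gradient stays close to -1.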

Interactive Cross-Entropy Visualization

Adjust the predicted probabilities and see how the cross-entropy loss changes.
Example: the true class is Cat (Class 0). With predicted probabilities Cat 0.80 ✓, Dog 0.15, Bird 0.05, the Cross-Entropy Loss is -ln(0.80) ≈ 0.223.
Prepared by Dr. Gorkem Kar