CS5720 - Week 2

Training vs Validation vs Test Sets

The Three Data Sets

Training Set 📚 (60%)
Used to fit the model: the network learns patterns from this data by adjusting its weights
Validation Set 🔍 (20%)
Used during training to tune hyperparameters and detect overfitting
Test Set 🎯 (20%)
Used only once, at the end, to evaluate final model performance
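
A minimal sketch of how the 60/20/20 split above could be produced. The helper name `split_dataset` and its parameters are hypothetical, chosen for illustration; only the Python standard library is assumed.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle samples and cut them into train / validation / test lists.

    Hypothetical helper for illustration; the test set gets whatever
    remains after the training and validation portions are taken.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before cutting matters when the raw data is ordered (for example, sorted by class); a fixed seed keeps the split reproducible between runs.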

The Cooking Analogy

🍳 Think of training a neural network like learning to cook:
Training Set = Your cookbook recipes
You practice and learn techniques from these

Validation Set = Taste testing while cooking
You adjust seasoning and cooking time based on feedback

Test Set = The dinner party
The final test where guests judge your cooking skills!
Key Rule: Never let your model see the test set during training or tuning - that's cheating!

Data Split Visualization

1️⃣ Train on Training Set
Model learns patterns and adjusts weights
2️⃣ Check on Validation Set
Monitor performance, adjust hyperparameters
3️⃣ Final Test Evaluation
One-time assessment of model performance
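
The three steps above can be sketched on a toy problem. Everything here is an assumption made for illustration: the synthetic data (y = 3x plus noise), the one-weight model, the helper names `mse` and `train`, and the candidate learning rates. A real network would follow the same order of operations.

```python
import random

def mse(w, data):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(train_data, learning_rate, epochs=200):
    """Fit w by plain gradient descent, using the training set only."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train_data) / len(train_data)
        w -= learning_rate * grad
    return w

# Synthetic data (assumed for illustration): y = 3x plus a little noise.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(-1, 1)
    data.append((x, 3.0 * x + rng.gauss(0, 0.1)))
train_data, val_data, test_data = data[:60], data[60:80], data[80:]

# Steps 1 and 2: train once per candidate hyperparameter value,
# then compare the candidates on the validation set.
candidate_lrs = [0.001, 0.01, 0.1]
val_errors = {lr: mse(train(train_data, lr), val_data) for lr in candidate_lrs}
best_lr = min(val_errors, key=val_errors.get)

# Step 3: touch the test set exactly once, after every choice is made.
final_w = train(train_data, best_lr)
test_error = mse(final_w, test_data)
print(f"best lr={best_lr}, test MSE={test_error:.4f}")
```

Note that `test_data` appears in exactly one line, after `best_lr` has already been fixed.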
⚠️ Data Leakage Warning!
Using test data to make decisions about your model (like choosing hyperparameters) leads to overly optimistic performance estimates. Keep it locked away until the very end!
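
A toy simulation of the warning above, under loudly stated assumptions: the labels are pure coin flips, and each hypothetical "model" (standing in for one hyperparameter setting) just guesses at random, so no model can genuinely beat 50% accuracy. Picking the winner by test accuracy still reports a flattering number.

```python
import random

rng = random.Random(42)

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Coin-flip labels: the true achievable accuracy is 50%.
val_labels = [rng.randint(0, 1) for _ in range(200)]
test_labels = [rng.randint(0, 1) for _ in range(200)]

# Fifty hypothetical "models" that guess randomly on both sets.
models = []
for _ in range(50):
    models.append({
        "val_preds": [rng.randint(0, 1) for _ in range(200)],
        "test_preds": [rng.randint(0, 1) for _ in range(200)],
    })

# Leaky: choose the model using the test set, then report that same score.
leaky_choice = max(models, key=lambda m: accuracy(m["test_preds"], test_labels))
leaky_estimate = accuracy(leaky_choice["test_preds"], test_labels)

# Honest: choose using the validation set, then look at the test set once.
honest_choice = max(models, key=lambda m: accuracy(m["val_preds"], val_labels))
honest_estimate = accuracy(honest_choice["test_preds"], test_labels)

print(f"leaky estimate:  {leaky_estimate:.3f}")
print(f"honest estimate: {honest_estimate:.3f}")
```

The leaky estimate is the maximum test accuracy over all candidates, so it can never be lower than the honest one. Here any gap above 50% is pure selection luck, which is exactly the optimism the warning describes.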
Prepared by Dr. Gorkem Kar