Object detection combines classification and localization to identify what objects are in an image and where they are located using bounding boxes.
Key Components:
• Classification: What is the object?
• Localization: Where is the object?
• Multiple Objects: Handle variable numbers
• Confidence Scores: How sure are we?
🔍
Variable Object Count
Images can contain 0 to 100+ objects
📏
Scale Variation
Objects can be tiny or massive
👁️
Occlusion Handling
Objects may be partially hidden
Classification vs Detection
Image Classification
🐱
Input: Image Output: Single label Answer: "Cat"
Object Detection
📦
Input: Image Output: Boxes + labels Answer: "Cat at (x,y,w,h)"
Key Insight:
Detection is much harder because we need to search the entire image at multiple scales and locations!
Object Detection in Action
Street Scene with Objects
Car 95%
Person 89%
Bike 76%
Output: Multiple bounding boxes with class labels and confidence scores
Bounding Box
(x, y, width, height) coordinates that tightly enclose the object
Class Label
The predicted category of the detected object
Confidence Score
How certain the model is about this detection (0-100%)