CS5720 - Object Detection Introduction

What is Object Detection?

Object detection combines classification and localization: it identifies which objects are in an image and marks where each one is with a bounding box.

Key Components:

• Classification: What is the object?
• Localization: Where is the object?
• Multiple Objects: Handle a variable number of objects per image
• Confidence Scores: How sure are we?

🔍 Variable Object Count

Images can contain 0 to 100+ objects

📏 Scale Variation

Objects can be tiny or massive

👁️ Occlusion Handling

Objects may be partially hidden

Classification vs Detection

Image Classification

🐱

Input: Image
Output: Single label
Answer: "Cat"

Object Detection

📦

Input: Image
Output: Boxes + labels
Answer: "Cat at (x,y,w,h)"

Key Insight:

Detection is much harder than classification because the model must search the entire image at multiple scales and locations!
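
To get a feel for the size of that search, here is a minimal sketch that counts the candidate regions a naive sliding-window search would examine. The image size, window sizes, and stride are illustrative assumptions, not values from the course, and modern detectors avoid enumerating windows this literally.

```python
def count_windows(img_w, img_h, win, stride):
    """Count the square win x win windows that fit in the image at a given stride."""
    if win > img_w or win > img_h:
        return 0  # the window does not fit at all
    nx = (img_w - win) // stride + 1  # positions along the width
    ny = (img_h - win) // stride + 1  # positions along the height
    return nx * ny


def total_windows(img_w, img_h, window_sizes, stride):
    """Sum the window counts over several window sizes (one per scale)."""
    return sum(count_windows(img_w, img_h, w, stride) for w in window_sizes)


# Hypothetical setup: a 640x480 image, three scales, stride of 16 pixels.
total = total_windows(640, 480, [64, 128, 256], stride=16)
print(total)  # 2133 candidate regions, each needing its own classification
```

A classifier looks at the image once. Even this coarse search has to classify over two thousand regions, and a finer stride or more scales increases the count quickly.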

Object Detection in Action

Example: Street Scene with Objects

• Car 95%
• Person 89%
• Bike 76%

Output: Multiple bounding boxes with class labels and confidence scores

Bounding Box

(x, y, width, height) coordinates that tightly enclose the object

Class Label

The predicted category of the detected object

Confidence Score

How certain the model is about this detection (0-100%)
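
The three parts of a detection can be sketched as a small data structure. This is an illustration only: the `Detection` class and `filter_by_confidence` helper are hypothetical names, the box coordinates are made up, scores are stored as fractions in [0, 1] rather than percentages, and (x, y) is assumed to be the top-left corner of the box (some formats use the box center instead).

```python
from dataclasses import dataclass


@dataclass
class Detection:
    x: float       # left edge in pixels (assumed top-left convention)
    y: float       # top edge in pixels
    width: float
    height: float
    label: str     # predicted class
    score: float   # confidence in [0, 1]

    def corners(self):
        """Return the box as (x_min, y_min, x_max, y_max)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)


def filter_by_confidence(detections, threshold):
    """Keep only detections whose score is at least the threshold."""
    return [d for d in detections if d.score >= threshold]


# Scores follow the street-scene example; the boxes are invented for illustration.
detections = [
    Detection(40, 120, 200, 90, "Car", 0.95),
    Detection(300, 80, 50, 140, "Person", 0.89),
    Detection(380, 150, 70, 60, "Bike", 0.76),
]

kept = filter_by_confidence(detections, 0.80)
print([d.label for d in kept])    # ['Car', 'Person']
print(detections[0].corners())    # (40, 120, 240, 210)
```

The output is a list, so an image with no objects yields an empty list and a crowded image yields a long one. The confidence threshold is a tunable choice: raising it drops uncertain boxes such as the bike here, at the cost of missing some real objects.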
