YOLO's Big Idea: Treat object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities.
Key Innovations:
• Single Network Pass: No region proposals
• Global Context: Sees entire image
• Real-time Speed: 45+ FPS capability
• End-to-end Training: Unified optimization
Pipeline: divide the image into an S×S grid → each cell predicts B boxes + class probabilities → on PASCAL VOC: 7×7 grid, 2 boxes per cell, 20 classes → output: a 7×7×30 tensor, produced in a single forward pass at real-time speed.
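The output depth in the pipeline above follows directly from the per-cell prediction layout; a quick check of the arithmetic, using the S, B, C values from the slides:

```python
# YOLO v1 output shape: S x S x (B*5 + C).
# Each of the B boxes carries (x, y, w, h, confidence) = 5 numbers,
# plus C class probabilities shared by the whole cell.
S, B, C = 7, 2, 20  # PASCAL VOC settings
depth = B * 5 + C
print((S, S, depth))  # (7, 7, 30)
```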
⚡ Extremely Fast: real-time processing at 45+ FPS
🌍 Global Context: fewer background false positives
🎯 Simple Architecture: easy to understand and implement
How YOLO Works
Three-Step Process
1. Divide the image into an S×S grid
2. Each cell predicts B bounding boxes with confidence scores, plus class probabilities
3. Apply non-maximum suppression (NMS) to produce the final detections
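Step 3 above can be sketched as a greedy suppression pass. This is a minimal NumPy version; the 0.5 IoU threshold is an illustrative default, not taken from the slides:

```python
import numpy as np

def iou(box, boxes):
    # box: (x1, y1, x2, y2); boxes: (N, 4) in the same corner format.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, discard any remaining
    # box that overlaps it by more than `thresh` IoU, repeat.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep
```

For example, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives.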
📐 Architecture Details
• 24 convolutional layers + 2 fully connected
• Inspired by GoogLeNet architecture
• 1×1 reduction layers followed by 3×3 convolutions
• Final output: 7×7×30 tensor
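The final 7×7×30 tensor is typically sliced per cell into boxes and class scores. A sketch of that decoding, assuming the paper's box-then-class channel ordering (variable names here are illustrative):

```python
import numpy as np

S, B, C = 7, 2, 20
out = np.random.rand(S, S, B * 5 + C)  # stand-in for a real network output

# B boxes of (x, y, w, h, conf) come first, then C class scores per cell.
boxes = out[..., :B * 5].reshape(S, S, B, 5)   # (7, 7, 2, 5)
classes = out[..., B * 5:]                     # (7, 7, 20)

# Class-specific confidence per box: box confidence x class probability.
conf = boxes[..., 4]                               # (7, 7, 2)
scores = conf[..., None] * classes[:, :, None, :]  # (7, 7, 2, 20)
```

These per-box class scores are what the NMS step then filters down to the final detections.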
⚠️ YOLO Limitations
• Struggles with small objects that appear in groups (e.g., flocks of birds)
• Each grid cell predicts only one class, so at most one object per cell
• Lower localization accuracy than two-stage detectors
• Difficulty with unusual aspect ratios
Perfect For:
Real-time applications where speed is more important than perfect accuracy!
YOLO Evolution Timeline
YOLO v1
2015
• Pioneered unified detection
• 7×7 grid, 2 boxes per cell
• 45 FPS on Titan X GPU
mAP: 63.4%
YOLO v2
2016
• Batch normalization
• Anchor boxes
• Multi-scale training