CS5720 - Week 10
Slide 184 of 200

YOLO: You Only Look Once

Revolutionary Approach

YOLO's Big Idea: Treat object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities.
Key Innovations:

Single Network Pass: No region proposals
Global Context: Sees entire image
Real-time Speed: 45+ FPS capability
End-to-end Training: Unified optimization
[Diagram: the input image is divided into an S×S grid; each cell predicts B bounding boxes plus class probabilities. With the PASCAL VOC settings (7×7 grid, 2 boxes per cell, 20 classes), a single forward pass produces a 7×7×30 output tensor, enabling real-time speed.]
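The 7×7×30 figure follows directly from the grid arithmetic: each cell holds B boxes of 5 values (x, y, w, h, confidence) plus C class probabilities. A minimal sketch (the helper name `yolo_output_shape` is ours, not from the paper):

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Depth per cell: B boxes x 5 values (x, y, w, h, conf) + C class scores."""
    return (S, S, B * 5 + C)

print(yolo_output_shape())  # (7, 7, 30) for the PASCAL VOC configuration
```

Changing any of S, B, or C changes the tensor accordingly, e.g. a 13×13 grid with 5 boxes per cell would give 13×13×45.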
  • ⚡ Extremely Fast
    Real-time processing at 45+ FPS
  • 🌍 Global Context
    Fewer background false positives
  • 🎯 Simple Architecture
    Easy to understand and implement

How YOLO Works

Three-Step Process
1. Divide the image into an S×S grid
2. Each cell predicts B bounding boxes with confidence scores, plus class probabilities
3. Apply non-maximum suppression (NMS) to produce the final detections
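Step 3, NMS, greedily keeps the highest-scoring box and suppresses any box that overlaps it too much. A self-contained pure-Python sketch (the function names and corner-format boxes are illustrative assumptions; real pipelines typically use a library implementation):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the best remaining box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

# Two near-duplicate boxes and one distant box: the duplicate is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```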
📐 Architecture Details
• 24 convolutional layers + 2 fully connected
• Inspired by GoogLeNet architecture
• 1×1 reduction layers followed by 3×3 conv
• Final output: 7×7×30 tensor
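Each cell of the 7×7×30 tensor can be decoded by scoring boxes with (box confidence × conditional class probability), as in the YOLO v1 paper. A sketch assuming a [box1(5), box2(5), 20 class probs] layout per cell; the helper name `decode_cell` and the exact memory layout are our assumptions:

```python
def decode_cell(v, B=2, C=20):
    """Return (best_box_index, best_class_index, score) for one cell's vector.

    Assumed layout: B boxes of (x, y, w, h, conf), then C class probabilities.
    Score = box confidence * conditional class probability (YOLO v1 scoring).
    """
    boxes = [v[i * 5:(i + 1) * 5] for i in range(B)]
    class_probs = v[B * 5:B * 5 + C]
    best_b = max(range(B), key=lambda i: boxes[i][4])   # most confident box
    best_c = max(range(C), key=lambda i: class_probs[i])  # most likely class
    return best_b, best_c, boxes[best_b][4] * class_probs[best_c]
```

In practice this per-cell score is thresholded before NMS, so low-confidence cells contribute no detections.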
⚠️ YOLO Limitations
• Struggles with small objects
• One set of class probabilities per grid cell, so at most one object per cell
• Lower localization accuracy
• Difficulty with unusual aspect ratios
Perfect For:
Real-time applications where speed is more important than perfect accuracy!

YOLO Evolution Timeline

YOLO v1
2015
• Pioneered unified detection
• 7×7 grid, 2 boxes per cell
• 45 FPS on Titan X GPU
mAP: 63.4% (PASCAL VOC 2007)
YOLO v2
2016
• Batch normalization
• Anchor boxes
• Multi-scale training
mAP: 78.6% (PASCAL VOC 2007)
YOLO v3
2018
• Multi-scale predictions
• Feature Pyramid Network
• Darknet-53 backbone
mAP: 83.0%
YOLO v4
2020
• CSPDarkNet53 backbone
• PANet neck
• Various optimizations
mAP: 65.7% (AP50 on COCO)
YOLO v5
2020
• PyTorch implementation
• AutoML optimizations
• Model scaling variants
mAP: 68.9%
YOLO v8
2023
• Anchor-free design
• Improved architecture
• Better training techniques
mAP: 72.3%
Prepared by Dr. Gorkem Kar