R-CNN Approach: First generate object proposals (regions of interest), then classify each region and refine its location. This two-stage process achieves high accuracy but at the cost of speed.
Core Philosophy:
• Region Proposals: Find potential object locations
• Feature Extraction: CNN features for each region
• Classification: What object is in each region?
• Bounding Box Regression: Refine location
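The bounding box regression step is usually expressed as four offsets relative to the proposal rather than as absolute coordinates. The sketch below assumes the center/size parameterization from the original R-CNN paper (shift the center by a fraction of the proposal's width and height, scale the size in log space). The function names `encode_box` and `decode_box` are illustrative and do not come from any particular library.

```python
import math

def encode_box(proposal, target):
    """Return regression targets (tx, ty, tw, th) that map `proposal` onto `target`.

    Both boxes are (center_x, center_y, width, height).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return (
        (gx - px) / pw,        # horizontal shift, in units of proposal width
        (gy - py) / ph,        # vertical shift, in units of proposal height
        math.log(gw / pw),     # log-scale change in width
        math.log(gh / ph),     # log-scale change in height
    )

def decode_box(proposal, deltas):
    """Apply predicted offsets to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))

# Illustrative values: a proposal that is slightly off-center and too narrow.
proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (54.0, 48.0, 30.0, 40.0)

deltas = encode_box(proposal, ground_truth)   # what the regressor is trained to predict
refined = decode_box(proposal, deltas)        # what inference does with the prediction
```

During training the network learns to predict `deltas` from region features. At inference, `decode_box` turns those predictions back into image coordinates.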
Key Advantage:
Typically higher accuracy than the original single-stage YOLO, especially on small objects and cluttered scenes.
R-CNN Evolution
2014
R-CNN
Regions with CNN features - The original two-stage detector
2015
Fast R-CNN
RoI pooling makes training and testing faster
2015
Faster R-CNN
Region Proposal Network (RPN) generates proposals inside the detector, trained end-to-end
2017
Mask R-CNN
Adds instance segmentation capabilities
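The Faster R-CNN step in this timeline replaces Selective Search with a Region Proposal Network that scores a fixed set of reference boxes ("anchors") at every feature-map location. Here is a minimal sketch of anchor generation only, assuming the commonly cited configuration of 3 scales and 3 aspect ratios (9 anchors per location) on a stride-16 feature map. The function names and default values are illustrative assumptions, and the scoring network itself is not shown.

```python
def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return base anchors as (x1, y1, x2, y2) boxes centered on the origin.

    One anchor is produced per (ratio, scale) pair; ratio = height / width.
    """
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2   # e.g. 128^2, 256^2, 512^2 pixels
            w = (area / ratio) ** 0.5         # keep the area fixed, vary the shape
            h = w * ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors

def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Replicate the base anchors at the center of every feature-map cell."""
    shifted = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = col * stride + stride / 2
            cy = row * stride + stride / 2
            for x1, y1, x2, y2 in anchors:
                shifted.append((x1 + cx, y1 + cy, x2 + cx, y2 + cy))
    return shifted

base_anchors = generate_anchors()                             # 9 anchors per location
all_anchors = shift_anchors(base_anchors, feat_h=2, feat_w=3)  # tiny feature map for illustration
```

In a full detector, the RPN would predict an objectness score and box offsets for each entry in `all_anchors`, and the highest-scoring boxes would be passed on as proposals.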
Trade-off:
Better accuracy but slower inference: Faster R-CNN typically runs at about 5-10 FPS, versus 45+ FPS for YOLO.
R-CNN Family Architectures
🔍
R-CNN
• Selective Search proposals
• CNN feature extraction
• SVM classification
• Linear regression for boxes
Speed: ~0.02 FPS | mAP: 66.0%
⚡
Fast R-CNN
• RoI pooling layer
• Single-stage training (proposals still come from Selective Search)
• Multi-task loss
• Shared CNN features
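The RoI pooling layer is what lets Fast R-CNN share one feature map across all proposals: each region, whatever its size, is max-pooled into a fixed grid so the same fully connected layers can process it. The sketch below is a plain-Python illustration on a single-channel feature map, assuming RoIs are given in inclusive feature-map cell coordinates and using floor/ceil bin boundaries (one common choice). `roi_max_pool` is a hypothetical helper, not a library function.

```python
import math

def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region `roi` of a 2-D feature map into an out_h x out_w grid.

    `roi` is (x1, y1, x2, y2) in feature-map cells, both corners inclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Floor the start and ceil the end so the bins cover the whole RoI.
        r0 = y1 + math.floor(i * roi_h / out_h)
        r1 = y1 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + math.floor(j * roi_w / out_w)
            c1 = x1 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# Toy 4x6 feature map, computed once per image and shared by every RoI.
fmap = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]

wide_roi = roi_max_pool(fmap, (1, 0, 5, 2))   # 5x3 region -> [[10, 12], [16, 18]]
tall_roi = roi_max_pool(fmap, (0, 0, 1, 3))   # 2x4 region -> [[7, 8], [19, 20]]
```

Both regions come out as 2x2 grids even though their shapes differ, so the classifier and box regressor always see a fixed-size input without rerunning the CNN per region.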