CS5720 - Week 5
Slide 89 of 100

MobileNet: Efficient CNNs for Mobile

The Mobile Challenge

📱 Mobile Computing Constraints
Mobile devices have limited compute, memory, and battery capacity, yet users expect real-time AI performance.
  • Limited computational power (FLOPS)
  • 💾 Restricted memory and storage
  • 🔋 Battery life considerations
  • ⏱️ Real-time latency requirements
  • 📡 Network bandwidth limitations

MobileNet's Solution

🚀 Depthwise Separable Convolutions
MobileNet's key innovation: replace standard convolutions with depthwise separable convolutions, cutting computation by roughly 8-9×.
The Breakthrough:
• Factorize convolution into two simpler operations
• Depthwise: Apply one filter per input channel
• Pointwise: Combine channels with 1×1 convolution
• Dramatic reduction in parameters and computations
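The two-step factorization above can be sketched in plain NumPy (a minimal illustration with stride 1 and no padding; the toy shapes are assumptions for clarity, not the real layer sizes):

```python
import numpy as np

def depthwise_separable_conv(x, dw_filters, pw_filters):
    """Depthwise then pointwise convolution (stride 1, valid padding).

    x:          (H, W, C) input feature map
    dw_filters: (F, F, C) one F×F filter per input channel
    pw_filters: (C, K)    1×1 convolution mixing C channels into K
    """
    H, W, C = x.shape
    F = dw_filters.shape[0]
    Ho, Wo = H - F + 1, W - F + 1

    # Step 1: depthwise - each channel is filtered independently
    dw_out = np.zeros((Ho, Wo, C))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw_out[i, j, c] = np.sum(
                    x[i:i + F, j:j + F, c] * dw_filters[:, :, c])

    # Step 2: pointwise - a 1×1 conv is a per-pixel matrix multiply
    return dw_out @ pw_filters  # shape (Ho, Wo, K)

out = depthwise_separable_conv(
    np.random.rand(8, 8, 4),   # toy 8×8 input with 4 channels
    np.random.rand(3, 3, 4),   # 3×3 depthwise filters
    np.random.rand(4, 16))     # pointwise: 4 -> 16 channels
print(out.shape)  # (6, 6, 16)
```

Note how no single filter ever sees both spatial extent and all channels at once, which is exactly where the savings come from.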
🏗️ Architecture Design
Tunable architecture with two knobs: a width multiplier α (thins channel counts) and a resolution multiplier ρ (shrinks input resolution), trading accuracy for speed.

Depthwise Separable Convolution Breakdown

Standard Convolution
  • Input: H×W×C
  • Apply K filters of size F×F×C
  • Output: H'×W'×K
  • Cost: H'×W'×K×F×F×C
  • Expensive: all channels processed together

Depthwise Separable
  • Step 1: Depthwise (one F×F×1 filter per channel)
  • Step 2: Pointwise (1×1×C×K)
  • Output: H'×W'×K
  • Cost: H'×W'×C×F×F + H'×W'×C×K
  • Efficient: spatial and channel processing separated
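Plugging the two cost formulas above into Python makes the ratio concrete (the layer sizes below are illustrative, matching a mid-network MobileNet layer):

```python
# Example layer: 56×56 output, C=64 input channels, K=128 filters, F=3
H, W, C, K, F = 56, 56, 64, 128, 3

standard_cost = H * W * K * F * F * C                # all channels together
separable_cost = H * W * C * F * F + H * W * C * K   # depthwise + pointwise

ratio = standard_cost / separable_cost
print(f"standard:  {standard_cost:,}")
print(f"separable: {separable_cost:,}")
print(f"reduction: {ratio:.1f}x")
```

Dividing the formulas symbolically gives a reduction of 1 / (1/K + 1/F²); with F=3 and K large this approaches 9×, which is where the 8-9× figure comes from.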
Key Numbers
  • 8-9× computation reduction: fewer FLOPs for the same accuracy
  • 4.2M parameters (vs 25.6M in ResNet-50)
  • 22ms mobile latency: real-time inference on phones
MobileNet Architecture Layers

| Type    | Configuration              | Output     | Operations |
|---------|----------------------------|------------|------------|
| Conv    | 3×3×3×32, stride 2         | 112×112×32 | Standard   |
| DW Conv | 3×3×32 dw                  | 112×112×32 | Depthwise  |
| PW Conv | 1×1×32×64                  | 112×112×64 | Pointwise  |
| DW+PW   | 3×3×64 dw + 1×1×64×128     | 56×56×128  | Separable  |
| ...     | Repeat pattern with stride | ...        | ...        |
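The "..." row hides the rest of the repeating pattern. The full v1 stack (reproduced from the original paper's layer table, so treat it as a sketch rather than a spec) can be written as a compact config list, and tracing the strides shows how 224×224 input shrinks to the final 7×7 feature map:

```python
# (type, out_channels, stride); each "sep" entry expands to
# a 3×3 depthwise conv (stride s) + 1×1 pointwise conv to c channels
mobilenet_body = [
    ("conv", 32, 2),      # stem: standard 3×3 conv
    ("sep", 64, 1),
    ("sep", 128, 2), ("sep", 128, 1),
    ("sep", 256, 2), ("sep", 256, 1),
    ("sep", 512, 2)] + [("sep", 512, 1)] * 5 + [
    ("sep", 1024, 2), ("sep", 1024, 1)]

resolution = 224
for kind, c, s in mobilenet_body:
    resolution //= s   # each stride-2 layer halves the feature map
print(resolution)  # 7  (final 7×7×1024 feature map before pooling)
```

A global average pool and a single fully connected layer then map the 7×7×1024 features to class scores.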
🎛️ Width Multiplier α: Tune Performance vs Accuracy
  • α = 1.0: baseline MobileNet
  • α = 0.75: 25% fewer channels
  • α = 0.5: 50% fewer channels
  • α = 0.25: ultra-light version
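Applying α is just scaling every channel count; since each cost term contains the channel count twice (input and output side), compute shrinks roughly with α². A minimal sketch (the base channel list is an illustrative slice of the architecture, and plain truncation stands in for the rounding the real implementation uses):

```python
def scale_channels(base_channels, alpha):
    # Width multiplier: thin every layer by alpha, keeping at least 1 channel
    return [max(1, int(c * alpha)) for c in base_channels]

base = [32, 64, 128, 256, 512, 1024]
for alpha in (1.0, 0.75, 0.5, 0.25):
    print(alpha, scale_channels(base, alpha))
# e.g. alpha=0.75 -> [24, 48, 96, 192, 384, 768]
```

So α=0.5 costs roughly a quarter of the baseline's FLOPs, which is why small α values suit the most constrained devices.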
Prepared by Dr. Gorkem Kar