CS5720 - Week 5
Slide 89 of 100

MobileNet: Efficient CNNs for Mobile

The Mobile Challenge

📱 Mobile Computing Constraints
Mobile devices have limited compute, memory, and battery capacity, yet users expect real-time AI performance.
  • Limited computational power (FLOPS)
  • 💾 Restricted memory and storage
  • 🔋 Battery life considerations
  • ⏱️ Real-time latency requirements
  • 📡 Network bandwidth limitations

MobileNet's Solution

🚀 Depthwise Separable Convolutions
MobileNet's key innovation: replace standard convolutions with depthwise separable convolutions, cutting computation by roughly 8-9×.
The Breakthrough:
• Factorize convolution into two simpler operations
• Depthwise: Apply one filter per input channel
• Pointwise: Combine channels with 1×1 convolution
• Dramatic reduction in parameters and computations
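The two-step factorization above can be sketched in plain NumPy (a minimal illustration with stride 1 and no padding; the toy shapes are assumptions for clarity, not the real layer sizes):

```python
import numpy as np

def depthwise_separable_conv(x, dw_filters, pw_filters):
    """Depthwise then pointwise convolution (stride 1, valid padding).

    x:          (H, W, C) input feature map
    dw_filters: (F, F, C) one F×F filter per input channel
    pw_filters: (C, K)    1×1 convolution mixing C channels into K
    """
    H, W, C = x.shape
    F = dw_filters.shape[0]
    Ho, Wo = H - F + 1, W - F + 1

    # Step 1: depthwise - each channel is filtered independently
    dw_out = np.zeros((Ho, Wo, C))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw_out[i, j, c] = np.sum(
                    x[i:i + F, j:j + F, c] * dw_filters[:, :, c])

    # Step 2: pointwise - a 1×1 conv is a per-pixel matrix multiply
    return dw_out @ pw_filters  # shape (Ho, Wo, K)

out = depthwise_separable_conv(
    np.random.rand(8, 8, 4),   # toy 8×8 input with 4 channels
    np.random.rand(3, 3, 4),   # 3×3 depthwise filters
    np.random.rand(4, 16))     # pointwise: 4 -> 16 channels
print(out.shape)  # (6, 6, 16)
```

Note how no single filter ever sees both spatial extent and all channels at once, which is exactly where the savings come from.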
🏗️ Architecture Design
Tunable architecture with two knobs: a width multiplier α (thins channel counts) and a resolution multiplier ρ (shrinks input resolution), trading accuracy for speed.

Depthwise Separable Convolution Breakdown

Standard Convolution
  • Input: H×W×C
  • Apply K filters of size F×F×C
  • Output: H'×W'×K
  • Cost: H'×W'×K×F×F×C
  • Expensive: all channels processed together

Depthwise Separable
  • Step 1: Depthwise (one F×F×1 filter per channel)
  • Step 2: Pointwise (1×1×C×K)
  • Output: H'×W'×K
  • Cost: H'×W'×C×F×F + H'×W'×C×K
  • Efficient: spatial and channel processing separated
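Plugging the two cost formulas above into Python makes the ratio concrete (the layer sizes below are illustrative, matching a mid-network MobileNet layer):

```python
# Example layer: 56×56 output, C=64 input channels, K=128 filters, F=3
H, W, C, K, F = 56, 56, 64, 128, 3

standard_cost = H * W * K * F * F * C                # all channels together
separable_cost = H * W * C * F * F + H * W * C * K   # depthwise + pointwise

ratio = standard_cost / separable_cost
print(f"standard:  {standard_cost:,}")
print(f"separable: {separable_cost:,}")
print(f"reduction: {ratio:.1f}x")
```

Dividing the formulas symbolically gives a reduction of 1 / (1/K + 1/F²); with F=3 and K large this approaches 9×, which is where the 8-9× figure comes from.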
Key Numbers
  • 8-9× computation reduction: fewer FLOPs for the same accuracy
  • 4.2M parameters (vs 25.6M in ResNet-50)
  • 22ms mobile latency: real-time inference on phones
MobileNet Architecture Layers

| Type    | Configuration              | Output     | Operations |
|---------|----------------------------|------------|------------|
| Conv    | 3×3×3×32, stride 2         | 112×112×32 | Standard   |
| DW Conv | 3×3×32 dw                  | 112×112×32 | Depthwise  |
| PW Conv | 1×1×32×64                  | 112×112×64 | Pointwise  |
| DW+PW   | 3×3×64 dw + 1×1×64×128     | 56×56×128  | Separable  |
| ...     | Repeat pattern with stride | ...        | ...        |
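The "..." row hides the rest of the repeating pattern. The full v1 stack (reproduced from the original paper's layer table, so treat it as a sketch rather than a spec) can be written as a compact config list, and tracing the strides shows how 224×224 input shrinks to the final 7×7 feature map:

```python
# (type, out_channels, stride); each "sep" entry expands to
# a 3×3 depthwise conv (stride s) + 1×1 pointwise conv to c channels
mobilenet_body = [
    ("conv", 32, 2),      # stem: standard 3×3 conv
    ("sep", 64, 1),
    ("sep", 128, 2), ("sep", 128, 1),
    ("sep", 256, 2), ("sep", 256, 1),
    ("sep", 512, 2)] + [("sep", 512, 1)] * 5 + [
    ("sep", 1024, 2), ("sep", 1024, 1)]

resolution = 224
for kind, c, s in mobilenet_body:
    resolution //= s   # each stride-2 layer halves the feature map
print(resolution)  # 7  (final 7×7×1024 feature map before pooling)
```

A global average pool and a single fully connected layer then map the 7×7×1024 features to class scores.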
🎛️ Width Multiplier α: Tune Performance vs Accuracy
  • α = 1.0: baseline MobileNet
  • α = 0.75: 25% fewer channels
  • α = 0.5: 50% fewer channels
  • α = 0.25: ultra-light version
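Applying α is just scaling every channel count; since each cost term contains the channel count twice (input and output side), compute shrinks roughly with α². A minimal sketch (the base channel list is an illustrative slice of the architecture, and plain truncation stands in for the rounding the real implementation uses):

```python
def scale_channels(base_channels, alpha):
    # Width multiplier: thin every layer by alpha, keeping at least 1 channel
    return [max(1, int(c * alpha)) for c in base_channels]

base = [32, 64, 128, 256, 512, 1024]
for alpha in (1.0, 0.75, 0.5, 0.25):
    print(alpha, scale_channels(base, alpha))
# e.g. alpha=0.75 -> [24, 48, 96, 192, 384, 768]
```

So α=0.5 costs roughly a quarter of the baseline's FLOPs, which is why small α values suit the most constrained devices.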
Prepared by Dr. Gorkem Kar