CS5720 - VGGNet: Going Deeper with Small Filters

The VGG Philosophy

"What if we only used 3×3 convolutions throughout the entire network?"

This simple yet powerful idea led to VGGNet's elegant and effective architecture.

Key Design Principles:
• Use only 3×3 convolutional filters
• Stack many layers to go deeper
• Double the number of filters after each pooling layer (64 → 128 → 256 → 512), capped at 512
• Keep architecture extremely uniform
• Simple is better than complex

The Result:

2nd place in the ILSVRC 2014 classification task, behind GoogLeNet, yet it became more influential than the winner thanks to its simplicity and effectiveness.

VGG Block Structure (VGG-16)

Block 1: 64 filters
2 × Conv3×3-64 → MaxPool

Block 2: 128 filters
2 × Conv3×3-128 → MaxPool

Block 3: 256 filters
3 × Conv3×3-256 → MaxPool

Block 4: 512 filters
3 × Conv3×3-512 → MaxPool

Block 5: 512 filters
3 × Conv3×3-512 → MaxPool

Fully Connected
FC-4096 → FC-4096 → FC-1000
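
The block listing above can be traced as a sequence of tensor shapes. The sketch below is only an illustration, not the original implementation. It assumes a 224×224 RGB input, 3×3 convolutions with stride 1 and padding 1 (so they preserve spatial size), and 2×2 max pooling with stride 2; none of these settings appear on the slide. The names BLOCKS, trace_shapes, and flattened_features are hypothetical.

```python
# (number of 3x3 convs, output channels) per block, taken from the listing above.
BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(image_size=224, in_channels=3):
    """Return the (channels, height, width) shape after each block.

    Assumption: padded 3x3 convs keep the spatial size, and each block
    ends with a 2x2, stride-2 max pool that halves it.
    """
    shapes = [(in_channels, image_size, image_size)]
    size = image_size
    for _num_convs, channels in BLOCKS:
        size //= 2  # only the pooling layer changes height and width
        shapes.append((channels, size, size))
    return shapes


def flattened_features(image_size=224):
    """Number of values fed into the first fully connected layer."""
    channels, height, width = trace_shapes(image_size)[-1]
    return channels * height * width


for shape in trace_shapes():
    print(shape)
print("Inputs to FC-4096:", flattened_features())
```

Under these assumptions the spatial size halves five times (224 → 112 → 56 → 28 → 14 → 7), so the first FC-4096 layer receives 512 × 7 × 7 = 25,088 values.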

Why 3×3 Filters Are Brilliant

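One 7×7 Conv
49 weights per input-output channel pair
7×7 receptive field
1 non-linearity

Two 5×5 Convs
50 weights per input-output channel pair
9×9 receptive field
2 non-linearities

Three 3×3 Convs
27 weights per input-output channel pair
7×7 receptive field
3 non-linearities

Three stacked 3×3 convolutions cover the same 7×7 receptive field as a single 7×7 convolution, with about 45% fewer weights (27 vs. 49) and two extra non-linearities.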
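
The comparison above follows from two small formulas. For a stack of n stride-1 convolutions with k×k kernels, the receptive field is 1 + n(k − 1) pixels per side, and the weight count is n·k²·C² when every layer has C input and C output channels (biases ignored). The sketch below is a minimal illustration of that arithmetic; the function name stack_stats and the choice of C = 64 in the second loop are assumptions made for the example.

```python
def stack_stats(kernel_size, num_layers, channels=1):
    """Receptive field and weight count of a stack of stride-1 convolutions.

    Assumes every layer has `channels` inputs and `channels` outputs,
    and ignores bias terms.
    """
    receptive_field = 1 + num_layers * (kernel_size - 1)
    weights = num_layers * kernel_size * kernel_size * channels * channels
    return receptive_field, weights


# Per channel pair (channels=1), matching the numbers on the slide.
for kernel_size, num_layers in [(7, 1), (5, 2), (3, 3)]:
    rf, weights = stack_stats(kernel_size, num_layers)
    print(f"{num_layers} x {kernel_size}x{kernel_size}: "
          f"receptive field {rf}x{rf}, {weights} weights")

# With 64 channels in and out, the gap scales by 64 * 64.
for kernel_size, num_layers in [(7, 1), (3, 3)]:
    rf, weights = stack_stats(kernel_size, num_layers, channels=64)
    print(f"{num_layers} x {kernel_size}x{kernel_size}, C=64: {weights} weights")
```

With 64 channels, one 7×7 layer needs 200,704 weights while three 3×3 layers need 110,592 for the same 7×7 receptive field.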

VGG Variants

VGG-11

8 conv + 3 FC
133M params

VGG-13

10 conv + 3 FC
133M params

VGG-16

13 conv + 3 FC
138M params

VGG-19

16 conv + 3 FC
144M params
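
The parameter totals above can be reproduced by counting weights and biases layer by layer. The sketch below is an illustration under stated assumptions rather than a reference implementation: a 224×224 RGB input, padded 3×3 convolutions, five 2×2 max-pooling stages, and the FC-4096 → FC-4096 → FC-1000 head. The per-block conv counts in VARIANTS follow the standard published configurations and are not given on the slide; WIDTHS, VARIANTS, and count_params are hypothetical names.

```python
# Output channels of the five blocks, shared by all variants.
WIDTHS = [64, 128, 256, 512, 512]

# Number of 3x3 conv layers in each block (assumed standard configurations).
VARIANTS = {
    "VGG-11": [1, 1, 2, 2, 2],
    "VGG-13": [2, 2, 2, 2, 2],
    "VGG-16": [2, 2, 3, 3, 3],
    "VGG-19": [2, 2, 4, 4, 4],
}


def count_params(convs_per_block, in_channels=3, image_size=224, num_classes=1000):
    """Count weights and biases of the conv blocks plus the three FC layers."""
    total = 0
    channels = in_channels
    size = image_size
    for num_convs, width in zip(convs_per_block, WIDTHS):
        for _ in range(num_convs):
            total += 3 * 3 * channels * width + width  # kernel weights + biases
            channels = width
        size //= 2  # 2x2 max pooling with stride 2
    features = channels * size * size  # flattened input to the first FC layer
    for out_features in (4096, 4096, num_classes):
        total += features * out_features + out_features
        features = out_features
    return total


for name, config in VARIANTS.items():
    millions = count_params(config) / 1e6
    print(f"{name}: {sum(config)} conv + 3 FC, {millions:.1f}M params")
```

Most of the total sits in the first fully connected layer (25,088 × 4,096 weights), which is why adding conv layers changes the count only modestly from VGG-11 to VGG-19.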
