Transfer learning offers two main strategies: Feature Extraction uses the pre-trained network as a fixed feature extractor, while Fine-tuning adapts the entire network to your specific task.
🔒 Feature Extraction
Frozen Features + New Classifier
Freeze the convolutional base and only train a new classifier on top. The pre-trained features act as a fixed feature extractor.
⚡ Fast Training: only the classifier is trained, so training is significantly faster
💾 Low Memory: requires less GPU memory and computation
📊 Small Datasets: works well with limited training data
🎯 Similar Domains: best when the target task is similar to ImageNet
🔧 Fine-tuning
Adapt Entire Network
Unfreeze some or all layers of the pre-trained network and train the entire model with a very low learning rate.
🎯 Higher Accuracy: can achieve better performance on the target task
🔄 Adaptable: the network adapts to your specific domain
📈 Large Datasets: benefits from more training data
⏱️ Slower Training: requires more time and computational resources
Decision Flowchart: Which Approach to Choose?
How much training data do you have?
< 1,000 images: consider feature extraction
> 10,000 images: consider fine-tuning
How similar is your task to ImageNet?
Very similar: feature extraction is often sufficient
Very different: fine-tuning is usually better
What's your computational budget?
Limited resources: feature extraction is faster
Ample resources: fine-tuning for best results
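The flowchart can be condensed into a small helper function. This is an illustrative sketch only: the function name, its arguments, and the behavior for the middle range (which the flowchart leaves open) are assumptions that mirror the rules above, not part of any library.

# Hypothetical helper encoding the decision flowchart above
def choose_strategy(num_images: int, similar_to_imagenet: bool, limited_compute: bool) -> str:
    # Small dataset or tight budget: freeze the base and train only a classifier
    if num_images < 1_000 or limited_compute:
        return "feature extraction"
    # Large dataset: adapting the whole network usually pays off
    if num_images > 10_000:
        return "fine-tuning"
    # Middle ground (assumed): lean on domain similarity
    return "feature extraction" if similar_to_imagenet else "fine-tuning"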
Implementation Examples
# Feature Extraction Approach
import torch
import torch.nn as nn
import torchvision.models as models

# Load pre-trained ResNet (the older pretrained=True argument is deprecated)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all layers so the convolutional base acts as a fixed feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace classifier (a fresh nn.Linear is trainable by default)
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only optimize the classifier parameters
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop - only classifier weights update
# (num_epochs and train_loader are assumed to be defined elsewhere)
for epoch in range(num_epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()  # only classifier gradients are computed
        optimizer.step()
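For comparison, here is a fine-tuning sketch in the same style. Which layers to unfreeze (here, ResNet's last residual stage, layer4, plus the new classifier) and the two learning rates are illustrative choices, not fixed rules.

# Fine-tuning Approach
import torch
import torch.nn as nn
import torchvision.models as models

# Load pre-trained ResNet
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze everything, then unfreeze the last residual stage
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True

# Replace classifier (a fresh nn.Linear is trainable by default)
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Very low learning rate for pre-trained layers, higher for the new head
optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
criterion = nn.CrossEntropyLoss()

# The training loop is identical to the feature extraction one above;
# now gradients update both layer4 and the classifier.

Splitting the optimizer into parameter groups is what lets the new head learn quickly while the pre-trained layers move slowly enough to avoid destroying their features.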