CS5720 - Week 5
Slide 96 of 100

Fine-tuning vs Feature Extraction

Transfer learning offers two main strategies: Feature Extraction uses the pre-trained convolutional base as a fixed feature extractor, while Fine-tuning adapts some or all of the network's weights to your specific task.
🔒
Feature Extraction
Frozen Features + New Classifier
Freeze the convolutional base and only train a new classifier on top. The pre-trained features act as a fixed feature extractor.
⚡ Fast Training
Only the classifier is trained, so training is significantly faster
💾 Low Memory
Requires less GPU memory and computation
📊 Small Datasets
Works well with limited training data
🎯 Similar Domains
Best when target task is similar to ImageNet
🔧
Fine-tuning
Adapt Entire Network
Unfreeze some or all layers of the pre-trained network and train the entire model with a very low learning rate.
🎯 Higher Accuracy
Can achieve better performance on target task
🔄 Adaptable
Network adapts to your specific domain
📈 Large Datasets
Benefits from more training data
⏱️ Slower Training
Requires more time and computational resources

Decision Flowchart: Which Approach to Choose?

How much training data do you have?
< 1,000 images
Consider Feature Extraction
1,000 - 10,000 images
Either can work; start with feature extraction, then fine-tune if accuracy plateaus
> 10,000 images
Consider Fine-tuning
How similar is your task to ImageNet?
Very Similar
Feature extraction often sufficient
Very Different
Fine-tuning usually better
What's your computational budget?
Limited Resources
Feature extraction is faster
Ample Resources
Fine-tuning for best results
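The flowchart above can be sketched as a small helper function. The thresholds and argument names are illustrative (taken from the slide's rough guidelines), not from any library:

```python
def choose_strategy(num_images: int, similar_to_imagenet: bool,
                    limited_compute: bool) -> str:
    """Heuristic from the decision flowchart; thresholds are rough guides."""
    # Small datasets or tight budgets: freeze the base and train a classifier
    if num_images < 1_000 or limited_compute:
        return "feature_extraction"
    # Large datasets on a dissimilar domain benefit most from fine-tuning
    if num_images > 10_000 and not similar_to_imagenet:
        return "fine_tuning"
    # Otherwise: feature extraction often suffices for ImageNet-like tasks
    return "feature_extraction" if similar_to_imagenet else "fine_tuning"
```

In practice these factors interact, so treat the function as a starting point rather than a rule.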

Implementation Examples

# Feature Extraction Approach
import torch
import torch.nn as nn
import torchvision.models as models

# Load pre-trained ResNet (weights= replaces the deprecated pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Replace classifier (these parameters will be trained)
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only optimize the classifier parameters
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
num_epochs = 10  # example value

# Training loop - only classifier weights update
# (assumes train_loader is defined elsewhere)
for epoch in range(num_epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()  # gradients are stored only for the classifier
        optimizer.step()
Prepared by Dr. Gorkem Kar