Transfer learning offers two main strategies: Feature Extraction uses the pre-trained network as a fixed feature extractor, while Fine-tuning adapts the entire network to your specific task.
🔒 Feature Extraction
Frozen Features + New Classifier
Freeze the convolutional base and only train a new classifier on top. The pre-trained features act as a fixed feature extractor.
⚡ Fast Training: only the classifier is trained, so training is significantly faster
💾 Low Memory: requires less GPU memory and computation
📊 Small Datasets: works well with limited training data
🎯 Similar Domains: best when the target task is similar to ImageNet
🔧 Fine-tuning
Adapt Entire Network
Unfreeze some or all layers of the pre-trained network and train the entire model with a very low learning rate.
🎯 Higher Accuracy: can achieve better performance on the target task
🔄 Adaptable: the network adapts to your specific domain
📈 Large Datasets: benefits from more training data
⏱️ Slower Training: requires more time and computational resources
Decision Flowchart: Which Approach to Choose?
How much training data do you have?
< 1,000 images: consider feature extraction
> 10,000 images: consider fine-tuning
How similar is your task to ImageNet?
Very similar: feature extraction is often sufficient
Very different: fine-tuning is usually better
What's your computational budget?
Limited resources: feature extraction is faster
Ample resources: fine-tuning for best results
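The flowchart can be condensed into a small helper function. This is an illustrative sketch only: the function name, its arguments, and the behavior for the middle range (which the flowchart leaves open) are assumptions that mirror the rules above, not part of any library.

# Hypothetical helper encoding the decision flowchart above
def choose_strategy(num_images: int, similar_to_imagenet: bool, limited_compute: bool) -> str:
    # Small dataset or tight budget: freeze the base and train only a classifier
    if num_images < 1_000 or limited_compute:
        return "feature extraction"
    # Large dataset: adapting the whole network usually pays off
    if num_images > 10_000:
        return "fine-tuning"
    # Middle ground (assumed): lean on domain similarity
    return "feature extraction" if similar_to_imagenet else "fine-tuning"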
Implementation Examples
# Feature Extraction Approach
import torch
import torch.nn as nn
import torchvision.models as models

# Load pre-trained ResNet (the older pretrained=True argument is deprecated)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all layers so the convolutional base acts as a fixed feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace classifier (a fresh nn.Linear is trainable by default)
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only optimize the classifier parameters
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Training loop - only classifier weights update
# (num_epochs and train_loader are assumed to be defined elsewhere)
for epoch in range(num_epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()  # only classifier gradients are computed
        optimizer.step()
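For comparison, here is a fine-tuning sketch in the same style. Which layers to unfreeze (here, ResNet's last residual stage, layer4, plus the new classifier) and the two learning rates are illustrative choices, not fixed rules.

# Fine-tuning Approach
import torch
import torch.nn as nn
import torchvision.models as models

# Load pre-trained ResNet
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze everything, then unfreeze the last residual stage
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True

# Replace classifier (a fresh nn.Linear is trainable by default)
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Very low learning rate for pre-trained layers, higher for the new head
optimizer = torch.optim.Adam([
    {"params": model.layer4.parameters(), "lr": 1e-5},
    {"params": model.fc.parameters(), "lr": 1e-3},
])
criterion = nn.CrossEntropyLoss()

# The training loop is identical to the feature extraction one above;
# now gradients update both layer4 and the classifier.

Splitting the optimizer into parameter groups is what lets the new head learn quickly while the pre-trained layers move slowly enough to avoid destroying their features.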