CS5720 - Week 9

Version Control for ML Projects

ML Version Control Challenges

  • 📦 Large Files Problem
    Models and datasets are often too large to store efficiently in a plain Git repository
  • 🔬 Experiment Tracking
    It is hard to track which code version produced which results
  • 🔄 Reproducibility Issues
    Results vary with environments, random seeds, and data versions
  • 👥 Team Collaboration
    Teams need to share models, data splits, and experiment results
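
To make the reproducibility point concrete, here is a minimal sketch using only the Python standard library. The function names `set_seed` and `run_record` and the `data_version` label are hypothetical, chosen for illustration. A real project would likely also seed NumPy or its deep learning framework.

```python
import json
import platform
import random
import sys


def set_seed(seed: int) -> None:
    """Seed Python's built-in RNG (other libraries need their own seeding)."""
    random.seed(seed)


def run_record(seed: int, data_version: str) -> dict:
    """Collect details that help rerun an experiment later (illustrative only)."""
    return {
        "seed": seed,
        "data_version": data_version,  # hypothetical label, e.g. a dataset tag
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


# Reseeding with the same value reproduces the same random sequence.
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)

print(json.dumps(run_record(42, "v1"), indent=2))
```

Storing a record like this next to each result is one simple way to note which seed, data version, and environment produced it.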

Modern ML Version Control

  • 🗂️ Git LFS
    Large File Storage for models and datasets
  • 📊 DVC (Data Version Control)
    Version control for data, models, and pipelines
  • 📈 MLflow Tracking
    Experiment tracking and model registry
  • 🐳 Docker Containers
    Environment reproducibility and deployment
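
Git LFS keeps large files out of the repository by committing a small text pointer in their place. The sketch below imitates that pointer format in plain Python to show the idea. The helper `lfs_pointer` is hypothetical and is not part of Git LFS, which creates pointers itself when a tracked file is added. DVC follows a similar idea, storing a content hash in a small metadata file.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build text in the shape of a Git LFS pointer file (illustration only)."""
    digest = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{digest}\n"
        f"size {len(content)}\n"
    )


# Stand-in for real model weights; any bytes work for the demonstration.
fake_model = b"\x00" * 1024
pointer = lfs_pointer(fake_model)
print(pointer)
```

Git versions only these few lines, while the large file itself lives in separate storage and is looked up by its hash.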

ML Project Version Control Workflow

💻 Code
Git for source code, configs, and notebooks
📂 Data
DVC for datasets, features, and preprocessing outputs
🧠 Models
MLflow for model artifacts and metrics
🔬 Experiments
Weights & Biases (W&B) or TensorBoard for experiment tracking
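
The four steps fit together when each run records which code, data, and model it used. The sketch below is a tool-agnostic illustration using only the standard library. It does not use the APIs of DVC, MLflow, or W&B. The function `log_run`, the file names, the commit string, and the metric value are all made up for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Content hash that identifies one exact version of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def log_run(log_path: Path, code_version: str, data_path: Path,
            model_path: Path, params: dict, metrics: dict) -> dict:
    """Append one run record linking code, data, model, and results."""
    record = {
        "code_version": code_version,          # e.g. a Git commit hash
        "data_sha256": file_hash(data_path),   # which data was used
        "model_sha256": file_hash(model_path), # which model was produced
        "params": params,
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    data = tmp_dir / "train.csv"
    data.write_text("x,y\n1,2\n", encoding="utf-8")
    model = tmp_dir / "model.bin"
    model.write_bytes(b"weights")  # placeholder bytes, not a real model
    log = tmp_dir / "runs.jsonl"

    # Placeholder commit id and metric, for illustration only.
    record = log_run(log, "abc1234", data, model, {"lr": 0.01}, {"accuracy": 0.9})
    logged = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines()]

print(record["code_version"], len(logged))
```

Dedicated tools add storage, comparison views, and collaboration on top, but the core link between a result and its inputs looks much like this record.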
Prepared by Dr. Gorkem Kar