CS5720 - Week 10

Data Collection and Annotation

The Foundation of Computer Vision

Data collection and annotation is the process of gathering raw visual data and systematically labeling it to create training datasets for computer vision models.
The Data Pipeline:
  1. Data Collection: Gather raw images/videos from various sources
  2. Data Cleaning: Remove duplicates, filter out low-quality samples, standardize formats
  3. Annotation: Label data with bounding boxes, masks, or categories
  4. Quality Control: Validate annotations for consistency and accuracy
  5. Dataset Split: Divide into training, validation, and test sets
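
The cleaning and splitting steps of the pipeline can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the helper names `deduplicate` and `stratified_split`, the 70/15/15 ratios, and the in-memory `(name, raw_bytes, label)` tuples are all assumptions made for the example. It removes only exact byte-level duplicates (near-duplicates would need perceptual hashing or similar) and splits each class separately so that every subset keeps roughly the same class mix.

```python
import hashlib
import random
from collections import defaultdict


def deduplicate(items):
    """Drop exact byte-level duplicates.

    items: iterable of (name, raw_bytes, label) tuples (hypothetical layout).
    Returns a list of (name, label) pairs, keeping the first copy seen.
    """
    seen = set()
    unique = []
    for name, data, label in items:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, label))
    return unique


def stratified_split(samples, ratios=(0.7, 0.15), seed=0):
    """Split (name, label) pairs into train/val/test, one class at a time.

    ratios gives the train and val fractions; test receives the remainder.
    A fixed seed keeps the split reproducible.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for name, label in samples:
        by_label[label].append(name)

    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        names = sorted(by_label[label])
        rng.shuffle(names)
        n_train = round(len(names) * ratios[0])
        n_val = round(len(names) * ratios[1])
        splits["train"] += [(n, label) for n in names[:n_train]]
        splits["val"] += [(n, label) for n in names[n_train:n_train + n_val]]
        splits["test"] += [(n, label) for n in names[n_train + n_val:]]
    return splits
```

Deduplicating before splitting matters: if the same image lands in both the training and test sets, test accuracy overstates how well the model generalizes. With very small classes the rounded counts can leave the test subset empty, so rare classes deserve a manual check.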

Key Challenges

  • Cost & Time: Manual annotation is expensive and time-consuming, especially for complex tasks like segmentation
  • Consistency: Multiple annotators may interpret the same object differently, leading to inconsistent labels
  • Quality vs Speed: Annotation accuracy must be balanced against the need for large-scale datasets
  • Domain Gaps: Training data may not represent real-world deployment conditions
  • Class Imbalance: Some classes are much rarer than others, which can bias the model toward frequent classes and hurt performance on rare ones
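
Two of the challenges above, consistency and class imbalance, lend themselves to simple numeric checks. The sketch below is illustrative only: the function names are hypothetical, boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixels, and the weighting formula `total / (n_classes * count)` is one common "balanced" heuristic rather than the only option. Intersection over union (IoU) measures how closely two annotators' boxes for the same object agree, and inverse-frequency weights show how rare classes might be up-weighted in a loss function.

```python
from collections import Counter


def box_iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def inverse_frequency_weights(labels):
    """Per-class weights: total / (n_classes * count), larger for rare classes."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


# Two annotators draw the same object; the boxes overlap on half their width.
annotator_1 = (0, 0, 10, 10)
annotator_2 = (5, 0, 15, 10)
agreement = box_iou(annotator_1, annotator_2)  # 50 / 150, about 0.33

# A dataset with 90 "car" labels and 10 "bike" labels.
weights = inverse_frequency_weights(["car"] * 90 + ["bike"] * 10)
```

A team might flag any object whose inter-annotator IoU falls below a chosen threshold for review; the threshold is a project decision, not a fixed standard.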

Popular Annotation Tools

  • LabelImg: Open-source tool for creating bounding box annotations in YOLO and PASCAL VOC formats
    Free • Object Detection • XML/TXT Output
  • CVAT: Computer Vision Annotation Tool supporting various tasks including segmentation and tracking
    Web-based • Multi-format • Team Collaboration
  • Supervisely: Enterprise platform with AI-assisted annotation and data management capabilities
    AI-assisted • Enterprise • Full Pipeline
  • Roboflow: End-to-end computer vision platform with annotation, augmentation, and deployment tools
    Cloud-based • Auto-augmentation • API Integration
  • Label Studio: Multi-modal annotation tool supporting text, images, audio, video, and time series data
    Multi-modal • ML Integration • Open Source
  • Amazon SageMaker Ground Truth: AWS data labeling service with built-in workflows for common CV tasks
    Cloud Service • Auto-labeling • Quality Control
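
The two bounding-box formats listed for LabelImg differ mainly in coordinate convention: PASCAL VOC stores absolute pixel corners (`xmin`, `ymin`, `xmax`, `ymax`) in XML, while YOLO stores one text line per object as `class_id x_center y_center width height`, normalized by image size. The following sketch converts one to the other. The sample XML, the `class_ids` mapping, and the function name are made up for illustration, and the six-decimal formatting is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal PASCAL VOC annotation for a 640x480 image.
VOC_XML = """<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>160</xmin><ymin>120</ymin><xmax>480</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""


def voc_xml_to_yolo_lines(xml_text, class_ids):
    """Convert every <object> in a VOC XML string to a YOLO text line.

    class_ids maps class names to integer ids (an assumed, user-supplied dict).
    """
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))

    lines = []
    for obj in root.iter("object"):
        cls = class_ids[obj.findtext("name")]
        xmin, ymin, xmax, ymax = (
            float(obj.findtext("bndbox/" + key))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        # Box center and size, normalized to the [0, 1] range.
        x_c = (xmin + xmax) / 2 / img_w
        y_c = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return lines


yolo_lines = voc_xml_to_yolo_lines(VOC_XML, {"dog": 0})
```

Because YOLO coordinates are relative to image size, the same label file stays valid if the image is resized, whereas VOC pixel coordinates would need rescaling.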

Prepared by Dr. Gorkem Kar