CS5720 - Week 11

Bag of Words vs Word Embeddings

🛍️ Bag of Words

Traditional approach that counts word occurrences, treating each word independently without considering relationships or order.

  • Simple & Interpretable
    Easy to understand and implement
  • Sparse Vectors
    High-dimensional, mostly zeros
  • Good Baseline
    Often performs surprisingly well
  • No Semantics
    Misses word relationships

🧠 Word Embeddings

Modern approach that learns dense vector representations capturing semantic relationships between words from the contexts in which they appear.

  • Semantic Understanding
    Captures word relationships
  • Dense Vectors
    Low-dimensional, information-rich
  • Analogical Reasoning
    king − man + woman ≈ queen (see the sketch after this list)
  • Transfer Learning
    Pre-trained models available
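
A minimal sketch of the analogy and transfer-learning points above, assuming gensim and its model downloader are available (the model name is one of gensim's hosted options; the printed result is typical, not guaranteed):

```python
import gensim.downloader as api

# Load small pre-trained GloVe vectors (downloads on first use).
model = api.load("glove-wiki-gigaword-50")

# Analogical reasoning as vector arithmetic: king - man + woman ≈ ?
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', <similarity score>)]
```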

Side-by-Side Comparison

Bag of Words Example

Document: "I love cats and dogs"

Vocabulary: [I, love, cats, and, dogs, hate, birds]
Vector: [1, 1, 1, 1, 1, 0, 0]
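
As a minimal sketch of the counting step, scikit-learn's CountVectorizer (one common implementation; the second document below exists only to put "hate" and "birds" in the vocabulary) produces the same kind of vector:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Build a shared vocabulary over a tiny corpus, then count word
# occurrences per document; word order is discarded entirely.
corpus = ["I love cats and dogs", "I hate birds"]
vectorizer = CountVectorizer(token_pattern=r"(?u)\b\w+\b")  # keep 1-letter "I"
X = vectorizer.fit_transform(corpus)  # sparse matrix: documents x vocabulary

print(vectorizer.get_feature_names_out())
# ['and' 'birds' 'cats' 'dogs' 'hate' 'i' 'love']  (alphabetical, unlike the slide)
print(X.toarray()[0])
# [1 0 1 1 0 1 1]  -- counts for "I love cats and dogs"; zeros grow with vocabulary
```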


VS

Word Embeddings Example

Word: "cats"

cats → [0.2, -0.1, 0.8, 0.3, -0.5, ...] (300-dimensional dense vector)
Semantically similar to: dogs, pets, animals, kittens
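
A minimal sketch of what "semantically similar" means in vector space: cosine similarity over hypothetical 5-dimensional embeddings (the numbers are invented for illustration, not taken from any trained model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: closer to 1 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings (real models use 100-300+ dimensions).
embeddings = {
    "cats":  np.array([0.2, -0.1, 0.8, 0.3, -0.5]),
    "dogs":  np.array([0.3, -0.2, 0.7, 0.2, -0.4]),
    "birds": np.array([0.1,  0.6, 0.2, -0.3, 0.1]),
}

for word in ("dogs", "birds"):
    score = cosine_similarity(embeddings["cats"], embeddings[word])
    print(f"cats vs {word}: {score:.3f}")
# "dogs" scores far higher than "birds": nearby vectors mean related meanings.
```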


🚀 The Evolution: From counting words to understanding meaning!

While BoW merely counts occurrences, embeddings encode how words relate to one another in meaning.

Prepared by Dr. Gorkem Kar