CS5720 - Week 11

BERT: Bidirectional Encoder Representations

What is BERT?

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that understands context from both directions, revolutionizing NLP tasks.
Key Characteristics:

Bidirectional - Reads text in both directions
Pre-trained - Learns from massive unlabeled text
Transfer Learning - Fine-tune for specific tasks
Contextual - Word meanings depend on context (see the sketch after this block)
Published:
October 2018 by Google AI Language
Achieved SOTA on 11 NLP tasks!
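
To make "Contextual" concrete, here is a minimal sketch (assuming the Hugging Face transformers and PyTorch packages and the bert-base-uncased checkpoint, none of which are named on this slide) that compares the vector BERT assigns to the word "bank" in two different sentences:

```python
# Sketch: contextual embeddings from pre-trained BERT (assumes the
# Hugging Face `transformers` and `torch` packages are installed).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(sentence, word):
    """Return the hidden state of the first occurrence of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs["input_ids"][0] == word_id).nonzero()[0].item()
    return hidden[position]

bank_river = embed_word("He sat on the bank of the river.", "bank")
bank_money = embed_word("She deposited cash at the bank.", "bank")

# The same word gets different vectors because BERT reads the full context.
similarity = torch.cosine_similarity(bank_river, bank_money, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {similarity.item():.3f}")
```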

Key Innovations

  • ↔️ True Bidirectionality - Unlike GPT's left-to-right reading, BERT sees the full context (contrasted in the sketch after this list)
  • 🎭 Masked Language Modeling - Predict randomly masked words using surrounding context
  • 🔗 Next Sentence Prediction - Learn relationships between sentence pairs
  • 🏗️ Encoder-Only Architecture - Uses only the Transformer encoder stack
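
A minimal sketch of the bidirectionality contrast (plain PyTorch, illustrative only; not BERT's actual implementation): an encoder lets every token attend to every other token, while a GPT-style decoder restricts attention to earlier positions.

```python
# Sketch: attention visibility in an encoder vs. a left-to-right decoder
# (illustrative only; not taken from BERT's implementation).
import torch

seq_len = 5  # e.g. the tokens of "the cat sat on mat"

# BERT-style encoder: every position may attend to every other position.
bidirectional = torch.ones(seq_len, seq_len, dtype=torch.int)

# GPT-style decoder: each position may attend only to itself and the past.
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.int))

print("Bidirectional (BERT):\n", bidirectional)
print("Causal / left-to-right (GPT):\n", causal)
```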

BERT's Pre-training Tasks

Masked Language Model (MLM)

Randomly mask 15% of tokens and predict them
The cat [MASK] on the mat
→ Predict: "sat"
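
This example can be reproduced with BERT's masked-language-model head. The sketch below assumes the Hugging Face transformers fill-mask pipeline and the bert-base-uncased checkpoint (external tools, not part of the slide); exact scores will vary by checkpoint.

```python
# Sketch: querying BERT's masked-language-model head (assumes the
# Hugging Face `transformers` package; predictions may vary by checkpoint).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill_mask("The cat [MASK] on the mat."):
    print(f"{candidate['token_str']:>10}  (score: {candidate['score']:.3f})")
# Expected to rank words like "sat" highly, using context from BOTH sides.
```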

Next Sentence Prediction (NSP)

Predict whether sentence B really follows sentence A (during pre-training, B is the actual next sentence 50% of the time and a random sentence otherwise)
A: The weather is nice today.
B: Let's go for a walk.
→ Predict: IsNext (True)
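
A minimal sketch of scoring this sentence pair with BERT's NSP head, assuming the Hugging Face transformers BertForNextSentencePrediction class and the bert-base-uncased checkpoint (not mentioned on the slide):

```python
# Sketch: scoring a sentence pair with BERT's NSP head (assumes the
# Hugging Face `transformers` and `torch` packages).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

sentence_a = "The weather is nice today."
sentence_b = "Let's go for a walk."

# The tokenizer builds the pair as: [CLS] A [SEP] B [SEP] with segment ids.
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2): [IsNext, NotNext]

probs = torch.softmax(logits, dim=-1)[0]
print(f"P(IsNext) = {probs[0].item():.3f}, P(NotNext) = {probs[1].item():.3f}")
```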

Common BERT Applications

💬 Question Answering
😊 Sentiment Analysis (fine-tuning sketch below)
🏷️ Named Entity Recognition
📊 Text Classification
🔍 Semantic Search
📝 Text Summarization
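
As one illustration of transfer learning to a task like sentiment or text classification, the sketch below fine-tunes a classification head on top of pre-trained BERT. It assumes the Hugging Face transformers and PyTorch packages and uses a hypothetical two-example toy dataset purely to show the shape of the training loop.

```python
# Sketch: fine-tuning BERT for binary sentiment classification (toy data;
# assumes the Hugging Face `transformers` and `torch` packages).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # classification head is newly initialized
)

# Hypothetical toy dataset; a real task would use thousands of labeled examples.
texts = ["I loved this movie!", "This was a waste of time."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps just to show the fine-tuning loop
    outputs = model(**inputs, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"loss: {outputs.loss.item():.4f}")
```

In practice the same pattern is used with a real labeled dataset, mini-batching, and evaluation (for example via the transformers Trainer API) rather than this two-sentence toy loop.
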
Prepared by Dr. Gorkem Kar