BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that understands context from both directions, revolutionizing NLP tasks.
Key Characteristics:
• Bidirectional - Attends to context on both the left and right of every token
• Pre-trained - Learns from massive unlabeled text
• Transfer Learning - Fine-tune for specific tasks (see the sketch below)
• Contextual - Word meanings depend on context
Published:
October 2018 by Google AI Language
Achieved SOTA on 11 NLP tasks!
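The transfer-learning step usually means attaching a small task-specific head to the pre-trained encoder and training on labeled data. Below is a minimal fine-tuning sketch using the Hugging Face transformers library; the checkpoint name and the toy sentiment labels are illustrative assumptions, not details from BERT's paper.

```python
# Minimal fine-tuning sketch (Hugging Face transformers; checkpoint and toy data assumed).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# A randomly initialized classification head sits on top of the pre-trained encoder.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["I loved this movie.", "This was a waste of time."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative (toy labels)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss from the new head
outputs.loss.backward()                  # gradients flow through head + encoder
optimizer.step()
```

In practice this single step would be repeated over many batches and epochs; the point is that the whole pre-trained encoder is updated end to end along with the new head.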
Key Innovations
↔️
True Bidirectionality
Unlike GPT's left-to-right attention, BERT conditions on the full context around each token
🎭
Masked Language Modeling
Predict randomly masked words using context
🔗
Next Sentence Prediction
Learn relationships between sentences
🏗️
Encoder-Only Architecture
Uses only transformer encoder stack
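Because BERT is encoder-only, a forward pass returns one contextual vector per input token rather than a next-token prediction, so the same word gets different vectors in different sentences. A short sketch, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (the example sentences are illustrative):

```python
# Sketch: "bank" receives different contextual embeddings in different sentences.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")  # encoder stack only, no LM head
model.eval()

def bank_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (seq_len, hidden_size)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("bank")]                    # vector for "bank" in this context

v1 = bank_vector("She sat on the bank of the river.")
v2 = bank_vector("He deposited cash at the bank.")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1.0: meaning shifts with context
```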
BERT's Pre-training Tasks
Masked Language Model (MLM)
Randomly mask 15% of tokens and predict them
The cat [MASK] on the mat
→ Predict: "sat"
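A quick way to reproduce this example is the fill-mask pipeline from the Hugging Face transformers library; the checkpoint name is an assumption and the exact scores will vary:

```python
# Sketch of masked-token prediction with a pre-trained BERT (assumed checkpoint).
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The cat [MASK] on the mat."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Top candidates typically include "sat", "lay", "slept", ...
```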
Next Sentence Prediction (NSP)
Predict if sentence B follows sentence A
A: The weather is nice today.
B: Let's go for a walk.
→ Predict: IsNext (True)
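The same sentence pair can be scored with BERT's NSP head. A sketch using the Hugging Face transformers library (checkpoint name assumed; in this head, index 0 corresponds to "B follows A"):

```python
# Sketch of Next Sentence Prediction scoring (assumed checkpoint and library).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

encoding = tokenizer("The weather is nice today.", "Let's go for a walk.",
                     return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits          # shape (1, 2)

probs = torch.softmax(logits, dim=1)[0]
print(f"IsNext: {probs[0]:.3f}  NotNext: {probs[1]:.3f}")  # index 0 = IsNext
```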