BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that understands context from both directions, revolutionizing NLP tasks.
Key Characteristics:
• Bidirectional - Attends to context on both the left and right of every token
• Pre-trained - Learns from massive unlabeled text
• Transfer Learning - Fine-tune for specific tasks (see the sketch below)
• Contextual - Word meanings depend on context
Published:
October 2018 by Google AI Language
Achieved SOTA on 11 NLP tasks!
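The transfer-learning step usually means attaching a small task-specific head to the pre-trained encoder and training on labeled data. Below is a minimal fine-tuning sketch using the Hugging Face transformers library; the checkpoint name and the toy sentiment labels are illustrative assumptions, not details from BERT's paper.

```python
# Minimal fine-tuning sketch (Hugging Face transformers; checkpoint and toy data assumed).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# A randomly initialized classification head sits on top of the pre-trained encoder.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["I loved this movie.", "This was a waste of time."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative (toy labels)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss from the new head
outputs.loss.backward()                  # gradients flow through head + encoder
optimizer.step()
```

In practice this single step would be repeated over many batches and epochs; the point is that the whole pre-trained encoder is updated end to end along with the new head.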
Key Innovations
↔️
True Bidirectionality
Unlike GPT's left-to-right attention, BERT conditions on the full context around each token
🎭
Masked Language Modeling
Predict randomly masked words using context
🔗
Next Sentence Prediction
Learn relationships between sentences
🏗️
Encoder-Only Architecture
Uses only transformer encoder stack
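Because BERT is encoder-only, a forward pass returns one contextual vector per input token rather than a next-token prediction, so the same word gets different vectors in different sentences. A short sketch, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (the example sentences are illustrative):

```python
# Sketch: "bank" receives different contextual embeddings in different sentences.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")  # encoder stack only, no LM head
model.eval()

def bank_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (seq_len, hidden_size)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("bank")]                    # vector for "bank" in this context

v1 = bank_vector("She sat on the bank of the river.")
v2 = bank_vector("He deposited cash at the bank.")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1.0: meaning shifts with context
```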
BERT's Pre-training Tasks
Masked Language Model (MLM)
Randomly mask 15% of tokens and predict them
The cat [MASK] on the mat
→ Predict: "sat"
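A quick way to reproduce this example is the fill-mask pipeline from the Hugging Face transformers library; the checkpoint name is an assumption and the exact scores will vary:

```python
# Sketch of masked-token prediction with a pre-trained BERT (assumed checkpoint).
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for prediction in unmasker("The cat [MASK] on the mat."):
    print(prediction["token_str"], round(prediction["score"], 3))
# Top candidates typically include "sat", "lay", "slept", ...
```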
Next Sentence Prediction (NSP)
Predict if sentence B follows sentence A
A: The weather is nice today.
B: Let's go for a walk.
→ Predict: IsNext (True)
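The same sentence pair can be scored with BERT's NSP head. A sketch using the Hugging Face transformers library (checkpoint name assumed; in this head, index 0 corresponds to "B follows A"):

```python
# Sketch of Next Sentence Prediction scoring (assumed checkpoint and library).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

encoding = tokenizer("The weather is nice today.", "Let's go for a walk.",
                     return_tensors="pt")
with torch.no_grad():
    logits = model(**encoding).logits          # shape (1, 2)

probs = torch.softmax(logits, dim=1)[0]
print(f"IsNext: {probs[0]:.3f}  NotNext: {probs[1]:.3f}")  # index 0 = IsNext
```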