CS5720 - Week 11
Slide 216 of 220

GPT: Generative Pre-trained Transformer

What is GPT?

GPT (Generative Pre-trained Transformer) is a family of autoregressive language models that generate human-like text by predicting the next token in a sequence.
Core Principles:

Autoregressive - Generates text left-to-right
Unsupervised Pre-training - Learns from raw text
Decoder-only - Uses only the transformer decoder stack (no encoder)
Few-shot Learning - Adapts with minimal examples
Key Innovation:
GPT showed that language models pre-trained on large amounts of text can be fine-tuned for various tasks with minimal supervision.
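The pre-training objective behind this is next-token prediction: at each position the model scores every vocabulary item and is penalized by the cross-entropy of the true next token. A toy sketch (the vocabulary and logits below are made up for illustration, not from a real model):

```python
import math

# Toy illustration of the next-token objective: the model assigns a logit
# to every vocabulary item, and the loss is the negative log-probability
# of the token that actually came next.
vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, 1.0, -1.0]  # hypothetical model scores for the next token

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax(logits)
target = vocab.index("cat")          # suppose the true next token is "cat"
loss = -math.log(probs[target])      # cross-entropy at this position
print(f"P(next='cat') = {probs[target]:.3f}, loss = {loss:.3f}")
```

During pre-training this loss is averaged over every position of every sequence in the corpus, which is why raw, unlabeled text suffices.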

GPT Evolution

2018 - GPT-1
Unsupervised Pre-training
117M parameters, proved pre-training works
2019 - GPT-2
Zero-shot Task Transfer
1.5B parameters, "too dangerous to release"
2020 - GPT-3
Few-shot Learning Revolution
175B parameters, in-context learning
2023 - GPT-4
Multimodal Understanding
Unknown size, vision + text capabilities

GPT Text Generation Demo

Prompt: "Once upon a time, in a small village nestled between mountains,"
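What happens next in the demo is the autoregressive loop: the model samples one token, appends it to the context, and repeats. A minimal sketch of that loop, using a hand-written bigram table as a stand-in for the trained network (the table and continuations are invented for illustration):

```python
import random

# Stand-in "model": maps the last token to possible next tokens.
# A real GPT conditions on the whole context, not just the last token.
bigram = {
    "mountains,": ["there"],
    "there": ["lived"],
    "lived": ["an", "a"],
    "an": ["old"],
    "a": ["young"],
    "old": ["storyteller."],
    "young": ["shepherd."],
}

def generate(prompt_tokens, max_new_tokens=6, seed=0):
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = bigram.get(tokens[-1])
        if not candidates:                     # no known continuation: stop
            break
        tokens.append(rng.choice(candidates))  # sample the next token
    return " ".join(tokens)

print(generate(["nestled", "between", "mountains,"]))
```

The same left-to-right loop drives real GPT inference; only the next-token distribution (a full transformer forward pass) is different.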

Key GPT Features

📚
In-Context Learning
Learn from examples in the prompt
📈
Scaling Laws
Performance improves with size
✨
Emergent Abilities
New capabilities at scale
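In-context learning means the "training" examples live inside the prompt itself and no weights are updated. A sketch of how such a few-shot prompt is assembled (the English-to-French task follows the format popularized by the GPT-3 paper):

```python
# Few-shot prompt construction: demonstrations followed by the query.
# The model is expected to continue the pattern after the final "=>".
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]
query = "peppermint"

prompt = "Translate English to French.\n"
for en, fr in examples:
    prompt += f"{en} => {fr}\n"
prompt += f"{query} =>"

print(prompt)
```

With zero examples this is a zero-shot prompt (GPT-2's setting); adding a handful of demonstrations is the few-shot setting that GPT-3 made practical.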
Prepared by Dr. Gorkem Kar