GPT (Generative Pre-trained Transformer) is a family of autoregressive language models that generate human-like text by repeatedly predicting the next token (roughly, the next word) in a sequence.
Core Principles:
• Autoregressive - Generates text left-to-right, one token at a time (see the sketch after this list)
• Unsupervised Pre-training - Learns from raw, unlabeled text
• Decoder-only - Uses only the transformer's decoder stack, with no encoder
• Few-shot Learning - Adapts to new tasks from just a handful of examples
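To make the autoregressive, decoder-only principles concrete, here is a minimal sketch of greedy left-to-right decoding. The public "gpt2" checkpoint from Hugging Face transformers and the manual loop are illustrative assumptions, not part of the original; in practice model.generate() handles this.

# Minimal sketch of autoregressive, decoder-only generation (greedy decoding).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("The weather today is", return_tensors="pt")
with torch.no_grad():
    for _ in range(20):                       # emit 20 new tokens, one at a time
        logits = model(ids).logits            # shape: (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()      # most likely next token (greedy)
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat

print(tokenizer.decode(ids[0]))

Each iteration feeds the entire sequence so far back into the model, which is exactly what "autoregressive" means: the model conditions only on tokens to its left.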
Key Innovation:
GPT showed that a language model pre-trained on large amounts of unlabeled text can be fine-tuned for a wide range of downstream tasks with minimal task-specific supervision (a recipe sketched below).
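As a hedged sketch of that pre-train/fine-tune recipe: the snippet below loads pre-trained GPT-2 weights and continues training on task-formatted text. The toy sentiment data, checkpoint, and learning rate are placeholder assumptions, not values from the original.

# Sketch of fine-tuning: reuse pre-trained weights, train on supervised text.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")   # start from pre-trained weights
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

task_texts = [
    "Review: a moving, beautifully shot film. Sentiment: positive",
    "Review: two hours of tedium. Sentiment: negative",
]

model.train()
for text in task_texts:
    batch = tokenizer(text, return_tensors="pt")
    # Same next-token objective as pre-training, now on task-specific text.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

The key design point is that fine-tuning reuses the pre-training objective and weights; only the data changes, which is why so little supervision is needed.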
GPT Evolution
2018 - GPT-1: Unsupervised Pre-training
117M parameters; proved that generative pre-training on unlabeled text works.
2019 - GPT-2: Zero-shot Task Transfer
1.5B parameters; initially withheld as "too dangerous to release."
2020 - GPT-3: Few-shot Learning Revolution
175B parameters; introduced in-context learning (see the prompt sketch after this timeline).
2023 - GPT-4: Multimodal Understanding
Undisclosed parameter count; handles both vision and text.
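GPT-3-style few-shot, in-context learning can be illustrated with a prompt alone, since the "learning" happens in context with no weight updates. The translation examples below are invented for illustration.

# Sketch of few-shot prompting: the task is specified entirely in the prompt.
few_shot_prompt = """\
Translate English to French.

English: cheese
French: fromage

English: house
French: maison

English: book
French:"""
# Fed to the model, the expected continuation is "livre": the model infers
# the task from the in-context examples alone, with no gradient updates.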
GPT Text Generation Demo
Prompt: "Once upon a time, in a small village nestled between mountains,"
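The interactive demo's generated continuation is not reproduced here, but a completion of the same prompt can be sketched as follows. The original demo's model and sampling settings are unknown, so the "gpt2" checkpoint and parameters below are assumptions.

# Sketch of the demo above: sample a continuation of the given prompt.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Once upon a time, in a small village nestled between mountains,"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,                       # sample instead of greedy decoding
    top_p=0.9,                            # nucleus sampling
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Because decoding is sampled rather than greedy, each run produces a different continuation of the village story.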