CS5720 - Week 7
Slide 138 of 140

Attention Mechanism Introduction

Why Attention?

Attention allows the model to focus on the relevant parts of the input when producing each output.
Problems Solved:

Information bottleneck - the input is no longer compressed into a single fixed vector
Long sequences - direct connections to every input position
Alignment - the model learns what to focus on
Interpretability - attention weights show what the model attends to
🧠 Human Analogy
Like focusing on specific words when translating

How It Works

🎯 Dynamic Context Vector
Different for each output position
🔍 Soft Alignment
Learn where to look automatically
🔄 End-to-End Learning
Attention weights learned during training
📊 Interpretable
Visualize what model focuses on
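The four points above can be sketched in a few lines: a context vector is an attention-weighted sum of the encoder states, recomputed for each output position. The encoder states and alignment scores below are made-up toy values; in a real model the scores come from a learned scoring function.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

# Toy encoder states for three input words (illustrative values)
encoder_states = np.array([[1.0, 0.0],   # "The"
                           [0.0, 1.0],   # "cat"
                           [0.5, 0.5]])  # "sat"

# Hypothetical alignment scores for ONE output position
# (learned end-to-end during training in a real model)
scores = np.array([0.1, 2.0, 0.3])

weights = softmax(scores)           # soft alignment: sums to 1
context = weights @ encoder_states  # dynamic context vector

print(weights)  # largest weight on "cat" -> interpretable
print(context)
```

Because the scores differ at every output position, so does the context vector, which is exactly the "dynamic context vector" point above.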

Attention in Action

[Figure: attention-weight visualization]
Input: "The cat sat on the mat"
Generating: "Le chat"
attention(Q, K, V) = softmax(QK^T / √d_k)V
Note: Attention weights show the model focusing on "cat" when generating "chat"
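The formula above can be sketched directly in NumPy. This is a minimal single-head, unbatched version; the matrix shapes and random inputs are illustrative, not part of the original slide.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stability trick
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Tiny example: 2 queries attending over 3 key/value pairs of dimension 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))

out, w = attention(Q, K, V)
print(out.shape)        # one d_k-dimensional output per query: (2, 4)
print(w.sum(axis=-1))   # rows of the weight matrix sum to 1
```

Visualizing `w` as a heatmap is what produces plots like the one above, where a high weight on "cat" appears while generating "chat".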
Prepared by Dr. Gorkem Kar