Attention allows the model to focus on the most relevant parts of the input when producing each output token.
Problems Solved:
• Information bottleneck - the whole input no longer has to be compressed into a single fixed-length vector
• Long sequences - direct connections to every input position shorten the paths information must travel
• Alignment - the model learns which input tokens matter for each output
• Interpretability - attention weights can be visualized to see what the model attends to
🧠 Human Analogy
Like a translator focusing on specific words of the source sentence while producing each word of the translation
How It Works
🎯 Dynamic Context Vector
Different for each output position
🔍 Soft Alignment
Learn where to look automatically
🔄 End-to-End Learning
Attention weights learned during training
📊 Interpretable
Visualize what the model focuses on
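The "dynamic context vector" idea above can be sketched in a few lines: each decoder step softmax-weights the encoder's hidden states with its own scores, so every output position gets its own context vector. This is a toy sketch with hypothetical hand-picked states and scores, assuming NumPy.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical encoder hidden states, one per source word
encoder_states = np.array([[1.0, 0.0],   # "The"
                           [0.0, 1.0],   # "cat"
                           [0.5, 0.5]])  # "sat"

# Different (hand-picked) scores at two decoder steps
scores_step1 = np.array([2.0, 0.1, 0.1])  # attends mostly to "The"
scores_step2 = np.array([0.1, 2.0, 0.1])  # attends mostly to "cat"

# Each step's context vector is its own weighted sum of the states
ctx1 = softmax(scores_step1) @ encoder_states
ctx2 = softmax(scores_step2) @ encoder_states
print(ctx1, ctx2)  # two distinct context vectors
```

Because the scores differ per step, the two context vectors differ, which is exactly what a single fixed encoding cannot provide.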
Attention in Action
Input tokens: The · cat · sat · on · mat
Generating: "Le chat"
attention(Q, K, V) = softmax(QK^T / √d_k)V
Note: Attention weights show the model focusing on "cat" when generating "chat"
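The formula above can be implemented directly. This is a minimal NumPy sketch of scaled dot-product attention (the shapes and random inputs are illustrative, not from the original):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Row-wise softmax (subtract max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights      # context vectors and attention weights

# Toy example: 2 queries attend over 3 key/value pairs, d_k = 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

out, w = attention(Q, K, V)
print(out.shape)       # (2, 4): one context vector per query
print(w.sum(axis=-1))  # each row of attention weights sums to 1
```

Returning the weights alongside the output is what makes the mechanism interpretable: visualizing `w` shows which keys each query attended to.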