Attention allows models to focus on different parts of the input when producing each part of the output, similar to how humans selectively focus on relevant information.
Why Attention Matters:
• Captures long-range dependencies
• Provides interpretability: attention weights show what the model focuses on
• Enables parallel computation across all positions, unlike step-by-step recurrence
• Forms the foundation of the transformer architecture
💡 Key Insight
Attention mechanisms solve the information bottleneck of the fixed-size context vector in sequence-to-sequence models by allowing direct connections between all input and output positions.
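To make this concrete, here is a minimal NumPy sketch of scaled dot-product attention; the function name, shapes, and toy data are illustrative, not tied to any particular library. Every output position scores all input positions, turns the scores into weights, and takes a weighted average of the value vectors.

```python
# A minimal sketch of scaled dot-product attention using NumPy.
# Names and shapes are illustrative assumptions, not a specific library's API.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    # Similarity between every query and every key: (n_q, n_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys turns scores into attention weights that sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted average of the value vectors: (n_q, d_v)
    return weights @ V, weights

# Toy example: 3 output positions attending over 4 input positions
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 16))
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)  # (3, 16) (3, 4)
```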
Types of Attention
• 🔄 Self-Attention: relates different positions within the same sequence
• ↔️ Cross-Attention: relates positions between two different sequences (for example, decoder positions attending to encoder outputs)
• 🔀 Multi-Head Attention: runs several attention operations in parallel, each with its own learned projections (see the sketch after this list)
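Below is a rough, self-contained sketch of multi-head attention in NumPy. The random projection matrices stand in for learned weights, and all names are illustrative. The same code performs self-attention when the query and key/value inputs are the same sequence, and cross-attention when they differ.

```python
# A rough multi-head attention sketch in NumPy. Projection matrices are random
# stand-ins for learned weights; names and shapes are illustrative assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x_q, x_kv, W_q, W_k, W_v, W_o, num_heads):
    """Self-attention when x_q is x_kv; cross-attention when they differ."""
    d_model = x_q.shape[-1]
    d_head = d_model // num_heads
    Q, K, V = x_q @ W_q, x_kv @ W_k, x_kv @ W_v
    head_outputs = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        # Each head attends independently over its own slice of the projections
        scores = Q[:, sl] @ K[:, sl].T / np.sqrt(d_head)
        head_outputs.append(softmax(scores) @ V[:, sl])
    # Concatenate the heads and mix them with a final output projection
    return np.concatenate(head_outputs, axis=-1) @ W_o

rng = np.random.default_rng(1)
d_model, num_heads = 16, 4
x = rng.normal(size=(5, d_model))  # one sequence of length 5: self-attention
W_q, W_k, W_v, W_o = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
print(multi_head_attention(x, x, W_q, W_k, W_v, W_o, num_heads).shape)  # (5, 16)
```

Calling it as `multi_head_attention(decoder_states, encoder_states, ...)` instead would give cross-attention, since the queries come from one sequence and the keys and values from another.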
Remember:
Attention weights tell us which parts of the input are most relevant for each output!
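Continuing the toy NumPy example from the first sketch, each row of `weights` is a probability distribution over the input positions, which is what makes it readable as a relevance map:

```python
# `weights` comes from the scaled_dot_product_attention toy example above.
print(weights[0])            # attention of output position 0 over the 4 input positions
print(weights.sum(axis=-1))  # each row sums to 1: [1. 1. 1.]
```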