CS5720 - Week 11

Attention Mechanism in NLP

What is Attention?

Attention allows models to focus on different parts of the input when producing each part of the output, similar to how humans selectively focus on relevant information.
Why Attention Matters:

• Captures long-range dependencies
• Provides interpretability - see what the model focuses on
• Enables parallel computation
• Forms the foundation of transformers
💡 Key Insight
Attention mechanisms solve the information bottleneck problem in sequence-to-sequence models by allowing direct connections between all input and output positions.
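To make this concrete, here is a minimal NumPy sketch of scaled dot-product attention, the form used in transformers. The function name, shapes, and toy inputs are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d), K: (n_k, d), V: (n_k, d_v) -> output (n_q, d_v), weights (n_q, n_k)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # similarity of each query with each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys: each row sums to 1
    return weights @ V, weights                       # weighted sum of values, plus the weights

# Toy usage: 4 input positions, 8-dimensional vectors (self-attention: Q = K = V)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape, attn.shape)   # (4, 8) (4, 4): one weight per (query position, key position)
```

The weights matrix is exactly what gives attention its interpretability: row i shows how much output position i draws from each input position.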

Types of Attention

• 🔄 Self-Attention: relates different positions within the same sequence
• ↔️ Cross-Attention: relates positions between two different sequences
• 🔀 Multi-Head Attention: multiple attention mechanisms running in parallel (see the sketch below)
Remember:
Attention weights tell us which parts of the input are most relevant for each output!
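Below is a hedged sketch of how the multi-head variant splits the model dimension into several heads, attends in each head independently, and recombines the results; the projection matrices, head count, and dimensions are illustrative assumptions.

```python
import numpy as np

def attention(Q, K, V):
    # scaled dot-product attention (softmax over keys)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(x_q, x_kv, W_q, W_k, W_v, W_o, n_heads):
    """Self-attention when x_q and x_kv are the same sequence; cross-attention otherwise."""
    d_model = x_q.shape[-1]
    d_head = d_model // n_heads
    Q, K, V = x_q @ W_q, x_kv @ W_k, x_kv @ W_v
    heads = [attention(Q[:, h*d_head:(h+1)*d_head],   # each head attends over its own slice
                       K[:, h*d_head:(h+1)*d_head],
                       V[:, h*d_head:(h+1)*d_head]) for h in range(n_heads)]
    return np.concatenate(heads, axis=-1) @ W_o       # merge heads, project back to d_model

# Toy usage: random projections, 2 heads, 8-dimensional model
rng = np.random.default_rng(0)
d_model, n_heads = 8, 2
W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
x = rng.normal(size=(5, d_model))
print(multi_head_attention(x, x, W_q, W_k, W_v, W_o, n_heads).shape)   # (5, 8)
```

Passing a different sequence as x_kv (for example encoder states) turns the same function into cross-attention.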

Interactive Attention Visualization

[Interactive demo: click any word in the sentence "The cat sat on the mat" to see its attention weights over the other words.]
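As a static stand-in for the interactive demo, the sketch below computes self-attention weights for the example sentence using random toy embeddings, so the printed numbers illustrate only the mechanics, not a trained model's behaviour.

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the", "mat"]
rng = np.random.default_rng(0)
emb = rng.normal(size=(len(tokens), 8))           # toy 8-d "embeddings" (random, untrained)

scores = emb @ emb.T / np.sqrt(emb.shape[-1])     # self-attention scores
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)    # softmax: each row sums to 1

# Attention distribution for the word "sat" over every word in the sentence
for tok, w in zip(tokens, weights[tokens.index("sat")]):
    print(f"{tok:>4}: {w:.2f}")
```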
Prepared by Dr. Gorkem Kar