CS5720 - Week 13
Slide 247 of 260

Adversarial Training and Defense

Adversarial Attack Types

Adversarial attacks are carefully crafted inputs, often imperceptible to humans, designed to fool neural networks into making incorrect predictions.
  • ⚔
    FGSM Attack
    Fast Gradient Sign Method - a single-step attack that perturbs the input along the sign of the loss gradient
  • 🎯
    PGD Attack
    Projected Gradient Descent - an iterative attack that repeats small gradient steps within a bounded perturbation region
  • 🔧
    C&W Attack
    Carlini & Wagner - optimization-based attacks that search for minimal perturbations
āš ļø Attack Impact:
Even tiny perturbations can cause 99%+ accurate models to fail catastrophically on adversarial examples.
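The FGSM and PGD attacks above can be sketched on a toy model. This is a minimal NumPy illustration, assuming a logistic-regression "network" (so the input gradient has a closed form); the function names, weights, and epsilon values are illustrative, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM: one step of size eps along the sign of the input gradient.
    For logistic regression with cross-entropy loss, dL/dx = (p - y) * w."""
    p = sigmoid(w @ x + b)
    return x + eps * np.sign((p - y) * w)

def pgd(x, y, w, b, eps, alpha, steps):
    """PGD: repeat small gradient-sign steps of size alpha, projecting the
    result back into the L-infinity ball of radius eps around the clean x."""
    x_adv = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_adv + b)
        x_adv = x_adv + alpha * np.sign((p - y) * w)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection step
    return x_adv

# A clean input the toy model classifies correctly as class 1 ...
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.2, 0.1]), 1.0
x_adv = fgsm(x, y, w, b, eps=0.5)
# ... is pushed across the decision boundary by the attack.
```

PGD typically finds stronger adversarial examples than FGSM because each small step re-evaluates the gradient at the current perturbed point.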

Defense Strategies

  • 🛡️
    Adversarial Training
    Train on both clean and adversarial examples
  • 🔍
    Attack Detection
    Identify adversarial inputs before processing
  • ✅
    Certified Defense
    Mathematical guarantees against certain attacks
  • 🔄
    Input Preprocessing
    Transform inputs to remove adversarial perturbations
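One concrete instance of the input-preprocessing strategy above is bit-depth reduction (sometimes called "feature squeezing"). This is a hedged sketch, not the only such defense: quantizing pixel values to a few levels often snaps small adversarial perturbations back to the same value as the clean input.

```python
import numpy as np

def bit_depth_reduction(x, bits=3):
    """Input preprocessing via bit-depth reduction: quantize pixel values
    in [0, 1] to 2**bits - 1 steps, so perturbations smaller than half a
    quantization step are usually removed entirely."""
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

x = np.array([0.50, 0.30, 0.80])   # clean "pixel" values
x_adv = x + 0.01                   # tiny adversarial perturbation
```

Note the trade-off: aggressive quantization also destroys legitimate fine detail, and adaptive attackers can craft perturbations that survive a known preprocessing step.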

Adversarial Training Process

1. Generate Attacks - Create adversarial examples using attack algorithms
2. Augment Dataset - Mix clean and adversarial examples in the training data
3. Robust Training - Train the model to handle both types of input
4. Robustness Testing - Evaluate the model against a range of attack methods
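The four steps above can be sketched end-to-end on a toy problem. This is a minimal NumPy version, assuming a logistic-regression model and FGSM as the attack generator; the function names and hyperparameters (eps, lr, epochs) are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, eps=0.1, lr=0.5, epochs=300):
    """Minimal adversarial-training loop for logistic regression:
    each epoch (1) generates FGSM examples against the current model,
    (2) mixes them with the clean data, and (3) takes one gradient
    step on the combined batch."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1]) * 0.01
    b = 0.0
    for _ in range(epochs):
        # 1. Generate attacks: FGSM against the current parameters.
        p = sigmoid(X @ w + b)
        X_adv = X + eps * np.sign((p - y)[:, None] * w[None, :])
        # 2. Augment dataset: mix clean and adversarial examples.
        X_mix = np.vstack([X, X_adv])
        y_mix = np.concatenate([y, y])
        # 3. Robust training: gradient step on the mixed batch.
        err = sigmoid(X_mix @ w + b) - y_mix
        w -= lr * X_mix.T @ err / len(y_mix)
        b -= lr * err.mean()
    return w, b

# 4. Robustness testing: evaluate on clean and perturbed inputs.
def accuracy(X, y, w, b):
    return np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1.0))
```

In practice the same inner/outer structure appears in robust training for deep networks (e.g. PGD-based adversarial training), with the attack regenerated every minibatch rather than every epoch.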
Prepared by Dr. Gorkem Kar