CS5720 - Week 13
Adversarial Training and Defense
Adversarial Attack Types
Adversarial attacks are carefully crafted inputs designed to fool neural networks into making incorrect predictions, often using perturbations imperceptible to humans.
FGSM Attack
Fast Gradient Sign Method - single-step attack that perturbs each input coordinate in the direction of the loss gradient's sign
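The FGSM step can be sketched on a toy binary linear classifier. The weights, input, and analytic input gradient below are illustrative assumptions, not from the slides:

```python
import numpy as np

# Hypothetical linear model for illustration: logit = w . x, label y in {0, 1}.
w = np.array([1.0, -2.0, 0.5])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y):
    # Analytic gradient of binary cross-entropy w.r.t. the input x:
    # dL/dx = (sigmoid(w . x) - y) * w
    return (sigmoid(w @ x) - y) * w

def fgsm(x, y, epsilon=0.25):
    # Single step in the direction of the gradient's sign; every coordinate
    # moves by exactly +/- epsilon (or 0 where the gradient is exactly 0).
    return x + epsilon * np.sign(input_gradient(x, y))

x = np.array([0.5, 0.2, -0.1])
x_adv = fgsm(x, y=1)  # perturbation bounded by epsilon in every coordinate
```

For a linear model this single step is provably the worst-case perturbation inside the L-infinity ball, which is why FGSM is fast but also why multi-step attacks are stronger against non-linear networks.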
PGD Attack
Projected Gradient Descent - iterative multi-step attack that repeatedly takes gradient-sign steps and projects back onto the allowed perturbation set
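PGD iterates small FGSM-style steps, projecting back into the epsilon-ball after each one. A minimal sketch on the same kind of toy linear model (the weights, step size, and gradient formula are illustrative assumptions):

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])  # hypothetical linear model: logit = w . x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y):
    # Gradient of binary cross-entropy w.r.t. the input for the linear model.
    return (sigmoid(w @ x) - y) * w

def pgd(x, y, epsilon=0.25, alpha=0.05, steps=10):
    x_adv = x.copy()
    for _ in range(steps):
        # Gradient-sign step (like FGSM, but with a smaller step size alpha)...
        x_adv = x_adv + alpha * np.sign(input_gradient(x_adv, y))
        # ...then project back into the L-infinity ball of radius epsilon.
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

x = np.array([0.5, 0.2, -0.1])
x_adv = pgd(x, y=1)
```

The projection (here a simple `np.clip`) is what distinguishes PGD from just running FGSM repeatedly: the perturbation can never escape the epsilon-ball no matter how many steps are taken.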
C&W Attack
Carlini & Wagner - optimization-based attack that searches for minimal-norm perturbations
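A heavily simplified C&W-style L2 attack can be sketched on a toy binary linear model (logit = w·x, classified positive when the logit exceeds 0). The weights, constant `c`, and step schedule are illustrative assumptions; the real attack also uses a change of variables and a binary search over `c`:

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])  # hypothetical linear model: logit = w . x

def cw_l2(x, c=1.0, kappa=0.0, lr=0.05, steps=200):
    # Minimize ||delta||^2 + c * max(0, w.(x + delta) + kappa):
    # the smallest L2 perturbation driving the logit below -kappa,
    # i.e. flipping a positively classified input.
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad = 2.0 * delta  # gradient of the ||delta||^2 term
        if w @ (x + delta) + kappa > 0:
            grad = grad + c * w  # hinge (misclassification) term is active
        delta = delta - lr * grad
    return x + delta

x = np.array([0.5, 0.2, -0.1])  # classified positive: w . x = 0.05 > 0
x_adv = cw_l2(x)
```

The trade-off between the norm term and the hinge term is what makes C&W perturbations so small: the optimizer pays for every unit of distortion, so it settles just past the decision boundary.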
Attack Impact:
Even tiny perturbations can cause 99%+ accurate models to fail catastrophically on adversarial examples.
Defense Strategies
Adversarial Training
Train on both clean and adversarial examples
Attack Detection
Identify adversarial inputs before processing
Certified Defense
Mathematical guarantees against certain attacks
Input Preprocessing
Transform inputs to remove adversarial perturbations
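One simple preprocessing defense is feature squeezing via bit-depth reduction: quantizing inputs so that small perturbations snap back to the same value. A minimal sketch, assuming pixel values in [0, 1] (the bit count and example values are illustrative choices):

```python
import numpy as np

def reduce_bit_depth(x, bits=3):
    # Quantize values in [0, 1] down to 2**bits levels; perturbations
    # smaller than half a quantization step are removed entirely.
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

x = np.array([0.50, 0.20, 0.80])
x_adv = x + np.array([0.01, -0.02, 0.015])  # small adversarial noise
# After squeezing, the clean and perturbed inputs often coincide.
```

Preprocessing defenses like this are cheap but not certified: an attacker aware of the transformation can often craft perturbations that survive it.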
Adversarial Training Process
1
Generate Attacks
Create adversarial examples using attack algorithms
2
Augment Dataset
Mix clean and adversarial examples in training data
3
Robust Training
Train model to handle both types of inputs
4
Robustness Testing
Evaluate against various attack methods
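The four steps above can be sketched end-to-end with a toy logistic-regression model attacked by FGSM. The synthetic data, epsilon, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic 2-class data: two Gaussian blobs (for illustration only).
X = np.vstack([rng.normal(-1.0, 0.5, size=(100, 2)),
               rng.normal(+1.0, 0.5, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

def input_gradient(w, x, yi):
    # Gradient of binary cross-entropy w.r.t. the input.
    return (sigmoid(w @ x) - yi) * w

def adversarial_training(X, y, epsilon=0.1, lr=0.1, epochs=30):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # Step 1: generate FGSM adversarial examples against current weights.
        X_adv = np.array([x + epsilon * np.sign(input_gradient(w, x, yi))
                          for x, yi in zip(X, y)])
        # Step 2: augment the dataset with clean + adversarial examples.
        X_aug, y_aug = np.vstack([X, X_adv]), np.concatenate([y, y])
        # Step 3: robust training step (full-batch gradient descent).
        p = sigmoid(X_aug @ w)
        w = w - lr * X_aug.T @ (p - y_aug) / len(y_aug)
    return w

w = adversarial_training(X, y)

# Step 4: robustness testing -- evaluate on FGSM-perturbed inputs.
X_test_adv = np.array([x + 0.1 * np.sign(input_gradient(w, x, yi))
                       for x, yi in zip(X, y)])
clean_acc = np.mean((sigmoid(X @ w) > 0.5) == y)
robust_acc = np.mean((sigmoid(X_test_adv @ w) > 0.5) == y)
```

Note that the adversarial examples are regenerated every epoch against the current weights; attacking a fixed snapshot of the model would let training overfit to stale perturbations.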
Prepared by Dr. Gorkem Kar