CS5720 - Week 4

ReLU Activation in CNNs

The ReLU Function

Rectified Linear Unit:
f(x) = max(0, x)
Returns the input if positive, zero otherwise.
Simple but Powerful:

Linear for positive inputs: f'(x) = 1 when x > 0
Zero for non-positive inputs: f'(x) = 0 when x ≤ 0
Non-saturating: No upper bound on activations
Sparse activation: Many neurons output zero
Dying ReLU Problem:
If a neuron's weighted input becomes stuck in the negative region, its output is always 0 and its gradient is 0, so the neuron stops learning.
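A minimal NumPy sketch (not from the slides; the array z and function names are illustrative) showing ReLU, its subgradient, the sparse outputs, and the zero gradient for non-positive inputs behind the dying-ReLU effect:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Subgradient of ReLU: 1 where x > 0, 0 where x <= 0."""
    return (x > 0).astype(x.dtype)

# Hypothetical pre-activations for a handful of neurons
z = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])

print(relu(z))       # [0.  0.  0.  0.5 3. ]  -> sparse: many exact zeros
print(relu_grad(z))  # [0. 0. 0. 1. 1.]       -> zero gradient where x <= 0 (dying ReLU)
```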

Why ReLU Works Well

⚡ Computational Efficiency
Simple max(0,x) operation, much faster than sigmoid/tanh
📈 Mitigates Vanishing Gradients
Gradient is exactly 1 for positive inputs, so it does not shrink across layers in deep networks (see the sketch after this list)
🎯 Induces Sparsity
Creates sparse representations, improving interpretability
🧠 Biological Plausibility
Loosely resembles biological neurons: silent below a threshold, with activity growing as the input increases
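A rough illustration of the vanishing-gradient point, assuming a 20-layer chain in which every layer happens to see a pre-activation of 2.0 (the functions and numbers are illustrative, not course code): sigmoid's bounded derivative shrinks the gradient product toward zero, while ReLU's derivative of 1 preserves it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # never exceeds 0.25

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for positive inputs

# Product of the local gradients along a 20-layer chain,
# assuming every layer sees a pre-activation of 2.0
z = np.full(20, 2.0)
print(np.prod(sigmoid_grad(z)))  # ~3e-20: the gradient has effectively vanished
print(np.prod(relu_grad(z)))     # 1.0: the gradient passes through unchanged
```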

Interactive Activation Function Comparison

[Interactive widget: compare ReLU, Leaky ReLU, Sigmoid, Tanh, Swish, and GELU]

Sample activations (ReLU):
Negative input  -2.0 → 0.0
Zero input       0.0 → 0.0
Positive input   3.0 → 3.0
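A small sketch (illustrative, not part of the slide) that recomputes the sample values for all six activations listed above; the Leaky ReLU slope of 0.01 and the tanh-based GELU approximation are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

activations = {
    "ReLU":       lambda x: np.maximum(0.0, x),
    "Leaky ReLU": lambda x: np.where(x > 0, x, 0.01 * x),   # slope 0.01 is an assumed default
    "Sigmoid":    sigmoid,
    "Tanh":       np.tanh,
    "Swish":      lambda x: x * sigmoid(x),                 # also known as SiLU
    "GELU":       lambda x: 0.5 * x * (1.0 + np.tanh(       # tanh approximation of GELU
                      np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3))),
}

samples = np.array([-2.0, 0.0, 3.0])   # same inputs as the sample table above
for name, f in activations.items():
    print(f"{name:>10}: {np.round(f(samples), 3)}")
# e.g. ReLU on [-2, 0, 3] gives [0, 0, 3], matching the table
```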
Prepared by Dr. Gorkem Kar