CS5720 - Week 4

ReLU Activation in CNNs

The ReLU Function

Rectified Linear Unit:
f(x) = max(0, x)
Returns the input if positive, zero otherwise.
Simple but Powerful:

Linear for positive inputs: f'(x) = 1 when x > 0
Zero for non-positive inputs: f'(x) = 0 when x ≤ 0
Non-saturating: No upper bound on activations
Sparse activation: Many neurons output zero
Dying ReLU Problem:
If a neuron's weighted input becomes stuck in the negative region, its output is always 0 and its gradient is 0, so the neuron stops learning.
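A minimal NumPy sketch (not from the slides; the array z and function names are illustrative) showing ReLU, its subgradient, the sparse outputs, and the zero gradient for non-positive inputs behind the dying-ReLU effect:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Subgradient of ReLU: 1 where x > 0, 0 where x <= 0."""
    return (x > 0).astype(x.dtype)

# Hypothetical pre-activations for a handful of neurons
z = np.array([-2.0, -0.5, 0.0, 0.5, 3.0])

print(relu(z))       # [0.  0.  0.  0.5 3. ]  -> sparse: many exact zeros
print(relu_grad(z))  # [0. 0. 0. 1. 1.]       -> zero gradient where x <= 0 (dying ReLU)
```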

Why ReLU Works Well

⚡ Computational Efficiency
Simple max(0,x) operation, much faster than sigmoid/tanh
📈 Mitigates Vanishing Gradients
Gradient is exactly 1 for positive inputs, so it does not shrink across layers in deep networks (see the sketch after this list)
🎯 Induces Sparsity
Creates sparse representations, improving interpretability
🧠 Biological Plausibility
Loosely resembles biological neurons: silent below a threshold, with activity growing as the input increases
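A rough illustration of the vanishing-gradient point, assuming a 20-layer chain in which every layer happens to see a pre-activation of 2.0 (the functions and numbers are illustrative, not course code): sigmoid's bounded derivative shrinks the gradient product toward zero, while ReLU's derivative of 1 preserves it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # never exceeds 0.25

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for positive inputs

# Product of the local gradients along a 20-layer chain,
# assuming every layer sees a pre-activation of 2.0
z = np.full(20, 2.0)
print(np.prod(sigmoid_grad(z)))  # ~3e-20: the gradient has effectively vanished
print(np.prod(relu_grad(z)))     # 1.0: the gradient passes through unchanged
```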

Interactive Activation Function Comparison

[Interactive widget: compare ReLU, Leaky ReLU, Sigmoid, Tanh, Swish, and GELU]

Sample activations (ReLU):
Negative input  -2.0 → 0.0
Zero input       0.0 → 0.0
Positive input   3.0 → 3.0
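A small sketch (illustrative, not part of the slide) that recomputes the sample values for all six activations listed above; the Leaky ReLU slope of 0.01 and the tanh-based GELU approximation are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

activations = {
    "ReLU":       lambda x: np.maximum(0.0, x),
    "Leaky ReLU": lambda x: np.where(x > 0, x, 0.01 * x),   # slope 0.01 is an assumed default
    "Sigmoid":    sigmoid,
    "Tanh":       np.tanh,
    "Swish":      lambda x: x * sigmoid(x),                 # also known as SiLU
    "GELU":       lambda x: 0.5 * x * (1.0 + np.tanh(       # tanh approximation of GELU
                      np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3))),
}

samples = np.array([-2.0, 0.0, 3.0])   # same inputs as the sample table above
for name, f in activations.items():
    print(f"{name:>10}: {np.round(f(samples), 3)}")
# e.g. ReLU on [-2, 0, 3] gives [0, 0, 3], matching the table
```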
Prepared by Dr. Gorkem Kar