Rectified Linear Unit:
Returns the input if it is positive, and zero otherwise: f(x) = max(0, x)
Simple but Powerful:
• Linear for positive inputs: f'(x) = 1 when x > 0
• Zero for negative inputs: f'(x) = 0 when x ≤ 0
• Non-saturating: No upper bound on activations
• Sparse activation: Many neurons output zero
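The properties above can be sketched in a few lines of NumPy (function names here are illustrative, not from any particular library):

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x): passes positives through, zeroes out negatives
    return np.maximum(0, x)

def relu_grad(x):
    # Derivative: 1 where x > 0, 0 where x <= 0
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]  <- sparse: 3 of 5 outputs are zero
print(relu_grad(x))  # [0. 0. 0. 1. 1.]       <- gradient is exactly 0 or 1
```

Note the sparsity: every negative input maps to exactly zero, not just a small value.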
Dying ReLU Problem:
If a neuron's pre-activation (its weighted input) becomes negative for every training example, it always outputs 0, its gradient is 0, and its weights never update — the neuron effectively stops learning.
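A minimal sketch of how this happens, with hypothetical weights and a large negative bias chosen to force the pre-activation below zero for the whole batch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # batch of 100 inputs (illustrative)
w = np.array([-1.0, -1.0, -1.0, -1.0])   # hypothetical weights
b = -20.0                                # large negative bias: a "dead" neuron

z = X @ w + b                            # pre-activations, all far below zero
grad_mask = (z > 0).astype(float)        # ReLU passes gradient only where z > 0
print(grad_mask.sum())                   # 0.0 -> no example yields any gradient
```

Because the gradient is zero on every example, gradient descent can never move the weights back into a region where the neuron fires again. Variants such as Leaky ReLU avoid this by keeping a small nonzero slope for negative inputs.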
Why ReLU Works Well
⚡ Computational Efficiency
A single max(0, x) comparison — much cheaper than the exponentials required by sigmoid/tanh
📈 Solves Vanishing Gradients
Gradient is exactly 1 for positive inputs, so it doesn't shrink layer-by-layer during backpropagation, enabling deep network training
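The contrast with sigmoid can be made concrete. A rough comparison (variable names are illustrative): sigmoid's derivative σ(x)(1−σ(x)) decays toward zero as the input grows, while ReLU's stays at exactly 1 for any positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([1.0, 5.0, 10.0])
sig_grad = sigmoid(x) * (1 - sigmoid(x))  # shrinks rapidly as x grows
relu_grad = (x > 0).astype(float)         # stays exactly 1 for positive x
print(sig_grad)    # roughly [1.97e-01 6.65e-03 4.54e-05]
print(relu_grad)   # [1. 1. 1.]
```

Multiplying many sub-1 sigmoid gradients across layers is what makes deep-network gradients vanish; multiplying ReLU gradients of 1 leaves them intact.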