CS5720 - Week 1
Slide 13 of 20

Activation Functions Introduction

What are Activation Functions?

Activation functions introduce non-linearity into neural networks, transforming the weighted sum of inputs into an output signal. Without them, even deep networks would only compute linear transformations!
Neuron with Activation:
Input → [Σ(w·x) + b] → f(z) → Output

Where f(z) is the activation function

Key Properties:
  • Non-linear transformation
  • Differentiable (for backpropagation)
  • Computationally efficient
  • Suitable gradient properties
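The neuron diagram above can be sketched directly in code. This is a minimal illustration (the function name `neuron` and the example weights are chosen here for demonstration): compute the weighted sum z = Σ(w·x) + b, then pass it through an activation f.

```python
import math

def neuron(x, w, b, f):
    """Weighted sum of inputs plus bias, passed through activation f."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(z)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Example: z = 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1, then sigmoid(0.1) ≈ 0.525
out = neuron([1.0, 2.0], [0.5, -0.25], 0.1, sigmoid)
```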

Why We Need Them

Without Activation Functions:

W₂(W₁x) = (W₂W₁)x = Wx

Multiple layers collapse to a single linear transformation!

With Activation Functions:

f₂(W₂·f₁(W₁x))

Can learn complex, non-linear patterns!

Benefits:
  • Enable learning of complex patterns
  • Create non-linear decision boundaries
  • Make additional depth meaningful (each layer adds representational power)
  • Model real-world relationships
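The collapse argument above can be verified numerically. A small sketch with hand-picked matrices (chosen here purely for illustration): two linear layers give exactly the same output as their single collapsed product, while inserting a ReLU between them breaks the equivalence.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [2.0,  0.0]])
W2 = np.array([[0.5, 1.0]])
x = np.array([1.0, 2.0])

# Two stacked linear layers collapse into one: W2(W1 x) == (W2 W1) x
two_layer = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x          # same result, a single linear map

# With a non-linearity between the layers, the collapse no longer holds
relu = lambda z: np.maximum(0.0, z)
nonlinear = W2 @ relu(W1 @ x)      # differs from the purely linear output
```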

Interactive Activation Functions

[Interactive demo on the original slide: vary z and observe f(z) for each activation function]

Quick Comparison

  • Linear:   f(z) = z           Range: (-∞, ∞)   ❌ No non-linearity (identity)
  • Sigmoid:  f(z) = 1/(1+e⁻ᶻ)   Range: (0, 1)    ✓ Probability output
  • Tanh:     f(z) = tanh(z)     Range: (-1, 1)   ✓ Zero-centered
  • ReLU:     f(z) = max(0, z)   Range: [0, ∞)    ✓ Most popular
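The four functions in the comparison are each a one-line definition. A minimal sketch (function names are my own choice) that evaluates all four at a few sample points, so their ranges and shapes can be compared directly:

```python
import math

def linear(z):
    return z                       # identity: no non-linearity

def sigmoid(z):
    return 1 / (1 + math.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return math.tanh(z)            # squashes to (-1, 1), zero-centered

def relu(z):
    return max(0.0, z)             # clips negatives to 0, range [0, ∞)

for name, f in [("linear", linear), ("sigmoid", sigmoid),
                ("tanh", tanh), ("relu", relu)]:
    print(f"{name:8s} f(-2)={f(-2):+.3f}  f(0)={f(0):+.3f}  f(2)={f(2):+.3f}")
```

Note that only sigmoid maps z = 0 to a non-zero value (0.5); the others all pass through the origin.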

Prepared by Dr. Gorkem Kar