| Function | Range | Advantages | Disadvantages | Best Use Case |
|---|---|---|---|---|
| Sigmoid | (0, 1) | Probability interpretation | Vanishing gradient | Binary classification output |
| ReLU | [0, ∞) | Fast, no vanishing gradient | Dead neurons | Hidden layers (default) |
| Tanh | (-1, 1) | Zero-centered | Vanishing gradient | RNN/LSTM hidden states |
| Leaky ReLU | (-∞, ∞) | No dead neurons | Not zero-centered | Deep networks |
| Softmax | (0, 1) per class | Multi-class probability | Computationally expensive | Multi-class output |
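The functions in the table are simple to implement directly. A minimal stdlib-only sketch (the function names and the `alpha` slope for Leaky ReLU are illustrative choices, not from the table):

```python
import math

def sigmoid(x):
    # Range (0, 1): output reads as a probability, but saturates
    # for large |x|, which causes vanishing gradients.
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Range [0, inf): cheap and gradient-friendly, but units stuck
    # at 0 for all inputs become "dead neurons".
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Range (-inf, inf): a small negative slope keeps gradients
    # flowing, avoiding dead neurons (alpha=0.01 is a common default).
    return x if x > 0 else alpha * x

def tanh(x):
    # Range (-1, 1): zero-centered, but still saturates.
    return math.tanh(x)

def softmax(xs):
    # Per-class outputs in (0, 1) that sum to 1; subtracting the max
    # before exponentiating is the standard numerical-stability trick.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `softmax([1.0, 2.0, 3.0])` returns three values that sum to 1, with the largest probability on the last class.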
- Start with ReLU in hidden layers; switch to variants such as Leaky ReLU only if training reveals problems
- Match the output activation to the task: sigmoid for binary classification, softmax for multi-class
- Pair each activation with a suitable weight initialization (e.g., He for ReLU, Xavier/Glorot for sigmoid and tanh)
- Monitor activation statistics during training to catch saturation and dead neurons early
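One way to act on the monitoring tip is to track the fraction of dead ReLU units in a layer. A minimal sketch under assumed conditions (the pre-activations here are synthetic Gaussian values with a hypothetical negative shift standing in for a badly initialized layer):

```python
import random

def relu(x):
    return max(0.0, x)

def relu_stats(pre_activations):
    """Return (mean post-ReLU activation, fraction of dead units)."""
    acts = [relu(z) for z in pre_activations]
    mean = sum(acts) / len(acts)
    dead_frac = sum(1 for a in acts if a == 0.0) / len(acts)
    return mean, dead_frac

random.seed(0)
# Hypothetical pre-activations: a -1.0 mean shift pushes most
# units below zero, so ReLU kills the majority of them.
z = [random.gauss(-1.0, 1.0) for _ in range(1000)]
mean, dead_frac = relu_stats(z)
print(f"mean activation {mean:.3f}, dead fraction {dead_frac:.1%}")
```

A dead fraction that stays high (or grows) across training steps is the classic symptom that calls for Leaky ReLU, a lower learning rate, or better initialization.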