CS5720 - Week 1
Slide 10 of 20

Multi-Layer Perceptrons (MLPs)

Beyond Single Layers

Multi-Layer Perceptrons (MLPs) add hidden layers between input and output, enabling them to learn non-linear patterns that single perceptrons cannot.
Key Innovation:

By stacking multiple layers and using non-linear activation functions, MLPs with even one hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy - the Universal Approximation Theorem!

MLP Architecture:
  • Input Layer: Receives raw features
  • Hidden Layer(s): Learns representations
  • Output Layer: Produces final predictions
  • Full Connectivity: Each neuron connects to all neurons in next layer
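The architecture above can be sketched as a forward pass in NumPy. This is a minimal illustration, not a library API: the layer sizes (3 inputs, 4 hidden units, 1 output) and random weights are assumed for demonstration.

```python
import numpy as np

def sigmoid(z):
    # Non-linear activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes (assumed, not from the slide): 3 inputs, 4 hidden, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden (fully connected)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden -> output (fully connected)

def mlp_forward(x):
    h = sigmoid(W1 @ x + b1)   # hidden layer: learned representation of x
    y = sigmoid(W2 @ h + b2)   # output layer: final prediction
    return y

out = mlp_forward(np.array([1.0, 0.0, 1.0]))
print(out.shape)  # (1,)
```

Note the full connectivity: every input contributes to every hidden unit through `W1`, and every hidden unit contributes to the output through `W2`.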

Solving the XOR Problem

Remember how single perceptrons failed at XOR? MLPs solve it elegantly with just one hidden layer!
Feature            | Single Perceptron | Multi-Layer Perceptron
-------------------|-------------------|-----------------------
Decision Boundary  | Linear only       | Non-linear
XOR Problem        | Cannot solve      | Easily solved
Hidden Layers      | 0                 | 1 or more
Learning Algorithm | Perceptron rule   | Backpropagation
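The table's last row can be made concrete: backpropagation applies the chain rule layer by layer to compute weight gradients. Below is a minimal from-scratch sketch that trains a small sigmoid MLP on XOR with mean squared error; the hidden size (4), learning rate, and iteration count are illustrative choices, not part of the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR dataset: inputs X, targets t
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output

lr, losses = 1.0, []
for _ in range(5000):
    # Forward pass
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    losses.append(float(((Y - t) ** 2).mean()))
    # Backward pass: chain rule through sigmoid and matmul (MSE loss)
    dY = (Y - t) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(0, keepdims=True)
    W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(0, keepdims=True)

print("loss:", round(losses[0], 3), "->", round(losses[-1], 3))
```

The perceptron rule cannot do this: with no hidden layer there is no chain of derivatives to propagate, and no way to form the non-linear boundary XOR needs.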
Historical Note:

The inability to train MLPs held back neural networks until backpropagation was popularized by Rumelhart, Hinton, and Williams in 1986!

MLP Architecture Visualization

[Diagram: a fully connected network with inputs x₁–x₃, two hidden layers (h₁–h₃ and h₄–h₆), and a single output y; labeled Input Layer, Hidden Layer 1, Hidden Layer 2, Output Layer]
  • 🏗️ Network Depth: Multiple layers create hierarchical feature learning
  • Activation Functions: Non-linear transformations enable complex patterns
  • 🧠 Learning Capacity: More neurons = ability to learn more complex functions
  • 🌐 Universal Approximation: Can approximate any continuous function (on a compact domain) to arbitrary accuracy
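The "Activation Functions" point deserves emphasis: without a non-linearity between them, stacked layers collapse into a single linear map, so depth buys nothing. A quick NumPy check (with arbitrary example sizes) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # layer 1 weights (sizes chosen arbitrarily)
W2 = rng.normal(size=(2, 4))   # layer 2 weights
x = rng.normal(size=3)

# Two stacked linear layers with no activation in between...
deep = W2 @ (W1 @ x)
# ...are exactly one linear layer whose weight matrix is W2 @ W1
shallow = (W2 @ W1) @ x
print(np.allclose(deep, shallow))  # True: depth adds nothing without non-linearity
```

This is why every MLP inserts a non-linear function (sigmoid, tanh, ReLU, ...) after each hidden layer.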

XOR Solution with MLP

How MLPs Solve XOR

  1. Hidden neurons learn features:
    • h₁: Detects when both inputs are 1
    • h₂: Detects when at least one input is 1
  2. Output combines features:
    XOR = (at least one is 1) AND NOT (both are 1)
  3. Non-linear boundaries:
    Combining the hidden units' linear boundaries yields a non-linear decision region
XOR Truth Table:
  x₁ | x₂ | y
   0 |  0 | 0
   0 |  1 | 1
   1 |  0 | 1
   1 |  1 | 0
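The three-step solution above can be verified directly with hand-chosen weights (one of many valid choices) and a threshold activation: h₁ acts as AND, h₂ acts as OR, and the output computes "OR AND NOT AND", reproducing the truth table.

```python
import numpy as np

def step(z):
    # Threshold activation: fires (1) when the input is non-negative
    return (z >= 0).astype(float)

# All four XOR inputs, one per row
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hidden layer (weights hand-chosen for illustration):
#   h1 ~ AND: fires only when both inputs are 1  (x1 + x2 >= 1.5)
#   h2 ~ OR : fires when at least one input is 1 (x1 + x2 >= 0.5)
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-1.5, -0.5])
H = step(X @ W1.T + b1)

# Output: y = h2 AND NOT h1  ->  XOR
w2 = np.array([-2.0, 1.0])
b2 = -0.5
y = step(H @ w2 + b2)
print(y)  # [0. 1. 1. 0.]
```

Each hidden unit draws one straight line; the output unit intersects the two half-planes, carving out the non-linear region a single perceptron can never produce.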
Prepared by Dr. Gorkem Kar