Backpropagation
The algorithm behind training neural networks — from computing gradients via the chain rule to updating weights with gradient descent.
Backpropagation
Backpropagation is the algorithm used to compute gradients in neural networks efficiently.
Chain Rule
Given a composition of functions L = f(g(x)), the chain rule gives:
dL/dx = (dL/df) · (df/dg) · (dg/dx)
Backprop applies this recursively from the output layer back to the input.
Forward Pass
During the forward pass, activations are computed and cached at each layer:
z = Wx + b
a = σ(z)
Backward Pass
Gradients flow backward through the network:
δL/δW = δL/δa · δa/δz · δz/δW
Gradient Descent Update
Once gradients are computed, weights are updated:
θ = θ - α · ∇_θ L
where α is the learning rate.
Common Issues
- Vanishing gradients: gradients shrink exponentially in deep nets. Solved by ReLU, residual connections, batch norm.
- Exploding gradients: gradients grow unboundedly. Solved by gradient clipping.