🧠 Fundamentals

Backpropagation

The algorithm behind training neural networks — from computing gradients via the chain rule to updating weights with gradient descent.

Backpropagation

Backpropagation is the algorithm used to compute gradients in neural networks efficiently.

Chain Rule

Given a composition of functions L = f(g(x)), the chain rule gives:

dL/dx = (dL/df) · (df/dg) · (dg/dx)

Backprop applies this recursively from the output layer back to the input.

Forward Pass

During the forward pass, activations are computed and cached at each layer:

z = Wx + b
a = σ(z)

Backward Pass

Gradients flow backward through the network:

δL/δW = δL/δa · δa/δz · δz/δW

Gradient Descent Update

Once gradients are computed, weights are updated:

θ = θ - α · ∇_θ L

where α is the learning rate.

Common Issues

  • Vanishing gradients: gradients shrink exponentially in deep nets. Solved by ReLU, residual connections, batch norm.
  • Exploding gradients: gradients grow unboundedly. Solved by gradient clipping.