Fine-tuning LLMs

Fine-tuning adapts a pre-trained model to a specific task, format, or behavior.

Why Fine-tune?

Pre-trained models predict next tokens. Fine-tuning teaches them to:

Standard next-token prediction, but on a curated dataset of instruction-response pairs.

Reinforcement Learning from Human Feedback:

Low-Rank Adaptation adds trainable rank-decomposition matrices to frozen weight matrices:

W' = W + BA,  where B ∈ ℝ^(d×r), A ∈ ℝ^(r×k), r ≪ min(d,k)

Trains only ~0.1% of parameters while matching full fine-tuning quality.

QLoRA combines 4-bit quantization with LoRA. Enables fine-tuning 65B models on a single 48GB GPU.