Fine-tuning LLMs

Adapting pre-trained LLMs to specific tasks: instruction tuning, RLHF, LoRA, and QLoRA.

Fine-tuning adapts a pre-trained model to a specific task, format, or behavior.

Why Fine-tune?

Pre-trained models are trained only to predict the next token. Fine-tuning teaches them to:

  • Follow instructions
  • Adopt a specific persona
  • Perform structured tasks (JSON output, code, etc.)

Supervised Fine-tuning (SFT)

SFT uses the same next-token prediction objective as pre-training, but trains on a curated dataset of instruction-response pairs.
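As a minimal sketch, the SFT objective is ordinary cross-entropy on the next token. It assumes a common convention that the text above does not state: prompt tokens are masked out so that only response tokens contribute to the loss. The function name `sft_loss` and the toy logits are hypothetical; a real setup would compute this over batched tensors in a framework such as PyTorch.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def sft_loss(logits, targets, loss_mask):
    """Mean next-token cross-entropy over positions where loss_mask is 1.

    logits[t] scores the vocabulary for the token that follows position t,
    and targets[t] is that token's id. Assumption: loss_mask[t] is 0 for
    prompt tokens and 1 for response tokens, so only responses are learned.
    """
    total, count = 0.0, 0
    for row, target, keep in zip(logits, targets, loss_mask):
        if keep:
            total -= log_softmax(row)[target]
            count += 1
    return total / count if count else 0.0


# Toy example: vocabulary of 3 tokens; the first two positions are prompt.
logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0], [1.0, 1.0, 1.0]]
targets = [0, 1, 2, 0]
mask = [0, 0, 1, 1]
print(sft_loss(logits, targets, mask))
```

Changing the logits at the masked prompt positions leaves the loss unchanged, which is the purpose of the mask.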

RLHF

Reinforcement Learning from Human Feedback:

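  1. Collect human preference data: annotators compare two responses to the same prompt (A vs. B) and pick the better one
  2. Train a reward model on these preferences, so that it scores the preferred response higher
  3. Use PPO to optimize the policy (initialized from the SFT model) against the reward model, typically with a KL penalty that keeps the policy close to the SFT model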
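The steps above can be sketched with scalar toy values. Step 2 uses the pairwise (Bradley-Terry) loss commonly used for reward models, as in InstructGPT, and step 3 uses PPO's clipped surrogate objective on a reward that subtracts a KL penalty toward the SFT reference model. This is a simplified, per-response sketch: the coefficients `beta` and `eps` and all function names are illustrative assumptions, and full implementations also train a value function and estimate advantages, which are omitted here.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def reward_model_loss(r_chosen, r_rejected):
    """Step 2: pairwise loss -log sigmoid(r_chosen - r_rejected).

    Low when the reward model scores the human-preferred response higher.
    """
    return -log_sigmoid(r_chosen - r_rejected)


def kl_shaped_reward(rm_score, logp_policy, logp_ref, beta=0.1):
    """Step 3 reward: reward-model score minus a KL penalty estimate.

    logp_policy / logp_ref are the summed log-probs of the sampled response
    under the current policy and the frozen SFT reference model.
    """
    return rm_score - beta * (logp_policy - logp_ref)


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Step 3 update: PPO clipped surrogate for one sample (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # The minimum makes this a pessimistic bound: pushing the ratio outside
    # [1 - eps, 1 + eps] in the direction the advantage favors gains nothing.
    return min(ratio * advantage, clipped_ratio * advantage)


print(reward_model_loss(2.0, 0.0))  # small: preferred response scored higher
print(reward_model_loss(0.0, 2.0))  # large: the ranking is wrong
```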

LoRA

Low-Rank Adaptation adds trainable rank-decomposition matrices to frozen weight matrices:

W' = W + BA,  where B ∈ ℝ^(d×r), A ∈ ℝ^(r×k), r ≪ min(d,k)
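A minimal pure-Python sketch of a LoRA-adapted linear layer follows. As in Hu et al. (2021), the low-rank update is scaled by α/r, A starts as small random Gaussian values and B starts at zero, so W' = W at the beginning of training; folding α/r into B gives exactly the formula above. The class name `LoRALinear`, the 0.01 standard deviation, and the list-of-lists matrices are illustrative assumptions, not a library API.

```python
import random


def matvec(M, x):
    """Matrix (list of rows) times vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def matmul(P, Q):
    """Matrix product P @ Q for lists of rows."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


class LoRALinear:
    """h = W x + (alpha / r) * B (A x), with the d x k matrix W frozen."""

    def __init__(self, W, r, alpha=None, seed=0):
        d, k = len(W), len(W[0])
        rng = random.Random(seed)
        self.W = W  # frozen pre-trained weights, never updated
        # Trainable factors: A is r x k (random), B is d x r (zeros).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d)]
        self.scale = (alpha if alpha is not None else r) / r

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))  # low-rank path through r dims
        return [b + self.scale * u for b, u in zip(base, update)]

    def merged_weight(self):
        """W' = W + (alpha / r) * B A, usable at inference with no extra layer."""
        BA = matmul(self.B, self.A)
        return [[w + self.scale * v for w, v in zip(w_row, ba_row)]
                for w_row, ba_row in zip(self.W, BA)]


W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # d = 2, k = 3
layer = LoRALinear(W, r=1, alpha=2)
x = [1.0, -1.0, 0.5]
print(layer.forward(x) == matvec(W, x))  # B = 0, so the output is unchanged
```

Once B has been trained, `merged_weight()` produces a single matrix whose output matches `forward`, which is why the adapter can be folded away after training.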

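Because only A and B are trained, the trainable fraction is small; how small depends on r and on which matrices are adapted. Hu et al. (2021) report a 10,000× reduction in trainable parameters for GPT-3 175B, with model quality on par with or better than full fine-tuning on RoBERTa, DeBERTa, GPT-2, and GPT-3. After training, BA can be merged into W, so LoRA adds no inference latency.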
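A back-of-the-envelope count shows why the trainable fraction is so small. The sketch assumes a standard transformer block with about 12·d² weights (4d² in attention, 8d² in an MLP with a 4× hidden size), ignores embeddings, biases, and norms, and adapts only the query and value projections (W_q and W_v), a configuration used in Hu et al. The values d_model = 4096, 32 layers, and r = 8 are illustrative, not figures from the paper.

```python
def lora_params(d, k, r):
    """Trainable parameters LoRA adds to one d x k matrix: B (d x r) plus A (r x k)."""
    return r * (d + k)


def lora_fraction(d_model, n_layers, r, adapted_per_layer=2):
    """Share of transformer-block weights that LoRA trains.

    Assumes ~12 * d_model**2 weights per block and that `adapted_per_layer`
    square d_model x d_model matrices (e.g. W_q and W_v) get adapters.
    """
    total = 12 * d_model**2 * n_layers
    trainable = adapted_per_layer * lora_params(d_model, d_model, r) * n_layers
    return trainable / total


print(lora_params(4096, 4096, 8))            # → 65536 (vs 16777216 entries in W)
print(f"{lora_fraction(4096, 32, 8):.3%}")  # → 0.065%
```

The layer count cancels, so under these assumptions the fraction is 2 · 2dr / 12d² = r / (3d); adapting more matrices per block or raising r increases it proportionally.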

QLoRA

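QLoRA (Dettmers et al., 2023) backpropagates gradients through a frozen, 4-bit quantized base model into LoRA adapters. It introduces the 4-bit NormalFloat (NF4) data type, double quantization, and paged optimizers, and reduces memory enough to fine-tune a 65B-parameter model on a single 48GB GPU while preserving 16-bit fine-tuning task performance.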
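The core memory idea can be sketched as storing the frozen weights as 4-bit codes plus one scale per block and dequantizing them on the fly in the forward pass. To keep the sketch short it swaps in plain blockwise absmax quantization to signed integers in [-7, 7]; this is a simplified stand-in, not the NF4 data type QLoRA actually uses, and it omits double quantization and paged optimizers. The block size of 64 and all function names are illustrative assumptions.

```python
def quantize_4bit(weights, block_size=64):
    """Blockwise absmax quantization of a flat weight list to 4-bit codes.

    Simplified stand-in for NF4: 15 evenly spaced levels in [-7, 7] instead
    of normal-distribution quantiles. One float scale is kept per block.
    """
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # all-zero block: any scale works
        scale = absmax / 7
        scales.append(scale)
        codes.extend(round(w / scale) for w in block)
    return codes, scales


def dequantize_4bit(codes, scales, block_size=64):
    """Recover approximate weights; the error is about scale / 2 at most."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def qlora_forward(codes, scales, d, k, x, lora_update, block_size=64):
    """h = dequant(W) x + lora_update, with W stored row-major as d*k codes.

    The base weights stay frozen; lora_update stands for the (alpha/r) B A x
    term, whose A and B are the only parameters that receive gradients.
    """
    w = dequantize_4bit(codes, scales, block_size)
    base = [sum(w[i * k + j] * x[j] for j in range(k)) for i in range(d)]
    return [b + u for b, u in zip(base, lora_update)]


codes, scales = quantize_4bit([1.0, 0.0, 0.0, 1.0])  # 2 x 2 identity
print(qlora_forward(codes, scales, 2, 2, [3.0, 4.0], [0.5, -0.5]))  # approximately [3.5, 3.5]
```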

References

  • Hu et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models
  • Dettmers et al. (2023). QLoRA: Efficient Finetuning of Quantized LLMs