Thursday, 3 July 2025

So - what is Preference Alignment in LLM Training?

Preference alignment in LLM training aims to improve an LLM's behavior by guiding it to follow rules and preferences. This could relate to avoiding offensive language or to some other restriction.

Some approaches to preference alignment are detailed in this blog post from Miguel Mendez. There are a number of known techniques for this, including the following (a small sketch of the DPO loss appears after the list):

PPO: Proximal Policy Optimization

DPO: Direct Preference Optimization

ORPO: Odds Ratio Preference Optimization (which does not require a reference model)
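
To make the idea concrete, here is a minimal sketch of the DPO loss in PyTorch. It assumes you have already computed per-sequence log-probabilities of the chosen and rejected answers under both the policy and a frozen reference model; the function and argument names are illustrative, not taken from any particular library.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Push the policy to prefer the chosen answer over the rejected one,
    measured relative to a frozen reference model (standard DPO objective)."""
    # Implicit rewards: how much more the policy (vs. the reference)
    # likes each answer.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # DPO loss: -log sigmoid of the reward margin, averaged over the batch.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

Unlike PPO, there is no separate reward model or sampling loop; the preference pairs themselves define the training signal.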

For preference alignment we usually need data labelled as good or bad. Human annotation of such data is often expensive, and in some cases a clear "winner" between two contrasting data points cannot be decided. With KTO, two answers to the same prompt can both be regarded as good, which is arguably closer to reality.
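
The difference in data requirements is easiest to see side by side. The records below are made-up examples purely to contrast the two formats (the field names are illustrative, not a fixed schema):

# Pairwise preference data (reward modelling for PPO, DPO, ORPO):
# each prompt needs a chosen AND a rejected completion.
pairwise_example = {
    "prompt": "How do I ask for a refund politely?",
    "chosen": "You could write: 'I would appreciate a refund because ...'",
    "rejected": "Just demand your money back.",
}

# KTO data: one completion per record with a binary desirable/undesirable
# label, so two different answers to the same prompt can both be marked good.
kto_examples = [
    {"prompt": "How do I ask for a refund politely?",
     "completion": "You could write: 'I would appreciate a refund because ...'",
     "label": True},
    {"prompt": "How do I ask for a refund politely?",
     "completion": "Mention the order number and state the issue calmly.",
     "label": True},
]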

KTO stands for Kahneman-Tversky Optimization and is described in more detail in a blog post from contextual.ai.

The research paper on KTO should be read to understand how the KTO loss function is constructed.
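
As a rough sketch only, the loss follows the prospect-theoretic shape described in the paper: desirable completions are pushed above a reference point and undesirable ones below it. The version below simplifies the reference point z0 to a detached batch statistic (the paper estimates it as a KL term over mismatched completions), and all names and default values are assumptions for illustration.

import torch

def kto_loss(policy_logps: torch.Tensor,
             ref_logps: torch.Tensor,
             is_desirable: torch.Tensor,
             beta: float = 0.1,
             lambda_d: float = 1.0,
             lambda_u: float = 1.0) -> torch.Tensor:
    """Sketch of the KTO objective for a batch of single-completion examples.

    policy_logps / ref_logps: per-sequence log-probabilities under the policy
    and the frozen reference model; is_desirable: boolean label per example.
    """
    # Implicit reward: log-ratio of policy to reference, as in DPO.
    rewards = policy_logps - ref_logps
    # Reference point z0: simplified here to a detached, clamped batch mean
    # (a stand-in for the paper's KL-based estimate).
    z0 = rewards.mean().clamp(min=0).detach()
    # Prospect-theoretic value: desirable examples rewarded for sitting above
    # the reference point, undesirable examples for sitting below it.
    desirable_value = lambda_d * torch.sigmoid(beta * (rewards - z0))
    undesirable_value = lambda_u * torch.sigmoid(beta * (z0 - rewards))
    value = torch.where(is_desirable, desirable_value, undesirable_value)
    lam = torch.where(is_desirable,
                      torch.full_like(rewards, lambda_d),
                      torch.full_like(rewards, lambda_u))
    # Loss is lambda_y minus the value, averaged over the batch.
    return (lam - value).mean()

The lambda_d and lambda_u weights let you rebalance training when the dataset contains many more good examples than bad ones (or vice versa).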
