Preference Alignment

This module covers techniques for aligning language models with human preferences. While supervised fine-tuning helps models learn tasks, preference alignment encourages outputs to match human expectations and values.

Overview

Typical alignment methods involve multiple stages:

  1. Supervised Fine-Tuning (SFT) to adapt models to specific domains
  2. Preference alignment (like RLHF or DPO) to improve response quality

Alternative approaches such as ORPO combine instruction tuning and preference alignment into a single stage. In this module, we will focus on the DPO and ORPO algorithms.
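As a rough sketch of how these stages chain together with the TRL library, stage 1 produces an SFT checkpoint that stage 2 then aligns on preference pairs. The checkpoint and dataset names below (SmolLM2-135M-Instruct, trl-lib/Capybara, trl-lib/ultrafeedback_binarized) are illustrative assumptions, not the exact setup used in the exercise notebooks:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

# Stage 1: supervised fine-tuning on instruction data.
# In practice this stage usually starts from a base (pre-trained) checkpoint; an
# instruct checkpoint is used here only because its tokenizer ships a chat template.
sft_trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",   # placeholder checkpoint
    args=SFTConfig(output_dir="smollm2-sft", max_steps=500),
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
sft_trainer.train()

# Stage 2: preference alignment (DPO here), continuing from the SFT-tuned weights.
dpo_trainer = DPOTrainer(
    model=sft_trainer.model,
    args=DPOConfig(output_dir="smollm2-dpo", max_steps=500),
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
    processing_class=sft_trainer.processing_class,
)
dpo_trainer.train()
```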

If you would like to learn more about the different alignment techniques, you can read about them on the Argilla Blog.

1️⃣ Direct Preference Optimization (DPO)

Direct Preference Optimization (DPO) simplifies preference alignment by directly optimizing models using preference data. This approach eliminates the need for separate reward models and complex reinforcement learning, making it more stable and efficient than traditional Reinforcement Learning from Human Feedback (RLHF). For more details, you can refer to the Direct Preference Optimization (DPO) documentation.
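The snippet below is a minimal sketch of DPO training with TRL's DPOTrainer; the model checkpoint, dataset, and hyperparameter values are placeholder assumptions rather than the exact setup of the exercise notebook.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder checkpoint; in a full pipeline this would be your SFT-tuned model.
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs with "prompt", "chosen", and "rejected" fields.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(
    output_dir="smollm2-dpo",
    beta=0.1,                       # strength of the implicit KL constraint to the reference model
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

# With ref_model left unset, TRL keeps a frozen copy of `model` as the reference.
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Because the logged preference pairs provide the training signal directly, no separate reward model is fit and no sampling from the policy is needed during training.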

2️⃣ Odds Ratio Preference Optimization (ORPO)

ORPO introduces a combined approach to instruction tuning and preference alignment in a single process. It modifies the standard language modeling objective by combining the negative log-likelihood loss with an odds ratio term at the token level. The approach features a unified single-stage training process, a reference-model-free architecture, and improved computational efficiency. ORPO has shown impressive results across various benchmarks, demonstrating better performance on AlpacaEval compared to traditional methods. For more details, you can refer to the Odds Ratio Preference Optimization (ORPO) documentation.
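A minimal ORPO sketch with TRL's ORPOTrainer looks similar to the DPO example above, except that no reference model is involved; again, the checkpoint, dataset, and hyperparameters are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

# ORPO is usually applied directly to a base model; an instruct checkpoint is used
# here only so the tokenizer already provides a chat template for the preference data.
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = ORPOConfig(
    output_dir="smollm2-orpo",
    beta=0.1,                       # weight of the odds-ratio term (lambda in the ORPO paper)
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

# No reference model is created: the odds-ratio penalty is computed from the policy alone.
trainer = ORPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Since the odds-ratio term is added to the standard language modeling loss, a single training run both teaches the instruction-following behavior and pushes the model toward preferred responses.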

Exercise Notebooks

| Title | Description | Exercise | Link | Colab |
|-------|-------------|----------|------|-------|
| DPO Training | Learn how to train models using Direct Preference Optimization | 🐢 Train a model using the Anthropic HH-RLHF dataset<br>🐕 Use your own preference dataset<br>🦁 Experiment with different preference datasets and model sizes | Notebook | Open In Colab |
| ORPO Training | Learn how to train models using Odds Ratio Preference Optimization | 🐢 Train a model using instruction and preference data<br>🐕 Experiment with different loss weightings<br>🦁 Compare ORPO results with DPO | Notebook | Open In Colab |

Resources

  • TRL Documentation - Documentation for the Transformers Reinforcement Learning (TRL) library, which implements various alignment techniques including DPO.
  • DPO Paper - Original research paper introducing Direct Preference Optimization as a simpler alternative to RLHF that directly optimizes language models using preference data.
  • ORPO Paper - Introduces Odds Ratio Preference Optimization, a novel approach that combines instruction tuning and preference alignment in a single training stage.
  • Argilla RLHF Guide - A guide explaining different alignment techniques including RLHF, DPO, and their practical implementations.
  • Blog post on DPO - Practical guide on implementing DPO using the TRL library with code examples and best practices.
  • TRL example script on DPO - Complete example script demonstrating how to implement DPO training using the TRL library.
  • TRL example script on ORPO - Reference implementation of ORPO training using the TRL library with detailed configuration options.
  • Hugging Face Alignment Handbook - Resource guides and codebase for aligning language models using various techniques including SFT, DPO, and RLHF.