Major and breaking changes
GRPO by @qgallouedec in #2565
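GRPO (Group Relative Policy Optimization) is the headline feature of this release, and #2606 below adds custom reward functions for it. A minimal sketch of that interface, assuming the v0.14 `GRPOTrainer` API in which `reward_funcs` accepts plain callables that receive the batch of `completions` (plus extra dataset columns as keyword arguments, per #2650) and return one float per completion; the model name and dataset in the commented usage are placeholders:

```python
# Hypothetical reward: prefer completions close to 20 characters.
# The signature (completions plus **kwargs for extra dataset columns)
# follows the custom-reward interface added for GRPO in this release.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(c)) for c in completions]

print(reward_len(["a" * 20, "a" * 25]))  # -> [0, -5]

# Passing it to the trainer would look roughly like this
# (sketch only; requires `pip install trl` and a prompt dataset):
#
# from trl import GRPOConfig, GRPOTrainer
# trainer = GRPOTrainer(
#     model="Qwen/Qwen2-0.5B-Instruct",   # placeholder model
#     reward_funcs=reward_len,
#     args=GRPOConfig(output_dir="grpo-out"),
#     train_dataset=dataset,              # placeholder dataset
# )
# trainer.train()
```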
What's Changed
- Remove deprecated by @qgallouedec in #2485
- Improve prose for smol course by @burtenshaw in #2487
- Add SmolVLM tutorials to Community Tutorials page by @sergiopaniego in #2498
- Proper dataset for documentation images by @qgallouedec in #2499
- Reorganize documentation by @qgallouedec in #2483
- [ORPO] Fix ORPO chosen-NLL loss by @kashif in #2502
- Remove unused components by @qgallouedec in #2480
- Update community_tutorials.md by @qgallouedec in #2509
- Remove RLOO example test by @qgallouedec in #2513
- Clarify DPO data preparation by @qgallouedec in #2512
- Generalize `disable_dropout` by @qgallouedec in #2511
- Rename collator `PreferenceCollator` to `DataCollatorForPreference` by @qgallouedec in #2510
- Packing documentation by @qgallouedec in #2503
- Update Comet integration to include `LogCompletionsCallback` and `Trainer.evaluation_loop()` by @yaricom in #2501
- Remove graph breaks for `torch.compile()` in the padding-free branch of `DataCollatorForCompletionOnlyLM` by @Abhishek-TAMU in #2158
- Use `field` in dataclasses by @qgallouedec in #2494
- Update copyright year by @qgallouedec in #2547
- Proper metrics gathering across ranks before logging by @zhc7 in #2474
- Fix typo in `formatting_func`'s documentation in `ConstantLengthDataset` by @SamuelLarkin in #2549
- DPO padding free by @qgallouedec in #2520
- XPU support for DPO by @faaany in #2533
- Fix SFT truncation documentation by @umbilnm in #2521
- Revert ORPO loss changes by @kashif in #2527
- Add README for datasets by @August-murr in #2491
- Fix dataset type unpair conversion docs by @claralp in #2550
- [RLOO] Reinforce++ by @kashif in #2552
- Improve DPO configuration documentation structure by @qgallouedec in #2561
- Refine model card method docstring by @qgallouedec in #2566
- Minor comment style modification by @qgallouedec in #2582
- vLLM for Online DPO by @qgallouedec in #2558
- Issues Auto-Labeller by @August-murr in #2542
- Simplify bug report template by @qgallouedec in #2585
- [RLOO] Fix `token_level_kl` by @kashif in #2575
- Truncate by default by @qgallouedec in #2587
- Add `max_prompt_length` parameter in tests by @qgallouedec in #2588
- Fix SFT documentation: `max_seq_length` instead of `max_length` by @skandermoalla in #2590
- GRPO by @qgallouedec in #2565
- Ignore CLI test for Python 3.9 by @qgallouedec in #2592
- Fix merge error by @qgallouedec in #2595
- Tool fine-tuning support for DPO by @August-murr in #2479
- Reduce memory peak in GRPO by adding `max_prompt_length` and loop usage in logp computation by @qgallouedec in #2598
- Add uv installation instructions by @stevhliu in #2601
- PPO/RLOO/OnlineDPO sequence generation: make DeepSpeed ZeRO-3 weight gathering optional by @dawidm in #2557
- Include stop token in policy model's `generation_config` by @dawidm in #2528
- Reintroduce `truncation_mode` in `DPOTrainer` by @anakin87 in #2551
- Drop MDX by @qgallouedec in #2611
- Rename an inner var in GRPO to improve clarity by @qgallouedec in #2616
- Custom reward function for GRPO and shiny doc by @qgallouedec in #2606
- Fix DPO gradient accumulation loss scaling by @winglian in #2615
- Fix BCO gradient accumulation loss scaling by @qgallouedec in #2638
- Custom reward function for RLOO by @August-murr in #2612
- Fix context manager runtime error when gather is disabled by @Superskyyy in #2639
- Fix CPO gradient accumulation loss scaling by @qgallouedec in #2645
- Fix GRPO gradient accumulation loss scaling by @qgallouedec in #2647
- Fix KTO gradient accumulation loss scaling by @qgallouedec in #2648
- Provide all columns of the dataset to the reward function by @qgallouedec in #2650
- DeepSpeed integration for GRPO by @qgallouedec in #2652
- Fine-grained reward logging for GRPO by @qgallouedec in #2651
- Disable caching when gradient checkpointing is enabled in GRPO by @qgallouedec in #2653
- Log completion length in GRPO by @qgallouedec in #2659
- Fix GRPO default completion length doc by @andyl98 in #2662
- Add model tags to models trained with GRPO by @qgallouedec in #2663
- Fix typos by @omahs in #2673
- vLLM for fast generation in GRPO by @qgallouedec in #2600
- Use `num_logits_to_keep` to reduce memory usage in GRPO by @qgallouedec in #2683
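Several fixes in the list above (#2615, #2638, #2645, #2647, #2648) address the same pattern across trainers: when gradients are accumulated over N micro-batches, each micro-batch loss (or gradient) must be scaled by 1/N, otherwise the effective gradient is N times too large. A minimal arithmetic sketch of the issue, independent of any TRL code and using toy gradient values:

```python
# Toy setup: pretend each number is the gradient contributed by one
# micro-batch, so we can "accumulate" with plain addition.
def accumulated_grad(micro_batch_grads, scale):
    # scale=1.0 reproduces the bug; scale=1/N is the fix.
    return sum(g * scale for g in micro_batch_grads)

grads = [1.0, 3.0, 2.0, 2.0]          # 4 accumulation steps
full_batch = sum(grads) / len(grads)  # gradient of the full batch: 2.0

buggy = accumulated_grad(grads, 1.0)             # 8.0: 4x too large
fixed = accumulated_grad(grads, 1 / len(grads))  # 2.0: matches full batch
```

The fixed variant makes a run with `gradient_accumulation_steps=4` and micro-batch size B behave like a run with batch size 4B, which is the equivalence these PRs restore.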
New Contributors
- @Abhishek-TAMU made their first contribution in #2158
- @zhc7 made their first contribution in #2474
- @SamuelLarkin made their first contribution in #2549
- @umbilnm made their first contribution in #2521
- @stevhliu made their first contribution in #2601
- @dawidm made their first contribution in #2557
- @Superskyyy made their first contribution in #2639
- @andyl98 made their first contribution in #2662
- @omahs made their first contribution in #2673
Full Changelog: v0.13.0...v0.14.0