Major and breaking changes
GRPO by @qgallouedec in #2565
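GRPO (Group Relative Policy Optimization) is the headline feature of this release, and #2606 below adds custom reward functions for it. A minimal sketch of that interface, assuming the v0.14 `GRPOTrainer` API in which `reward_funcs` accepts plain callables that receive the batch of `completions` (plus extra dataset columns as keyword arguments, per #2650) and return one float per completion; the model name and dataset in the commented usage are placeholders:

```python
# Hypothetical reward: prefer completions close to 20 characters.
# The signature (completions plus **kwargs for extra dataset columns)
# follows the custom-reward interface added for GRPO in this release.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(c)) for c in completions]

print(reward_len(["a" * 20, "a" * 25]))  # -> [0, -5]

# Passing it to the trainer would look roughly like this
# (sketch only; requires `pip install trl` and a prompt dataset):
#
# from trl import GRPOConfig, GRPOTrainer
# trainer = GRPOTrainer(
#     model="Qwen/Qwen2-0.5B-Instruct",   # placeholder model
#     reward_funcs=reward_len,
#     args=GRPOConfig(output_dir="grpo-out"),
#     train_dataset=dataset,              # placeholder dataset
# )
# trainer.train()
```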
What's Changed
- Remove deprecated by @qgallouedec in #2485
- Improve prose for smol course by @burtenshaw in #2487
- Add SmolVLM tutorials to Community Tutorials page by @sergiopaniego in #2498
- Proper dataset for documentation images by @qgallouedec in #2499
- Reorganize documentation by @qgallouedec in #2483
- [ORPO] Fix ORPO chosen-NLL loss by @kashif in #2502
- Remove unused components by @qgallouedec in #2480
- Update community_tutorials.md by @qgallouedec in #2509
- Remove RLOO example test by @qgallouedec in #2513
- Clarify DPO data preparation by @qgallouedec in #2512
- Generalize `disable_dropout` by @qgallouedec in #2511
- Rename collator `PreferenceCollator` to `DataCollatorForPreference` by @qgallouedec in #2510
- Packing documentation by @qgallouedec in #2503
- Update Comet integration to include `LogCompletionsCallback` and `Trainer.evaluation_loop()` by @yaricom in #2501
- Remove graph breaks for `torch.compile()` in the padding-free branch of `DataCollatorForCompletionOnlyLM` by @Abhishek-TAMU in #2158
- Use `field` in dataclasses by @qgallouedec in #2494
- Update copyright year by @qgallouedec in #2547
- Proper metrics gathering across ranks before logging by @zhc7 in #2474
- Fix typo in `formatting_func`'s documentation in `ConstantLengthDataset` by @SamuelLarkin in #2549
- DPO padding free by @qgallouedec in #2520
- XPU support for DPO by @faaany in #2533
- Fix SFT truncation documentation by @umbilnm in #2521
- Revert ORPO loss changes by @kashif in #2527
- Add README for datasets by @August-murr in #2491
- Fix dataset type unpair conversion docs by @claralp in #2550
- [RLOO] Reinforce++ by @kashif in #2552
- Improve DPO configuration documentation structure by @qgallouedec in #2561
- Refine model card method docstring by @qgallouedec in #2566
- Minor comment style modification by @qgallouedec in #2582
- vLLM for Online DPO by @qgallouedec in #2558
- Issues Auto-Labeller by @August-murr in #2542
- Simplify bug report template by @qgallouedec in #2585
- [RLOO] Fix `token_level_kl` by @kashif in #2575
- Truncate by default by @qgallouedec in #2587
- Add `max_prompt_length` parameter in tests by @qgallouedec in #2588
- Fix SFT documentation: `max_seq_length` instead of `max_length` by @skandermoalla in #2590
- GRPO by @qgallouedec in #2565
- Ignore CLI test for Python 3.9 by @qgallouedec in #2592
- Fix merge error by @qgallouedec in #2595
- Tool fine-tuning support for DPO by @August-murr in #2479
- Reduce memory peak in GRPO by adding `max_prompt_length` and loop usage in logp computation by @qgallouedec in #2598
- Add uv installation instructions by @stevhliu in #2601
- PPO/RLOO/OnlineDPO sequence generation: make DeepSpeed ZeRO-3 weight gathering optional by @dawidm in #2557
- Include stop token in policy model's `generation_config` by @dawidm in #2528
- Reintroduce `truncation_mode` in `DPOTrainer` by @anakin87 in #2551
- Drop MDX by @qgallouedec in #2611
- Rename an inner var in GRPO to improve clarity by @qgallouedec in #2616
- Custom reward function for GRPO and shiny doc by @qgallouedec in #2606
- Fix DPO gradient accumulation loss scaling by @winglian in #2615
- Fix BCO gradient accumulation loss scaling by @qgallouedec in #2638
- Custom reward function for RLOO by @August-murr in #2612
- Fix context manager runtime error when gather is disabled by @Superskyyy in #2639
- Fix CPO gradient accumulation loss scaling by @qgallouedec in #2645
- Fix GRPO gradient accumulation loss scaling by @qgallouedec in #2647
- Fix KTO gradient accumulation loss scaling by @qgallouedec in #2648
- Provide all columns of the dataset to the reward function by @qgallouedec in #2650
- DeepSpeed integration for GRPO by @qgallouedec in #2652
- Fine-grained reward logging for GRPO by @qgallouedec in #2651
- Disable caching when gradient checkpointing is enabled in GRPO by @qgallouedec in #2653
- Log completion length in GRPO by @qgallouedec in #2659
- Fix GRPO default completion length doc by @andyl98 in #2662
- Add model tags to models trained with GRPO by @qgallouedec in #2663
- Fix typos by @omahs in #2673
- vLLM for fast generation in GRPO by @qgallouedec in #2600
- Use `num_logits_to_keep` to reduce memory usage in GRPO by @qgallouedec in #2683
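Several fixes in the list above (#2615, #2638, #2645, #2647, #2648) address the same pattern across trainers: when gradients are accumulated over N micro-batches, each micro-batch loss (or gradient) must be scaled by 1/N, otherwise the effective gradient is N times too large. A minimal arithmetic sketch of the issue, independent of any TRL code and using toy gradient values:

```python
# Toy setup: pretend each number is the gradient contributed by one
# micro-batch, so we can "accumulate" with plain addition.
def accumulated_grad(micro_batch_grads, scale):
    # scale=1.0 reproduces the bug; scale=1/N is the fix.
    return sum(g * scale for g in micro_batch_grads)

grads = [1.0, 3.0, 2.0, 2.0]          # 4 accumulation steps
full_batch = sum(grads) / len(grads)  # gradient of the full batch: 2.0

buggy = accumulated_grad(grads, 1.0)             # 8.0: 4x too large
fixed = accumulated_grad(grads, 1 / len(grads))  # 2.0: matches full batch
```

The fixed variant makes a run with `gradient_accumulation_steps=4` and micro-batch size B behave like a run with batch size 4B, which is the equivalence these PRs restore.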
New Contributors
- @Abhishek-TAMU made their first contribution in #2158
- @zhc7 made their first contribution in #2474
- @SamuelLarkin made their first contribution in #2549
- @umbilnm made their first contribution in #2521
- @stevhliu made their first contribution in #2601
- @dawidm made their first contribution in #2557
- @Superskyyy made their first contribution in #2639
- @andyl98 made their first contribution in #2662
- @omahs made their first contribution in #2673
Full Changelog: v0.13.0...v0.14.0