Releases: huggingface/trl
v0.8.1: Patch release for CLIs
This patch release includes some important fixes for CLIs
What's Changed
- set dev version by @younesbelkada in #1454
- Fix chat CLI for model revisions by @lewtun in #1458
- [chat] add eos token to generate by @lvwerra in #1459
- Release: v0.8.1 by @younesbelkada in #1462
Full Changelog: v0.8.0...v0.8.1
v0.8.0: KTOTrainer, TRL CLIs, QLoRA + FSDP!
New Trainer: KTOTrainer
We recently introduced `KTOTrainer` to run the KTO algorithm on LLMs! A usage sketch follows the PR list below.
- fix bugs in KTO implementation by @kawine in #1380
- [KTO] merge eval dataset only if it exists by @kashif in #1383
- [KTO] prevent nans from appearing in metrics by @kawine in #1386
- Kto trainer by @kashif in #1181
- [KTO] fix tokenization bugs by @kawine in #1418
- [KTO] model init when args are given by @kashif in #1413
- [KTO] fix various bugs by @kawine in #1402
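Below is a minimal sketch of how a KTO run might look with the new trainer. The model name, dataset, and hyperparameters are illustrative assumptions rather than a prescribed recipe; consult the KTOTrainer documentation for the authoritative API.

```python
# Hypothetical minimal KTO fine-tuning sketch (model and dataset are placeholders).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
ref_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# KTO uses unpaired preference data: a prompt, a completion, and a boolean
# label saying whether the completion is desirable.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "completion": ["Paris."],
    "label": [True],
})

trainer = KTOTrainer(
    model=model,
    ref_model=ref_model,
    args=KTOConfig(output_dir="opt-kto", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```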
TRL Command Line Interfaces (CLIs):
Run SFT, DPO and chat with your aligned model directly from the terminal:
SFT:
trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb
DPO:
trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf
Chat:
trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat
Read more about the CLIs in the relevant documentation section or use `--help` for more details.
- FEAT: Add CLIs in TRL ! by @younesbelkada in #1419
- CI / CLI: Properly raise error when CLI tests failed by @younesbelkada in #1446
- chat cli by @lvwerra in #1431
- Fix yaml parsing issue by @younesbelkada in #1450
- `model` --> `model_name_or_path` by @lvwerra in #1452
- FEAT: Update README to add DPO + CLIs by @younesbelkada in #1448
FSDP + QLoRA:
`SFTTrainer` now supports FSDP + QLoRA; a configuration sketch follows the PR link below.
- Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
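A rough sketch of how FSDP + QLoRA training could be wired up is shown below. The quantization settings (in particular `bnb_4bit_quant_storage`) and the model/dataset names are assumptions based on the general QLoRA + FSDP recipe, not an exact reproduction of the linked PR; the script would be launched with `accelerate launch` using an FSDP config.

```python
# Hypothetical FSDP + QLoRA sketch; model name and hyperparameters are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Storing the quantized weights in a float dtype is what lets FSDP shard
    # the 4-bit model (assumption based on the QLoRA + FSDP recipe).
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("imdb", split="train"),
    dataset_text_field="text",
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=TrainingArguments(output_dir="llama-sft-qlora-fsdp", gradient_checkpointing=True),
    max_seq_length=512,
)
trainer.train()
```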
Other fixes
- set dev version by @younesbelkada in #1332
- Update stack llama 2 example to reflect #aa35fec by @nautsimon in #1333
- FIX: More user friendly error when users don't have PEFT by @younesbelkada in #1350
- fix 8-bit multi-gpu training bug by @fancyerii in #1353
- set seed in sft/dpo/reward_modeling to make result reproducible by @sywangyi in #1357
- Fix transformers version checking for Python < 3.8 by @samuki in #1363
- Add some arguments for support XPU by @yuanwu2017 in #1366
- ENH: Send Docker and transformers main CI results on slack after merging on main by @younesbelkada in #1370
- FEAT: [`SFTTrainer`] Add `eval_packing` by @younesbelkada in #1369
- FEAT: `force_use_ref_model` for power users by @younesbelkada in #1367
- FIX: fix after #1370 by @younesbelkada in #1372
- FIX: Change ci to fail-fast=False by @younesbelkada in #1373
- FIX: Fix the CI again .. by @younesbelkada in #1374
- Log ddpo reward as float to fix numpy conversion during bf16 training by @skavulya in #1391
- Fix the pad_token_id error by @yuanwu2017 in #1394
- FIX [`RewardModeling`] Fix RM script for PEFT by @younesbelkada in #1393
- Fix import error from deprecation in transformers by @lewtun in #1415
- CI: Fix CI on main by @younesbelkada in #1422
- [Kto] torch_dtype kwargs fix by @kashif in #1429
- Create standard dataset for TRL by @vwxyzjn in #1424
- FIX: fix doc build on main by @younesbelkada in #1437
- Fix PPOTrainer README example by @nikihowe in #1441
- Before updating the tr_loss, make sure tr_loss_step is on the same device. by @pengwei715 in #1439
- Release: v0.8.0 by @younesbelkada in #1453
New Contributors
- @nautsimon made their first contribution in #1333
- @fancyerii made their first contribution in #1353
- @samuki made their first contribution in #1363
- @yuanwu2017 made their first contribution in #1366
- @kawine made their first contribution in #1380
- @skavulya made their first contribution in #1391
- @pengwei715 made their first contribution in #1439
Full Changelog: v0.7.11...v0.8.0
v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, Automatic tagging for all models
DPO important fixes
We fixed issues with the IPO loss, leading to consistent results according to the newest experiments.
We also fixed important bugs related to DPO, PEFT, and Flash Attention:
- [`DPOTrainer`] Fix DPO trainer + mistral + FA2 by @younesbelkada in #1290
Data processing is now faster for multi-GPU envs
- [`DPOTrainer`] Load data only on main process + fix dpo example test by @younesbelkada in #1291
- Add multiprocessing in the DPO trainer. by @imraviagrawal in #1286
Other DPO bugfixes:
- [`PEFT` + `DPO`] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289
- Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
- fix padding in dpo trainer by @pacman100 in #1284
- Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
- [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307
Faster data processing and other enhancements:
- Only load data on main process by @JohnGiorgi in #1255
- Remove tyro by @vwxyzjn in #1176
Automatic tagging for all models
Models now get tagged correctly even if users do not call `trainer.push_to_hub()`.
- [`core` / `xxxTrainer`] Automatic tagging by @younesbelkada in #1329
What's Changed
- set dev version by @younesbelkada in #1254
- Update Model Generation config to reflect new special tokens by @philschmid in #1256
- Fix a typo in variable name by @otlaitil in #1269
- Fix SFTTrainer bugs on TRL main by @younesbelkada in #1276
- Fix SFT tuner in CI by @vwxyzjn in #1278
- Fix sft ci by @vwxyzjn in #1279
- Fix DPO slow tests by @younesbelkada in #1292
- Fix sft trainer when args is None by @younesbelkada in #1295
- Fix `DPOTrainer` docstrings by @alvarobartt in #1298
- Types: Fix PEP 484 implicit-optional compliance by @akx in #1297
- Update sft_trainer.mdx to add note on launching DDP training by @johnowhitaker in #1308
- Codemod Unittest assertions to bare asserts by @akx in #1301
- ENH: Run CI only if relevant files are modified by @younesbelkada in #1309
- Fix typos in docs for Multi Adapter RL (MARL). by @elhusseiniali in #1312
- Fix doc snippet PPOTrainer argument train_dataset -> dataset by @j-cb in #1321
- Best practice recommendation update for dpo_trainer.mdx by @R-seny in #1325
- pre-commit: replace linters + formatters with Ruff; fix some issues by @akx in #1300
- Update README.md to clarify model requirement by @markstur in #1315
- [`core` / `DDPO`] Fix diffusers import issue by @younesbelkada in #1314
- [`CI`] Add tests on transformers peft main on push main by @younesbelkada in #1328
- Release: v0.7.11 by @younesbelkada in #1331
New Contributors
- @otlaitil made their first contribution in #1269
- @JohnGiorgi made their first contribution in #1255
- @ouhenio made their first contribution in #1280
- @imraviagrawal made their first contribution in #1286
- @akx made their first contribution in #1297
- @esceptico made their first contribution in #1307
- @johnowhitaker made their first contribution in #1308
- @elhusseiniali made their first contribution in #1312
- @maliozer made their first contribution in #1313
- @j-cb made their first contribution in #1321
- @R-seny made their first contribution in #1325
- @markstur made their first contribution in #1315
Full Changelog: v0.7.10...v0.7.11
v0.7.10: Automatic templating, `setup_chat_format` API, stronger tests
v0.7.10: Minor fixes, automatic templating, `setup_chat_format` API, stronger tests
This patch release adds a new feature in TRL for dealing with chat datasets: you can now pass a chat-formatted dataset directly to `SFTTrainer` without formatting it beforehand.
Read more about it here: https://huggingface.co/docs/trl/sft_trainer#dataset-format-support
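For reference, a row of such a conversational dataset looks roughly like the following (the field values are illustrative); the trainer applies the tokenizer's chat template for you.

```python
# Illustrative sample of a chat-formatted ("conversational") dataset row that
# SFTTrainer can consume directly, without a custom formatting function.
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]
}
```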
The release also introduces a new API, `setup_chat_format`, to correctly resize the model embeddings when adding new special tokens to comply with the chat format. Currently only the `chatml` format is supported; more formats may be added in the future.
Read more about it here: https://huggingface.co/docs/trl/sft_trainer#add-special-tokens-for-chat-format
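A short sketch of the new API follows; the model and tokenizer names are placeholders.

```python
# Minimal setup_chat_format sketch; the format defaults to ChatML.
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Adds the ChatML special tokens, sets the tokenizer's chat template, and
# resizes the model embeddings to account for the new tokens.
model, tokenizer = setup_chat_format(model, tokenizer)
```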
We also extensively test `SFTTrainer` and `DPOTrainer`, so the example scripts `dpo.py` and `sft.py` should be well battle-tested. If you see any issue with these scripts, please let us know on GitHub.
What's Changed
- set dev version by @younesbelkada in #1207
- Check tokenize params on DPOTrainer by @pablovicente in #1197
- Fix shape descriptions in calculate_loss method by @yuta0x89 in #1204
- Fix FSDP error by @mgerstgrasser in #1196
- Update Unsloth SFT, DPO docs by @danielhanchen in #1213
- Fix args type by @zspo in #1214
- [`core` / `Docker`] Add workflow to build TRL docker images by @younesbelkada in #1215
- Refactor RewardConfig to own module by @lewtun in #1221
- Add support for ChatML dataset format in `SFTTrainer` by @philschmid in #1208
- Add slow test workflow file by @younesbelkada in #1223
- Remove a repeating line in how_to_train.md by @kykim0 in #1226
- Logs metrics on all distributed processes when using DPO & FSDP by @AjayP13 in #1160
- fix: improve error message when `pad_token_id` is not configured by @yumemio in #1152
- [`core` / tests] v1 slow tests by @younesbelkada in #1218
- [`core` / SFTTrainer] Fix breaking change by @younesbelkada in #1229
- Fixes slow tests by @younesbelkada in #1241
- Fix weird doc bug by @younesbelkada in #1244
- Add `setup_chat_format` for adding new special tokens to model for training chat models by @philschmid in #1242
- Fix chatml template by @philschmid in #1248
- fix: fix loss_type and some args desc by @zspo in #1247
- Release: v0.7.10 by @younesbelkada in #1253
New Contributors
- @yuta0x89 made their first contribution in #1204
- @danielhanchen made their first contribution in #1213
- @zspo made their first contribution in #1214
- @philschmid made their first contribution in #1208
- @kykim0 made their first contribution in #1226
- @AjayP13 made their first contribution in #1160
- @yumemio made their first contribution in #1152
Full Changelog: v0.7.9...v0.7.10
v0.7.9: Patch release for DPO & SFTTrainer
This is a patch release that fixes critical issues with `SFTTrainer` & `DPOTrainer`, together with minor fixes for `PPOTrainer` and `DataCollatorForCompletionOnlyLM`.
What's Changed
- Release: v0.7.8 by @younesbelkada in #1200
- set dev version by @younesbelkada in #1201
- Fix instruction token masking by @mgerstgrasser in #1185
- Fix reported KL in PPO trainer by @mgerstgrasser in #1180
- [`DPOTrainer`] Fix peft + DPO + bf16 if one uses `generate_during_eval` or pre-computed logits by @younesbelkada in #1203
- Revert "Address issue #1122" by @younesbelkada in #1205
- Release: v0.7.9 by @younesbelkada in #1206
Full Changelog: v0.7.8...v0.7.9
v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO
Unsloth tag for xxxTrainer
If users use the Unsloth library, the `unsloth` tag gets automatically pushed to the Hub.
- [`xxxTrainer`] Add unsloth tag by @younesbelkada in #1130
DPO fixes
Some important fixes for DPO have been introduced to address https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster:
- Allow separate devices for target/ref models. by @jondurbin in #1190
- Allow swapping PEFT adapters for target/ref model. by @jondurbin in #1193
- Change device access order for speedup of calculating metrics in DPOTrainer by @brcps12 in #1154
DDPO + PEFT
DDPO now supports PEFT:
- add: support for `peft` in ddpo. by @sayakpaul in #1165
Other fixes
- add peft_module_casting_to_bf16 in DPOTrainer by @sywangyi in #1143
- SFT Tokenizer Fix by @ChrisCates in #1142
- Minor fixes to some comments in some examples. by @mattholl in #1156
- Correct shapes in docstring of PPOTrainer's train_minibatch method by @nikihowe in #1170
- Update sft_trainer.py by @Hemanthkumar2112 in #1162
- Fix batch all gather by @vwxyzjn in #1177
- Address issue #1122 by @maneandrea in #1174
- Fix misleading variable "epoch" from the training loop from PPOTrainer Doc. by @Jfhseh in #1171
- SFTTrainer: follow args.remove_unused_columns by @mgerstgrasser in #1188
- Handle last token from generation prompt by @pablovicente in #1153
New Contributors
- @ChrisCates made their first contribution in #1142
- @brcps12 made their first contribution in #1154
- @mattholl made their first contribution in #1156
- @sayakpaul made their first contribution in #1165
- @nikihowe made their first contribution in #1170
- @Hemanthkumar2112 made their first contribution in #1162
- @maneandrea made their first contribution in #1174
- @Jfhseh made their first contribution in #1171
- @mgerstgrasser made their first contribution in #1188
- @pablovicente made their first contribution in #1153
- @jondurbin made their first contribution in #1190
Full Changelog: v0.7.7...v0.7.8
v0.7.7
v0.7.7: Patch release PPO & DDPO tags
A fix has been introduced for a breaking change with `PPOTrainer.push_to_hub()` and `DDPOTrainer.push_to_hub()`:
- [`PPOTrainer` / `DDPOTrainer`] Fix ppo & ddpo push to Hub by @younesbelkada in #1141
What's Changed
- Release: v0.7.6 by @younesbelkada in #1134
- set dev version by @younesbelkada in #1135
- clear up the parameters of supervised_finetuning.py by @sywangyi in #1126
- Add type hints to core.py by @zachschillaci27 in #1097
- fix_ddpo_demo by @zhangsibo1129 in #1129
- Add npu support for ppo example by @zhangsibo1129 in #1128
New Contributors
- @zachschillaci27 made their first contribution in #1097
- @zhangsibo1129 made their first contribution in #1129
Full Changelog: v0.7.6...v0.7.7
v0.7.6: Patch release - Multi-tag instead of single tags for `xxxTrainer`
Patch release: Multi-tag instead of single tags for xxxTrainer
This is a patch release to push multiple tags (e.g. `trl` & `sft`) instead of one tag.
What's Changed
- Release: v0.7.5 by @younesbelkada in #1131
- set dev version by @younesbelkada in #1132
- [`xxxTrainer`] multi-tags support for tagging by @younesbelkada in #1133
Full Changelog: v0.7.5...v0.7.6
v0.7.5: IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`
IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`
Important enhancements for DPOTrainer
This release introduces many new features in TRL for `DPOTrainer` (a usage sketch follows the list below):
- IPO loss for better generalization of the DPO algorithm
- KTO & cDPO loss
- You can also pass pre-computed logits to `DPOTrainer`
- [DPO] Refactor eval logging of dpo trainer by @mnoukhov in #954
- Fixes reward and text gathering in distributed training by @edbeeching in #850
- remove spurious optimize_cuda_cache deprecation warning on init by @ChanderG in #1045
- Revert "[DPO] Refactor eval logging of dpo trainer (#954)" by @lvwerra in #1047
- Fix DPOTrainer + PEFT 2 by @rdk31 in #1049
- [DPO] IPO Training loss by @kashif in #1022
- [DPO] cDPO loss by @kashif in #1035
- [DPO] use ref model logprobs if it exists in the data by @kashif in #885
- [DPO] save eval_dataset for subsequent calls by @kashif in #1125
- [DPO] rename kto loss by @kashif in #1127
- [DPO] add KTO loss by @kashif in #1075
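A minimal sketch of selecting one of the new losses (here IPO) via the `loss_type` argument is shown below; the model, dataset, and hyperparameters are placeholders, not a recommended configuration.

```python
# Hypothetical DPO run using the new IPO loss; model and dataset are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
ref_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# DPO-style trainers expect paired preference data.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["Paris."],
    "rejected": ["London."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=TrainingArguments(
        output_dir="opt-dpo-ipo",
        per_device_train_batch_size=1,
        remove_unused_columns=False,
    ),
    beta=0.1,
    loss_type="ipo",  # instead of the default sigmoid DPO loss
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=128,
)
trainer.train()
```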
Automatic `xxxTrainer` tagging on the Hub
Trainers from TRL now automatically push the tags `trl-sft`, `trl-dpo`, and `trl-ddpo` when pushing models to the Hub.
- [`xxxTrainer`] Add tags to all trainers in TRL by @younesbelkada in #1120
unsloth 🤝 TRL
We encourage users to try out the unsloth library for faster LLM fine-tuning using PEFT with TRL's SFTTrainer and DPOTrainer.
- [`Docs`] Add unsloth optimizations in TRL's documentation by @younesbelkada in #1119
What's Changed
- set dev version by @younesbelkada in #970
- [`Tests`] Add non optional packages tests by @younesbelkada in #974
- [DOCS] Fix outdated references to `examples/` by @alvarobartt in #977
- Update README.md by @GeekDream-x in #994
- [DataCollatorForCompletionOnlyLM] Warn on identical `eos_token_id` and `pad_token_id` by @MustSave in #988
- [`DataCollatorForCompletionOnlyLM`] Add more clarification / guidance in the case `tokenizer.pad_token_id == tokenizer.eos_token_id` by @younesbelkada in #992
- make distributed true for multiple process by @allanj in #997
- Fixed wrong trigger for warning by @zabealbe in #971
- Update how_to_train.md by @halfrot in #1003
- Adds `requires_grad` to input for non-quantized peft models by @younesbelkada in #1006
- [Multi-Adapter PPO] Fix and Refactor reward model adapter by @mnoukhov in #982
- Remove duplicate data loading in rl_training.py by @viethoangtranduong in #1020
- [Document] Minor fixes of sft_trainer document by @mutichung in #1029
- Update utils.py by @ZihanWang314 in #1012
- spelling is hard by @grahamannett in #1043
- Fixing accelerator version function call. by @ParthaEth in #1056
- [SFT Trainer] precompute packed iterable into a dataset by @lvwerra in #979
- Update doc CI by @lewtun in #1060
- Improve PreTrainedModelWrapper._get_current_device by @billvsme in #1048
- Update doc for the computer_metrics argument of SFTTrainer by @albertauyeung in #1062
- [`core`] Fix failing tests on main by @younesbelkada in #1065
- [`SFTTrainer`] Fix Trainer when args is None by @younesbelkada in #1064
- enable multiple eval datasets by @peter-sk in #1052
- Add missing `loss_type` in `ValueError` message by @alvarobartt in #1067
- Add args to SFT example by @lewtun in #1079
- add local folder support as input for rl_training. by @sywangyi in #1078
- Make CI happy by @younesbelkada in #1080
- Removing `tyro` in `sft_llama2.py` by @vwxyzjn in #1081
- Log arg consistency by @tcapelle in #1084
- Updated documentation for docs/source/reward_trainer.mdx to import th… by @cm2435 in #1092
- [Feature] Add Ascend NPU accelerator support by @statelesshz in #1096
- `peft_module_casting_to_bf16` util method, `append_concat_token` flag, remove callback `PeftSavingCallback` by @pacman100 in #1110
- Make prepending of bos token configurable. by @pacman100 in #1114
- fix gradient checkpointing when using PEFT by @pacman100 in #1118
- Update `description` in `setup.py` by @alvarobartt in #1101
New Contributors
- @alvarobartt made their first contribution in #977
- @GeekDream-x made their first contribution in #994
- @MustSave made their first contribution in #988
- @allanj made their first contribution in #997
- @zabealbe made their first contribution in #971
- @viethoangtranduong made their first contribution in #1020
- @mutichung made their first contribution in #1029
- @ZihanWang314 made their first contribution in #1012
- @grahamannett made their first contribution in #1043
- @ChanderG made their first contribution in #1045
- @rdk31 made their first contribution in #1049
- @ParthaEth made their first contribution in #1056
- @billvsme made their first contribution in #1048
- @albertauyeung made their first contribution in #1062
- @peter-sk made their first contribution in #1052
- @sywangyi made their first contribution in #1078
- @tcapelle made their first contribution in #1084
- @cm2435 made their first contribution in #1092
- @statelesshz made their first contribution in #1096
- @pacman100 made their first contribution in #1110
Full Changelog: v0.7.4...v0.7.5
v0.7.4: Patch Release
Patch Release
This is a patch release that addresses an issue for users who have TRL installed without PEFT.
What's Changed
- Release: v0.7.3 by @younesbelkada in #965
- set dev version by @younesbelkada in #966
- [`core`] Fix peft config typehint by @younesbelkada in #967
- Pin bnb to <=0.41.1 by @younesbelkada in #968
Full Changelog: v0.7.3...v0.7.4