Releases: huggingface/trl
v0.8.1: Patch release for CLIs
This patch release includes some important fixes for CLIs
What's Changed
- set dev version by @younesbelkada in #1454
- Fix chat CLI for model revisions by @lewtun in #1458
- [chat] add eos token to generate by @lvwerra in #1459
- Release: v0.8.1 by @younesbelkada in #1462
Full Changelog: v0.8.0...v0.8.1
v0.8.0: KTOTrainer, TRL CLIs, QLoRA + FSDP!
New Trainer: KTOTrainer
We recently introduced `KTOTrainer` to run the KTO algorithm on LLMs! A usage sketch follows the PR list below.
- fix bugs in KTO implementation by @kawine in #1380
- [KTO] merge eval dataset only if it exists by @kashif in #1383
- [KTO] prevent nans from appearing in metrics by @kawine in #1386
- Kto trainer by @kashif in #1181
- [KTO] fix tokenization bugs by @kawine in #1418
- [KTO] model init when args are given by @kashif in #1413
- [KTO] fix various bugs by @kawine in #1402
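Below is a minimal sketch of how a KTO run might look with the new trainer. The model name, dataset, and hyperparameters are illustrative assumptions rather than a prescribed recipe; consult the KTOTrainer documentation for the authoritative API.

```python
# Hypothetical minimal KTO fine-tuning sketch (model and dataset are placeholders).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
ref_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# KTO uses unpaired preference data: a prompt, a completion, and a boolean
# label saying whether the completion is desirable.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "completion": ["Paris."],
    "label": [True],
})

trainer = KTOTrainer(
    model=model,
    ref_model=ref_model,
    args=KTOConfig(output_dir="opt-kto", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```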
TRL Command Line Interfaces (CLIs):
Run SFT, DPO and chat with your aligned model directly from the terminal:
SFT:
trl sft --model_name_or_path facebook/opt-125m --dataset_name imdb --output_dir opt-sft-imdb
DPO:
trl dpo --model_name_or_path facebook/opt-125m --dataset_name trl-internal-testing/Anthropic-hh-rlhf-processed --output_dir opt-sft-hh-rlhf
Chat:
trl chat --model_name_or_path Qwen/Qwen1.5-0.5B-Chat
Read more about the CLIs in the relevant documentation section or use `--help` for more details.
- FEAT: Add CLIs in TRL ! by @younesbelkada in #1419
- CI / CLI: Properly raise error when CLI tests failed by @younesbelkada in #1446
- chat cli by @lvwerra in #1431
- Fix yaml parsing issue by @younesbelkada in #1450
- `model` --> `model_name_or_path` by @lvwerra in #1452
- FEAT: Update README to add DPO + CLIs by @younesbelkada in #1448
FSDP + QLoRA:
`SFTTrainer` now supports FSDP + QLoRA; a configuration sketch follows the PR link below.
- Add support for FSDP+QLoRA and DeepSpeed ZeRO3+QLoRA by @pacman100 in #1416
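A rough sketch of how FSDP + QLoRA training could be wired up is shown below. The quantization settings (in particular `bnb_4bit_quant_storage`) and the model/dataset names are assumptions based on the general QLoRA + FSDP recipe, not an exact reproduction of the linked PR; the script would be launched with `accelerate launch` using an FSDP config.

```python
# Hypothetical FSDP + QLoRA sketch; model name and hyperparameters are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Storing the quantized weights in a float dtype is what lets FSDP shard
    # the 4-bit model (assumption based on the QLoRA + FSDP recipe).
    bnb_4bit_quant_storage=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("imdb", split="train"),
    dataset_text_field="text",
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=TrainingArguments(output_dir="llama-sft-qlora-fsdp", gradient_checkpointing=True),
    max_seq_length=512,
)
trainer.train()
```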
Other fixes
- set dev version by @younesbelkada in #1332
- Update stack llama 2 example to reflect #aa35fec by @nautsimon in #1333
- FIX: More user friendly error when users don't have PEFT by @younesbelkada in #1350
- fix 8-bit multi-gpu training bug by @fancyerii in #1353
- set seed in sft/dpo/reward_modeling to make result reproducible by @sywangyi in #1357
- Fix transformers version checking for Python < 3.8 by @samuki in #1363
- Add some arguments for support XPU by @yuanwu2017 in #1366
- ENH: Send Docker and transformers main CI results on slack after merging on main by @younesbelkada in #1370
- FEAT: [`SFTTrainer`] Add `eval_packing` by @younesbelkada in #1369
- FEAT: `force_use_ref_model` for power users by @younesbelkada in #1367
- FIX: fix after #1370 by @younesbelkada in #1372
- FIX: Change ci to fail-fast=False by @younesbelkada in #1373
- FIX: Fix the CI again .. by @younesbelkada in #1374
- Log ddpo reward as float to fix numpy conversion during bf16 training by @skavulya in #1391
- Fix the pad_token_id error by @yuanwu2017 in #1394
- FIX [`RewardModeling`] Fix RM script for PEFT by @younesbelkada in #1393
- Fix import error from deprecation in transformers by @lewtun in #1415
- CI: Fix CI on main by @younesbelkada in #1422
- [Kto] torch_dtype kwargs fix by @kashif in #1429
- Create standard dataset for TRL by @vwxyzjn in #1424
- FIX: fix doc build on main by @younesbelkada in #1437
- Fix PPOTrainer README example by @nikihowe in #1441
- Before updating the tr_loss, make sure tr_loss_step is on the same device. by @pengwei715 in #1439
- Release: v0.8.0 by @younesbelkada in #1453
New Contributors
- @nautsimon made their first contribution in #1333
- @fancyerii made their first contribution in #1353
- @samuki made their first contribution in #1363
- @yuanwu2017 made their first contribution in #1366
- @kawine made their first contribution in #1380
- @skavulya made their first contribution in #1391
- @pengwei715 made their first contribution in #1439
Full Changelog: v0.7.11...v0.8.0
v0.7.11: IPO & DPO fixes, faster data processing for multi-GPU, Automatic tagging for all models
DPO important fixes
We fixed issues with the IPO loss, leading to consistent results according to the newest experiments.
We also fixed important bugs related to DPO, PEFT, and Flash Attention:
- [`DPOTrainer`] Fix DPO trainer + mistral + FA2 by @younesbelkada in #1290
Data processing is now faster for multi-GPU envs
- [`DPOTrainer`] Load data only on main process + fix dpo example test by @younesbelkada in #1291
- Add multiprocessing in the DPO trainer. by @imraviagrawal in #1286
Other DPO bugfixes:
- [`PEFT` + `DPO`] Raise value error if one passes a ref_model and a peft_config by @younesbelkada in #1289
- Fix wrong variable name in DPOTrainer documentation example by @ouhenio in #1280
- fix padding in dpo trainer by @pacman100 in #1284
- Fix AttributeError in dpo_trainer for reference_free case in dpo_loss function by @maliozer in #1313
- [DPOTrainer] Add multiprocessing for the eval_dataset map by @esceptico in #1307
Faster data processing and other enhancements:
- Only load data on main process by @JohnGiorgi in #1255
- Remove tyro by @vwxyzjn in #1176
Automatic tagging for all models
Models now get tagged correctly even if users do not call `trainer.push_to_hub()`.
- [`core` / `xxxTrainer`] Automatic tagging by @younesbelkada in #1329
What's Changed
- set dev version by @younesbelkada in #1254
- Update Model Generation config to reflect new special tokens by @philschmid in #1256
- Fix a typo in variable name by @otlaitil in #1269
- Fix SFTTrainer bugs on TRL main by @younesbelkada in #1276
- Fix SFT tuner in CI by @vwxyzjn in #1278
- Fix sft ci by @vwxyzjn in #1279
- Fix DPO slow tests by @younesbelkada in #1292
- Fix sft trainer when args is None by @younesbelkada in #1295
- Fix `DPOTrainer` docstrings by @alvarobartt in #1298
- Types: Fix PEP 484 implicit-optional compliance by @akx in #1297
- Update sft_trainer.mdx to add note on launching DDP training by @johnowhitaker in #1308
- Codemod Unittest assertions to bare asserts by @akx in #1301
- ENH: Run CI only if relevant files are modified by @younesbelkada in #1309
- Fix typos in docs for Multi Adapter RL (MARL). by @elhusseiniali in #1312
- Fix doc snippet PPOTrainer argument train_dataset -> dataset by @j-cb in #1321
- Best practice recommendation update for dpo_trainer.mdx by @R-seny in #1325
- pre-commit: replace linters + formatters with Ruff; fix some issues by @akx in #1300
- Update README.md to clarify model requirement by @markstur in #1315
- [`core` / `DDPO`] Fix diffusers import issue by @younesbelkada in #1314
- [`CI`] Add tests on transformers peft main on push main by @younesbelkada in #1328
- Release: v0.7.11 by @younesbelkada in #1331
New Contributors
- @otlaitil made their first contribution in #1269
- @JohnGiorgi made their first contribution in #1255
- @ouhenio made their first contribution in #1280
- @imraviagrawal made their first contribution in #1286
- @akx made their first contribution in #1297
- @esceptico made their first contribution in #1307
- @johnowhitaker made their first contribution in #1308
- @elhusseiniali made their first contribution in #1312
- @maliozer made their first contribution in #1313
- @j-cb made their first contribution in #1321
- @R-seny made their first contribution in #1325
- @markstur made their first contribution in #1315
Full Changelog: v0.7.10...v0.7.11
v0.7.10: Automatic templating, `setup_chat_format` API, stronger tests
v0.7.10: Minor fixes, automatic templating, `setup_chat_format` API, stronger tests
This patch release adds a new feature in TRL for dealing with chat datasets: you can now pass a chat-formatted dataset directly to `SFTTrainer` without formatting it beforehand.
Read more about it here: https://huggingface.co/docs/trl/sft_trainer#dataset-format-support
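For reference, a row of such a conversational dataset looks roughly like the following (the field values are illustrative); the trainer applies the tokenizer's chat template for you.

```python
# Illustrative sample of a chat-formatted ("conversational") dataset row that
# SFTTrainer can consume directly, without a custom formatting function.
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
    ]
}
```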
The release also introduces a new API, `setup_chat_format`, to correctly resize the model embeddings when adding new special tokens to comply with the chat format. Currently only the `chatml` format is supported; more formats may be added in the future.
Read more about it here: https://huggingface.co/docs/trl/sft_trainer#add-special-tokens-for-chat-format
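A short sketch of the new API follows; the model and tokenizer names are placeholders.

```python
# Minimal setup_chat_format sketch; the format defaults to ChatML.
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Adds the ChatML special tokens, sets the tokenizer's chat template, and
# resizes the model embeddings to account for the new tokens.
model, tokenizer = setup_chat_format(model, tokenizer)
```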
We also extensively test `SFTTrainer` and `DPOTrainer`, so the example scripts `dpo.py` and `sft.py` should be well battle-tested. If you see any issue with these scripts, please let us know on GitHub.
What's Changed
- set dev version by @younesbelkada in #1207
- Check tokenize params on DPOTrainer by @pablovicente in #1197
- Fix shape descriptions in calculate_loss method by @yuta0x89 in #1204
- Fix FSDP error by @mgerstgrasser in #1196
- Update Unsloth SFT, DPO docs by @danielhanchen in #1213
- Fix args type by @zspo in #1214
- [`core` / `Docker`] Add workflow to build TRL docker images by @younesbelkada in #1215
- Refactor RewardConfig to own module by @lewtun in #1221
- Add support for ChatML dataset format in `SFTTrainer` by @philschmid in #1208
- Add slow test workflow file by @younesbelkada in #1223
- Remove a repeating line in how_to_train.md by @kykim0 in #1226
- Logs metrics on all distributed processes when using DPO & FSDP by @AjayP13 in #1160
- fix: improve error message when `pad_token_id` is not configured by @yumemio in #1152
- [`core` / tests] v1 slow tests by @younesbelkada in #1218
- [`core` / SFTTrainer] Fix breaking change by @younesbelkada in #1229
- Fixes slow tests by @younesbelkada in #1241
- Fix weird doc bug by @younesbelkada in #1244
- Add `setup_chat_format` for adding new special tokens to model for training chat models by @philschmid in #1242
- Fix chatml template by @philschmid in #1248
- fix: fix loss_type and some args desc by @zspo in #1247
- Release: v0.7.10 by @younesbelkada in #1253
New Contributors
- @yuta0x89 made their first contribution in #1204
- @danielhanchen made their first contribution in #1213
- @zspo made their first contribution in #1214
- @philschmid made their first contribution in #1208
- @kykim0 made their first contribution in #1226
- @AjayP13 made their first contribution in #1160
- @yumemio made their first contribution in #1152
Full Changelog: v0.7.9...v0.7.10
v0.7.9: Patch release for DPO & SFTTrainer
This is a patch release that fixes critical issues with `SFTTrainer` & `DPOTrainer`, together with minor fixes for `PPOTrainer` and `DataCollatorForCompletionOnlyLM`.
What's Changed
- Release: v0.7.8 by @younesbelkada in #1200
- set dev version by @younesbelkada in #1201
- Fix instruction token masking by @mgerstgrasser in #1185
- Fix reported KL in PPO trainer by @mgerstgrasser in #1180
- [`DPOTrainer`] Fix peft + DPO + bf16 if one uses `generate_during_eval` or pre-computed logits by @younesbelkada in #1203
- Revert "Address issue #1122" by @younesbelkada in #1205
- Release: v0.7.9 by @younesbelkada in #1206
Full Changelog: v0.7.8...v0.7.9
v0.7.8: Unsloth tag, DPO fixes, PEFT support for DDPO
Unsloth tag for xxxTrainer
If users use the Unsloth library, the `unsloth` tag gets automatically pushed to the Hub.
- [`xxxTrainer`] Add unsloth tag by @younesbelkada in #1130
DPO fixes
Some important fixes for DPO have been introduced to address https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster:
- Allow separate devices for target/ref models. by @jondurbin in #1190
- Allow swapping PEFT adapters for target/ref model. by @jondurbin in #1193
- Change device access order for speedup of calculating metrics in DPOTrainer by @brcps12 in #1154
DDPO + PEFT
DDPO now supports PEFT:
- add: support for `peft` in ddpo. by @sayakpaul in #1165
Other fixes
- add peft_module_casting_to_bf16 in DPOTrainer by @sywangyi in #1143
- SFT Tokenizer Fix by @ChrisCates in #1142
- Minor fixes to some comments in some examples. by @mattholl in #1156
- Correct shapes in docstring of PPOTrainer's train_minibatch method by @nikihowe in #1170
- Update sft_trainer.py by @Hemanthkumar2112 in #1162
- Fix batch all gather by @vwxyzjn in #1177
- Address issue #1122 by @maneandrea in #1174
- Fix misleading variable "epoch" from the training loop from PPOTrainer Doc. by @Jfhseh in #1171
- SFTTrainer: follow args.remove_unused_columns by @mgerstgrasser in #1188
- Handle last token from generation prompt by @pablovicente in #1153
New Contributors
- @ChrisCates made their first contribution in #1142
- @brcps12 made their first contribution in #1154
- @mattholl made their first contribution in #1156
- @sayakpaul made their first contribution in #1165
- @nikihowe made their first contribution in #1170
- @Hemanthkumar2112 made their first contribution in #1162
- @maneandrea made their first contribution in #1174
- @Jfhseh made their first contribution in #1171
- @mgerstgrasser made their first contribution in #1188
- @pablovicente made their first contribution in #1153
- @jondurbin made their first contribution in #1190
Full Changelog: v0.7.7...v0.7.8
v0.7.7
v0.7.7: Patch release PPO & DDPO tags
A fix has been introduced for a breaking change with `PPOTrainer.push_to_hub()` and `DDPOTrainer.push_to_hub()`:
- [`PPOTrainer` / `DDPOTrainer`] Fix ppo & ddpo push to Hub by @younesbelkada in #1141
What's Changed
- Release: v0.7.6 by @younesbelkada in #1134
- set dev version by @younesbelkada in #1135
- clear up the parameters of supervised_finetuning.py by @sywangyi in #1126
- Add type hints to core.py by @zachschillaci27 in #1097
- fix_ddpo_demo by @zhangsibo1129 in #1129
- Add npu support for ppo example by @zhangsibo1129 in #1128
New Contributors
- @zachschillaci27 made their first contribution in #1097
- @zhangsibo1129 made their first contribution in #1129
Full Changelog: v0.7.6...v0.7.7
v0.7.6: Patch release - Multi-tag instead of single tags for `xxxTrainer`
Patch release: Multi-tag instead of single tags for xxxTrainer
This is a patch release to push multiple tags (e.g. `trl` & `sft`) instead of one tag.
What's Changed
- Release: v0.7.5 by @younesbelkada in #1131
- set dev version by @younesbelkada in #1132
- [`xxxTrainer`] multi-tags support for tagging by @younesbelkada in #1133
Full Changelog: v0.7.5...v0.7.6
v0.7.5: IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`
IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`
Important enhancements for DPOTrainer
This release introduces many new features in TRL for `DPOTrainer` (a usage sketch follows the list below):
- IPO loss for better generalization of the DPO algorithm
- KTO & cDPO loss
- You can also pass pre-computed logits to `DPOTrainer`
- [DPO] Refactor eval logging of dpo trainer by @mnoukhov in #954
- Fixes reward and text gathering in distributed training by @edbeeching in #850
- remove spurious optimize_cuda_cache deprecation warning on init by @ChanderG in #1045
- Revert "[DPO] Refactor eval logging of dpo trainer (#954)" by @lvwerra in #1047
- Fix DPOTrainer + PEFT 2 by @rdk31 in #1049
- [DPO] IPO Training loss by @kashif in #1022
- [DPO] cDPO loss by @kashif in #1035
- [DPO] use ref model logprobs if it exists in the data by @kashif in #885
- [DPO] save eval_dataset for subsequent calls by @kashif in #1125
- [DPO] rename kto loss by @kashif in #1127
- [DPO] add KTO loss by @kashif in #1075
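A minimal sketch of selecting one of the new losses (here IPO) via the `loss_type` argument is shown below; the model, dataset, and hyperparameters are placeholders, not a recommended configuration.

```python
# Hypothetical DPO run using the new IPO loss; model and dataset are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
ref_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# DPO-style trainers expect paired preference data.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["Paris."],
    "rejected": ["London."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=TrainingArguments(
        output_dir="opt-dpo-ipo",
        per_device_train_batch_size=1,
        remove_unused_columns=False,
    ),
    beta=0.1,
    loss_type="ipo",  # instead of the default sigmoid DPO loss
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=128,
)
trainer.train()
```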
Automatic `xxxTrainer` tagging on the Hub
Trainers from TRL now automatically push the tags `trl-sft`, `trl-dpo`, and `trl-ddpo` when pushing models to the Hub.
- [`xxxTrainer`] Add tags to all trainers in TRL by @younesbelkada in #1120
unsloth 🤝 TRL
We encourage users to try out the unsloth library for faster LLM fine-tuning using PEFT with TRL's SFTTrainer and DPOTrainer.
- [`Docs`] Add unsloth optimizations in TRL's documentation by @younesbelkada in #1119
What's Changed
- set dev version by @younesbelkada in #970
- [`Tests`] Add non optional packages tests by @younesbelkada in #974
- [DOCS] Fix outdated references to `examples/` by @alvarobartt in #977
- Update README.md by @GeekDream-x in #994
- [DataCollatorForCompletionOnlyLM] Warn on identical `eos_token_id` and `pad_token_id` by @MustSave in #988
- [`DataCollatorForCompletionOnlyLM`] Add more clarification / guidance in the case `tokenizer.pad_token_id == tokenizer.eos_token_id` by @younesbelkada in #992
- make distributed true for multiple process by @allanj in #997
- Fixed wrong trigger for warning by @zabealbe in #971
- Update how_to_train.md by @halfrot in #1003
- Adds `requires_grad` to input for non-quantized peft models by @younesbelkada in #1006
- [Multi-Adapter PPO] Fix and Refactor reward model adapter by @mnoukhov in #982
- Remove duplicate data loading in rl_training.py by @viethoangtranduong in #1020
- [Document] Minor fixes of sft_trainer document by @mutichung in #1029
- Update utils.py by @ZihanWang314 in #1012
- spelling is hard by @grahamannett in #1043
- Fixing accelerator version function call. by @ParthaEth in #1056
- [SFT Trainer] precompute packed iterable into a dataset by @lvwerra in #979
- Update doc CI by @lewtun in #1060
- Improve PreTrainedModelWrapper._get_current_device by @billvsme in #1048
- Update doc for the computer_metrics argument of SFTTrainer by @albertauyeung in #1062
- [`core`] Fix failing tests on main by @younesbelkada in #1065
- [`SFTTrainer`] Fix Trainer when args is None by @younesbelkada in #1064
- enable multiple eval datasets by @peter-sk in #1052
- Add missing `loss_type` in `ValueError` message by @alvarobartt in #1067
- Add args to SFT example by @lewtun in #1079
- add local folder support as input for rl_training. by @sywangyi in #1078
- Make CI happy by @younesbelkada in #1080
- Removing `tyro` in `sft_llama2.py` by @vwxyzjn in #1081
- Log arg consistency by @tcapelle in #1084
- Updated documentation for docs/source/reward_trainer.mdx to import th… by @cm2435 in #1092
- [Feature] Add Ascend NPU accelerator support by @statelesshz in #1096
- `peft_module_casting_to_bf16` util method, `append_concat_token` flag, remove callback `PeftSavingCallback` by @pacman100 in #1110
- Make prepending of bos token configurable. by @pacman100 in #1114
- fix gradient checkpointing when using PEFT by @pacman100 in #1118
- Update `description` in `setup.py` by @alvarobartt in #1101
New Contributors
- @alvarobartt made their first contribution in #977
- @GeekDream-x made their first contribution in #994
- @MustSave made their first contribution in #988
- @allanj made their first contribution in #997
- @zabealbe made their first contribution in #971
- @viethoangtranduong made their first contribution in #1020
- @mutichung made their first contribution in #1029
- @ZihanWang314 made their first contribution in #1012
- @grahamannett made their first contribution in #1043
- @ChanderG made their first contribution in #1045
- @rdk31 made their first contribution in #1049
- @ParthaEth made their first contribution in #1056
- @billvsme made their first contribution in #1048
- @albertauyeung made their first contribution in #1062
- @peter-sk made their first contribution in #1052
- @sywangyi made their first contribution in #1078
- @tcapelle made their first contribution in #1084
- @cm2435 made their first contribution in #1092
- @statelesshz made their first contribution in #1096
- @pacman100 made their first contribution in #1110
Full Changelog: v0.7.4...v0.7.5
v0.7.4: Patch Release
Patch Release
This is a patch release that addresses an issue for users who have TRL installed without PEFT.
What's Changed
- Release: v0.7.3 by @younesbelkada in #965
- set dev version by @younesbelkada in #966
- [`core`] Fix peft config typehint by @younesbelkada in #967
- Pin bnb to <=0.41.1 by @younesbelkada in #968
Full Changelog: v0.7.3...v0.7.4