Releases: huggingface/trl
v0.4.2
QLoRA RLHF, SFT Trainer and RewardTrainer
A new version of TRL that includes training larger models using QLoRA (4-bit quantization through `bitsandbytes`), plus the brand new `RewardTrainer` and `SFTTrainer` classes to easily conduct your RLHF projects end-to-end!
Introducing `SFTTrainer` and `RewardTrainer`
Use the brand new trainers to easily train your reward model and supervised fine-tuned (SFT) model with a few lines of code! A minimal usage sketch follows the list of changes below.
- [`core`] officially support SFT (Supervised Finetuning) by @younesbelkada in #323
- [`SFT`] Fix sft issues by @younesbelkada in #336
- [`docs`] fix SFT doc by @younesbelkada in #367
- [`core`] Officially Support Reward Modeling by @younesbelkada in #303
- Resolve broken evaluation/prediction for RewardTrainer by @tomaarsen in #404
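As an illustration, here is a minimal sketch of the new `SFTTrainer`; the dataset and the `dataset_text_field` value are example choices, not requirements (`RewardTrainer` follows the same pattern but expects a dataset of chosen/rejected pairs):

```python
from datasets import load_dataset
from trl import SFTTrainer

# any plain-text dataset works; "imdb" is just an example
dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",        # model name, or a preloaded model instance
    train_dataset=dataset,
    dataset_text_field="text",  # column that holds the raw text
    max_seq_length=512,
)
trainer.train()
```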
QLoRA integration
Pass 4-bit models directly into `PPOTrainer` for more memory-efficient training; a short sketch follows the list below.
- [`core`] Add 4bit QLora by @younesbelkada in #383
- [`bnb`] fix 4 bit SFT by @younesbelkada in #396
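A rough sketch of the pattern, assuming a recent `transformers`/`bitsandbytes` stack; the checkpoint and LoRA hyperparameters below are illustrative:

```python
from transformers import AutoTokenizer
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_id = "edbeeching/gpt-neo-125M-imdb"  # illustrative checkpoint
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)

# the base weights are loaded in 4 bit via bitsandbytes; only the LoRA
# adapters (and the value head) are trained
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_id,
    load_in_4bit=True,
    peft_config=lora_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=16),
    model=model,
    tokenizer=tokenizer,
)
```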
Updated StackLlama example
Great work by @mnoukhov, who managed to fix the issues related to StackLlama and the new versions of `accelerate`, `peft`, and `transformers`. The completely reproducible examples are below:
- StackLLaMA: correctly merge peft model by @mnoukhov in #398
- StackLlama: fixed RL training and added args by @mnoukhov in #400
- Fixed some type annotations of `trl.trainer.PPOTrainer` by @JulesGM in #392
- StackLLaMA: fix supervised finetuning and reward model training by @mnoukhov in #399
Bug fixes and improvements
- [`core`] refactor peft API by @younesbelkada in #231
- Batched generation by @lvwerra in #228
- Reduce memory consumption in batched_forward_pass by @ohashi56225 in #234
- [`core`] Add warning when negative KL by @younesbelkada in #239
- adds early stopping by @edbeeching in #238
- PPO config init is bloated by @GauravVirmani in #241
- feat(ci): enable `pip` cache by @SauravMaheshkar in #198
- Improve logging for PPO + Docs page by @natolambert in #243
- Fix typo by @heya5 in #253
- Using batched generate in sentiment scripts by @GauravVirmani in #249
- [`core`] Fix DeepSpeed zero-3 issue by @younesbelkada in #182
- [`distributed`] Fix early stopping and DP by @younesbelkada in #254
- [`core`] Fix ds issue by @younesbelkada in #260
- Add LlaMa in tests + `create_reference_model` by @younesbelkada in #261
- Use active model to generate response in example on README (#269) by @rmill040 in #271
- stack-llama by @edbeeching in #273
- Adding pointer back to Meta's LLaMA. by @meg-huggingface in #277
- fix doc string problem in ppo trainer loss function by @thuwyh in #279
- Add LLaMA tutorial to docs by @natolambert in #278
- Fix swapped helper texts by @philipp-classen in #284
- fix typo in gpt2-sentiment.ipynb by @eltociear in #293
- add functionality to push best models to the hub during training by @Bearnardd in #275
- Small improvements / fixes to toxicity example by @natolambert in #266
- Fix arguments description by @lvzii in #298
- [`t5`] Fix negative kl issue by @younesbelkada in #262
- Log Token distribution of Query / Response by @natolambert in #295
- clean examples folder by @natolambert in #294
- fixed typo in error message by @soerenarlt in #312
- fix DS for peft ref_model in ppo trainer by @halfrot in #309
- [`CI`] Fix broken tests by @younesbelkada in #318
- [`Docs`] Add details on multi-GPU / multi-node by @younesbelkada in #320
- Give a key to the wandb PPOConfig config entry by @JulesGM in #315
- added doc for using torch.distributed.launch/run by @oroojlooy in #324
- Fix argument's description by @vinhkhuc in #339
- stack_llama: update instructions in README, fix broken _get_submodules and save tokenizer by @teticio in #358
- stack_llama: add parameter to control max_length (to mitigate OOM errors) by @teticio in #359
- [`PPO`] Relax negative KL constraint by @younesbelkada in #352
- [`PPOTrainer`] Fix tensorboard issue by @younesbelkada in #330
- 140/best n sampling by @metric-space in #326
- Fix bug when loading local peft model by @Opdoop in #342
- add is_trainable in kwargs by @Opdoop in #363
- Remove obsolete layer_norm_names parameter and add peft>=0.3.0 to requirements by @teticio in #366
- Delete test_training.py by @younesbelkada in #371
- [`core`] Fix warning issue by @younesbelkada in #377
- Update customization.mdx by @binganao in #390
- fix dataloader typo in ppo_trainer.py by @LZY-the-boys in #389
- from_pretrain with peft adapter on the hub (# 379) by @glerzing in #380
- keep state_dict kwargs instead of popping it in save_pretrained by @rizar in #393
- Remove unused imports in docs. by @vwxyzjn in #406
New Contributors
- @ohashi56225 made their first contribution in #234
- @GauravVirmani made their first contribution in #241
- @SauravMaheshkar made their first contribution in #198
- @heya5 made their first contribution in #253
- @rmill040 made their first contribution in #271
- @thuwyh made their first contribution in #279
- @philipp-classen made their first contribution in #284
- @Bearnardd made their first contribution in #275
- @lvzii made their first contribution in #298
- @soerenarlt made their first contribution in #312
- @halfrot made their first contribution in #309
- @oroojlooy made their first contribution in #324
- @vinhkhuc made their first contribution in #339
- @teticio made their first contribution in #358
- @metric-space made their first contribution in #326
- @Opdoop made their first contribution in #342
- @binganao made their first contribution in #390
- @LZY-the-boys made their first contribution in #389
- @glerzing made their first contribution in #380
- @rizar made their first contribution in #393
- @mnoukhov made their first contribution in #398
- @tomaarsen made their first contribution in #404
- @vwxyzjn made their first contribution in #406
Full Changelog: v0.4.1...v0.4.2
v0.4.1
Large models training, Naive Pipeline Parallelism, `peft` Data Parallelism support, and distributed training bug fixes
This release includes a set of features and bug fixes to scale up your RLHF experiments to much larger models, leveraging `peft` and `bitsandbytes`.
Naive Pipeline Parallelism support
- Let's support naive Pipeline Parallelism by @younesbelkada in #210
We introduce a new paradigm in `trl`, termed Naive Pipeline Parallelism, to fit large-scale models on your training setup and apply RLHF to them. This feature uses `peft` to train adapters and `bitsandbytes` to reduce the memory footprint of your active model.
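A minimal sketch of the idea, assuming a multi-GPU machine; the checkpoint and adapter settings are placeholders:

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "EleutherAI/gpt-neox-20b",  # placeholder large checkpoint
    load_in_8bit=True,          # bitsandbytes 8-bit weights
    device_map="auto",          # naive pipeline parallelism: layers split across GPUs
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
```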
`peft` Data Parallelism support
- [`peft`] Fix DP issues by @younesbelkada in #221
- [`core`] fix DP issue by @younesbelkada in #222
There were some bugs with respect to the `peft` integration and DP. This release includes the bug fixes to enable multi-GPU training using `accelerate` + DDP (Distributed Data Parallel).
Memory optimization
Your training runs can now be much more memory efficient thanks to a few tricks and bug fixes:
`PPOConfig` now also supports the flag `optimize_cuda_cache` (set to `False` by default) to help avoid CUDA memory issues; a short sketch follows the list below.
- Grad accumulation and memory bugfix by @edbeeching in #220
- adds a missing detach to the ratio by @edbeeching in #224
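For example (a minimal sketch; the other fields shown are the usual `PPOConfig` arguments):

```python
from trl import PPOConfig

config = PPOConfig(
    model_name="gpt2",
    learning_rate=1.41e-5,
    optimize_cuda_cache=True,  # frees unused cached CUDA memory during PPO steps
)
```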
PyTorch 2.0 fixes
This release also includes minor fixes related to the PyTorch 2.0 release.
- [`test`] attempt to fix CI test for PT 2.0 by @younesbelkada in #225
What's Changed
- adds sentiment example for a 20b model by @edbeeching in #208
- Update README.md blog post link by @TeamDman in #212
- spell mistakes by @K-for-Code in #213
- spell corrections by @K-for-Code in #214
- Small changes when integrating into H4 by @natolambert in #216
New Contributors
- @TeamDman made their first contribution in #212
- @K-for-Code made their first contribution in #213
Full Changelog: v0.4.0...v0.4.1
v0.4.0
`peft` integration
Apply RLHF and fine-tune your favorite large model on a consumer GPU using `peft` and `trl`! You can also easily share your trained RLHF adapters on the Hub with a few lines of code.
With this integration you can train `gpt-neo-x` (a 20B-parameter model, 40GB in `bfloat16`) on a 24GB consumer GPU!
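A hedged sketch of the pattern (the checkpoint, adapter settings, and Hub repo name are illustrative): load the base model in 8-bit with `bitsandbytes`, attach a small LoRA adapter via `peft`, train, then push only the lightweight adapter to the Hub.

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

# large base model in 8 bit + a small trainable LoRA adapter
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "EleutherAI/gpt-neo-2.7B",  # illustrative checkpoint
    load_in_8bit=True,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)

# ... run PPO training here ...

# only the adapter weights (a few MB) are uploaded
model.push_to_hub("my-username/my-rlhf-adapter")
```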
What's Changed
- Allow running evaluate-toxicity with cpu by @jordimas in #195
- [`core`] Fix quality issue by @younesbelkada in #197
- Add 1.12.1 torch compatibility in sum method by @PanchenkoYehor in #190
- `peft` integration by @edbeeching in #163
- [`core`] Update dependency by @younesbelkada in #206
New Contributors
- @PanchenkoYehor made their first contribution in #190
Full Changelog: v0.3.1...v0.4.0
v0.3.1
What's Changed
- Clarifications of acronyms and initialisms by @meg-huggingface in #185
- Update detoxifying_a_lm.mdx by @younesbelkada in #186
- Fix reference to example by @jordimas in #184
New Contributors
- @meg-huggingface made their first contribution in #185
- @jordimas made their first contribution in #184
Full Changelog: v0.3.0...v0.3.1
v0.3.0
What's Changed
- fix style, typos, license by @natolambert in #103
- fix re-added file by @natolambert in #116
- add citation by @natolambert in #124
- add manual seeding for RL experiments by @natolambert in #118
- add `set_seed` to init.py by @lvwerra in #127
- update docs with Seq2seq models, set_seed, and create_reference_model by @lvwerra in #128
- [`bug`] Update gpt2-sentiment.py by @younesbelkada in #132
- Fix Sentiment control notebook by @lvwerra in #126
- realign values by @lvwerra in #137
- Change unclear variables & fix typos by @natolambert in #134
- Feat/reward summarization example by @TristanThrush in #115
- [`core`] Small refactor of forward pass by @younesbelkada in #136
- [`tests`] Add correct repo name by @younesbelkada in #138
- fix forward batching for seq2seq and right padding models. by @lvwerra in #139
- fix bug in batched_forward_pass by @ArvinZhuang in #144
- [`core`] Add `torch_dtype` support by @younesbelkada in #147
- [`core`] Fix dataloader issue by @younesbelkada in #154
- [`core`] enable `bf16` training by @younesbelkada in #156
- [`core`] fix saving multi-gpu by @younesbelkada in #157
- Added imports by @BirgerMoell in #159
- Add CITATION.cff by @kashif in #169
- [Doc] Add how to use Lion optimizer by @younesbelkada in #152
- policy kl [old | new] by @kashif in #168
- add minibatching by @lvwerra in #153
- fix bugs in tutorial by @shizhediao in #175
- [`core`] Add `max_grad_norm` support by @younesbelkada in #177
- Add toxicity example by @younesbelkada in #162
- [`Docs`] Fix barplot by @younesbelkada in #181
New Contributors
- @natolambert made their first contribution in #103
- @ArvinZhuang made their first contribution in #144
- @BirgerMoell made their first contribution in #159
- @kashif made their first contribution in #169
- @shizhediao made their first contribution in #175
Full Changelog: v0.2.1...v0.3.0
v0.2.1
What's Changed
- Update customization.mdx by @younesbelkada in #109
- add `datasets` as a dependency by @lvwerra in #110
- [Docs] Add hlinks to scripts & notebooks by @younesbelkada in #111
- Fix `Mapping` in core for Python 3.10 by @lvwerra in #112
Full Changelog: v0.2.0...v0.2.1
v0.2.0
Highlights
- General decoder model support in addition to GPT-2 in #53
- Encoder-decoder model support (such as T5) in #93
- New, shiny docs with the `doc-builder` in #59
- `push_to_hub` with PPOTrainer in #68
- Simple reference model creation with layer sharing in #61 (see the sketch after this list)
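For instance, the layer-sharing reference model from #61 can be created like this (a minimal sketch; the number of shared layers is an example):

```python
from trl import AutoModelForCausalLMWithValueHead, create_reference_model

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

# the first 6 layers are shared (frozen) between the active and the
# reference model, so their weights are stored only once in memory
ref_model = create_reference_model(model, num_shared_layers=6)
```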
What's Changed
- Remove `nbdev` dependency by @younesbelkada in #52
- Adds github actions and dummy test by @edbeeching in #55
- Update README.md by @Keith-Hon in #51
- Update README.md by @TristanThrush in #49
- Adds Python highlighting to the code block by @JulesGM in #45
- `xxxForCausalLM` support by @younesbelkada in #53
- [`VHead`] Fix slow convergence issue by @younesbelkada in #60
- add docbuilder skeleton by @lvwerra in #59
- fix docs workflow by @lvwerra in #63
- `accelerate` integration by @younesbelkada in #58
- add create_reference_model by @lvwerra in #61
- Improve Makefile and code quality by @lvwerra in #62
- Relax requirements by @lvwerra in #66
- modeling - change namings by @younesbelkada in #65
- [`PPOTrainer`] make the reference model optional by @younesbelkada in #67
- Improvements 1a by @edbeeching in #70
- update GitHub actions to `main` by @lvwerra in #77
- [core] refactor `step` method by @younesbelkada in #76
- [`PPOTrainer`] Support generic optimizers by @younesbelkada in #78
- Update sentiment_tuning.mdx by @eltociear in #69
- Remove references to "listify_batch" by @xiaoyesoso in #81
- Collater -> collator by @LysandreJik in #88
- Model as kwarg in pipeline by @LysandreJik in #89
- Small typo correction by @LysandreJik in #87
- [API] Make `dataset` attribute optional by @younesbelkada in #85
- [Doc] Improve docs by @younesbelkada in #91
- [core] Push `v_head` when using `AutoModelForCausalLMWithValueHead` by @younesbelkada in #86
- [core] remove `wandb` dependency by @younesbelkada in #92
- add logo by @lvwerra in #95
- Encoder-Decoder models support by @younesbelkada in #93
- Fix docs hyperlinks by @lewtun in #98
- [API] LR scheduler support by @younesbelkada in #96
- Version should have `dev0` unless it is a release version by @mishig25 in #99
- [core] improve API by @younesbelkada in #97
- Add push to Hub for PPOTrainer by @lewtun in #68
- [`core`] Advise to use `fbs=1` by @younesbelkada in #102
- [Doc] New additions by @younesbelkada in #105
- restructure examples by @lvwerra in #107
- Fix nits & missing things by @younesbelkada in #108
- Convert notebook 05 by @edbeeching in #80
New Contributors
- @lvwerra made their first contribution in #2
- @vblagoje made their first contribution in #16
- @dependabot made their first contribution in #26
- @younesbelkada made their first contribution in #52
- @edbeeching made their first contribution in #55
- @Keith-Hon made their first contribution in #51
- @TristanThrush made their first contribution in #49
- @JulesGM made their first contribution in #45
- @eltociear made their first contribution in #69
- @xiaoyesoso made their first contribution in #81
- @LysandreJik made their first contribution in #88
- @lewtun made their first contribution in #98
- @mishig25 made their first contribution in #99
Full Changelog: https://github.com/lvwerra/trl/commits/v0.2.0