
Releases: huggingface/trl

v0.4.2

07 Jun 13:20

QLoRA RLHF, SFT Trainer and RewardTrainer

A new version of TRL that lets you train larger models using QLoRA (4-bit quantization through bitsandbytes) and introduces the brand new RewardTrainer and SFTTrainer classes so you can easily run your RLHF projects end-to-end!

Introducing SFTTrainer and RewardTrainer

Use the brand new trainers to easily train your reward model and supervised fine-tuned (SFT) model in a few lines of code!
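
For example, here is a minimal sketch of supervised fine-tuning with SFTTrainer; the model and dataset names are illustrative assumptions, and RewardTrainer follows the same pattern on a dataset of chosen/rejected pairs:

```python
# Minimal SFT sketch: the model and dataset names below are illustrative, not prescriptive.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")  # any dataset with a plain-text column

trainer = SFTTrainer(
    "facebook/opt-350m",        # model name, or an already-loaded model
    train_dataset=dataset,
    dataset_text_field="text",  # column that holds the raw text
    max_seq_length=512,
)
trainer.train()
```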

QLoRA integration

Pass 4-bit models directly into PPOTrainer for more memory-efficient training.
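
A hedged sketch of the idea, assuming the peft_config keyword path; the model name and LoRA hyper-parameters are placeholders:

```python
# Sketch: load a causal LM in 4-bit through bitsandbytes and hand it to PPOTrainer.
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "edbeeching/gpt-neo-125M-imdb"  # illustrative model

lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    load_in_4bit=True,        # 4-bit quantization through bitsandbytes
    peft_config=lora_config,  # only the LoRA adapters receive gradients
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# With a peft model, PPOTrainer does not need a separate reference model.
ppo_trainer = PPOTrainer(config=PPOConfig(batch_size=16), model=model, tokenizer=tokenizer)
```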

Updated StackLlama example

Great work by @mnoukhov, who fixed the issues related to StackLlama and the new versions of accelerate, peft and transformers. The fully reproducible examples are listed below:

  • StackLLaMA: correctly merge peft model by @mnoukhov in #398
  • StackLlama: fixed RL training and added args by @mnoukhov in #400
  • Fixed some type annotations of trl.trainer.PPoTrainer by @JulesGM in #392
  • StackLLaMA: fix supervised finetuning and reward model training by @mnoukhov in #399

Bug fixes and improvements

New Contributors

Full Changelog: v0.4.1...v0.4.2

v0.4.1

17 Mar 10:39

Large model training, Naive Pipeline Parallelism, peft Data Parallelism support and distributed training bug fixes

This release includes a set of features and bug fixes to scale up your RLHF experiments to much larger models by leveraging peft and bitsandbytes.

Naive Pipeline Parallelism support

We introduce a new paradigm in trl, termed Naive Pipeline Parallelism, to fit large-scale models onto your training setup and apply RLHF to them. This feature uses peft to train adapters and bitsandbytes to reduce the memory footprint of your active model.
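
Roughly, the quantized base model is sharded across all visible GPUs at load time and only the small adapter weights are trained. A hedged sketch, with an illustrative model name and hyper-parameters:

```python
# Sketch of Naive Pipeline Parallelism: shard the 8-bit base model across GPUs
# with device_map="auto" and train only the LoRA adapters.
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "EleutherAI/gpt-neox-20b",   # illustrative large model
    load_in_8bit=True,           # bitsandbytes 8-bit quantization
    device_map="auto",           # split layers across the available GPUs
    peft_config=lora_config,     # train only the adapter weights
)
```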


peft Data Parallelism support

There were some bugs with respect to the peft integration and Data Parallelism. This release includes the bug fixes needed to enable multi-GPU training using accelerate + DDP (Distributed Data Parallel).

Memory optimization

Your training runs can now be much more memory efficient thanks to a few tricks and bug fixes. PPOConfig now also supports the flag optimize_cuda_cache (set to False by default) to avoid ever-increasing CUDA memory usage during training.
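
A minimal illustration (the model name is a placeholder):

```python
# Opt in to the new flag: when enabled, the trainer frees the CUDA cache during
# optimization to keep memory usage from creeping up across PPO steps.
from trl import PPOConfig

config = PPOConfig(
    model_name="gpt2",         # illustrative
    optimize_cuda_cache=True,  # False by default
)
```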

Pytorch 2.0 fixes

This release also includes minor fixes related to the PyTorch 2.0 release.

What's Changed

New Contributors

Full Changelog: v0.4.0...v0.4.1

v0.4.0

09 Mar 11:38

v0.4.0: peft integration

Apply RLHF and fine-tune your favorite large model on a consumer GPU using peft and trl! You can also easily share your trained RLHF adapters on the Hub in a few lines of code.

With this integration you can train gpt-neo-x (a 20B-parameter model, 40GB in bfloat16) on a 24GB consumer GPU!
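
A hedged sketch of the workflow, assuming the two-step peft wrapping used in the examples of this era; model names, hyper-parameters and the Hub repo name are illustrative:

```python
# Sketch: wrap an 8-bit base model with LoRA adapters, add the value head,
# and later share only the small trained adapters on the Hub.
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import AutoModelForCausalLM
from trl import AutoModelForCausalLMWithValueHead

base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    load_in_8bit=True,   # bitsandbytes quantization so the model fits in 24GB
    device_map="auto",
)
base = prepare_model_for_int8_training(base)
base = get_peft_model(base, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

model = AutoModelForCausalLMWithValueHead.from_pretrained(base)

# ... run PPO training as usual, then push the trained adapters to the Hub:
model.push_to_hub("my-username/gpt-neox-20b-rlhf-adapters")  # illustrative repo name
```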

What's Changed

New Contributors

Full Changelog: v0.3.1...v0.4.0

v0.3.1

02 Mar 09:18

What's Changed

New Contributors

Full Changelog: v0.3.0...v0.3.1

v0.3.0

01 Mar 12:45

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.3.0

v0.2.1

25 Jan 16:09

What's Changed

Full Changelog: v0.2.0...v0.2.1

v0.2.0

25 Jan 14:04

Highlights

  • General decoder model support in addition to GPT-2 in #53
  • Encoder-decoder model support (such as T5) in #93
  • New, shiny docs with the doc-builder in #59
  • push_to_hub with PPOTrainer in #68
  • Simple reference model creation with layer sharing in #61 (see the sketch after this list)
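
For the last item, here is a minimal sketch of shared-layer reference model creation, assuming the create_reference_model helper introduced by that change; the model name and layer count are illustrative:

```python
# Sketch: build a frozen reference model that shares its first layers with the
# active model, so those layers are not duplicated in memory.
from trl import AutoModelForCausalLMWithValueHead, create_reference_model

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = create_reference_model(model, num_shared_layers=6)
```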

What's Changed

New Contributors

Full Changelog: https://github.com/lvwerra/trl/commits/v0.2.0