
v0.4.2

Released by @younesbelkada on 07 Jun 13:20

QLoRA RLHF, SFT Trainer and RewardTrainer

A new version of TRL that adds support for training larger models using QLoRA (4-bit quantization through bitsandbytes) and ships two brand new classes, SFTTrainer and RewardTrainer, to conduct your RLHF projects end-to-end!

Introducing SFTTrainer and RewardTrainer

Use the brand new trainers to train your reward model and supervised fine-tuned (SFT) model in a few lines of code, as in the sketch below!
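A minimal sketch of both trainers; the imdb dataset, the facebook/opt-350m checkpoint, the toy preference pair, and the hyperparameters are illustrative assumptions, not part of the release:

```python
from datasets import Dataset, load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)
from trl import RewardTrainer, SFTTrainer

# --- Supervised fine-tuning: a model id and a text dataset are enough ---
dataset = load_dataset("imdb", split="train")

sft_trainer = SFTTrainer(
    "facebook/opt-350m",        # model id or an already-instantiated model
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the raw training text
    max_seq_length=512,
)
sft_trainer.train()

# --- Reward modeling: a sequence classifier trained on chosen/rejected pairs ---
model = AutoModelForSequenceClassification.from_pretrained(
    "facebook/opt-350m", num_labels=1
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# The trainer expects tokenized *_chosen / *_rejected columns.
def tokenize_pair(row):
    chosen = tokenizer(row["chosen"], truncation=True)
    rejected = tokenizer(row["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

pairwise_dataset = Dataset.from_dict(
    {"chosen": ["A helpful answer."], "rejected": ["An unhelpful answer."]}
).map(tokenize_pair)

reward_trainer = RewardTrainer(
    model=model,
    args=TrainingArguments(output_dir="reward_model", remove_unused_columns=False),
    tokenizer=tokenizer,
    train_dataset=pairwise_dataset,
)
reward_trainer.train()
```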

QLoRA integration

Pass 4-bit models directly into PPOTrainer for more memory-efficient training, as sketched below.
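A minimal sketch of the idea; the gpt-neo-125M checkpoint, the LoRA settings, and the batch size are illustrative assumptions, not prescribed values:

```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_id = "edbeeching/gpt-neo-125M-imdb"  # illustrative checkpoint

# The LoRA adapters carry the gradients; the values here are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# load_in_4bit=True quantizes the frozen base weights with bitsandbytes,
# so only the small adapter layers are kept in higher precision and trained.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_id,
    peft_config=lora_config,
    load_in_4bit=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# With a peft model, no separate frozen reference copy is needed:
# disabling the adapters recovers the reference behaviour.
ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=16),
    model=model,
    ref_model=None,
    tokenizer=tokenizer,
)
```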

Updated StackLLaMA example

Great work by @mnoukhov, who fixed the issues with StackLLaMA under the new versions of accelerate, peft and transformers. The fully reproducible examples are below:

  • StackLLaMA: correctly merge peft model by @mnoukhov in #398
  • StackLlama: fixed RL training and added args by @mnoukhov in #400
  • Fixed some type annotations of trl.trainer.PPOTrainer by @JulesGM in #392
  • StackLLaMA: fix supervised finetuning and reward model training by @mnoukhov in #399

Bug fixes and improvements

New Contributors

Full Changelog: v0.4.1...v0.4.2