[PPOTrainer] Support generic optimizers #78
Conversation
The documentation is not available anymore as the PR was closed or merged.
Regarding 8-bit Adam, it is quite hard to make it converge. I have found that the model rapidly falls into a collapse mode: https://wandb.ai/distill-bloom/trl/runs/k7vogzao?workspace=user-younesbelkada. Let me know if it still makes sense to add the example.
Looks good to me, just a small nit. Do you want to add the scheduler here or in a new PR?
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Thanks! Let's address the scheduler in a follow-up PR!
@younesbelkada FYI, 8-bit Adam converges only after you do a fair amount of work on reward normalization; see CarperAI/trlx#53. We also had significant issues getting it working. There was also a recent bug in computing values that we found, which I believe was carried over from TRL; I'll have to double check with one of my engineers on this.
Never mind, it appears the bug is a non-issue for TRL.
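
For context on the reward-normalization point raised above, here is a minimal sketch of running reward whitening. This is purely illustrative and hedged: the `RunningRewardNormalizer` class and its usage are hypothetical names, not the actual scheme used in trlx or TRL (see CarperAI/trlx#53 for the real discussion).

```python
# Hedged sketch: running reward normalization (Welford's algorithm).
# Class and method names are hypothetical, for illustration only.
import torch


class RunningRewardNormalizer:
    """Track a running mean/std of scalar rewards and whiten new batches."""

    def __init__(self, eps: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean (Welford)
        self.eps = eps

    def update(self, rewards: torch.Tensor) -> None:
        for r in rewards.flatten().tolist():
            self.count += 1
            delta = r - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (r - self.mean)

    def normalize(self, rewards: torch.Tensor) -> torch.Tensor:
        std = (self.m2 / max(self.count - 1, 1)) ** 0.5
        return (rewards - self.mean) / (std + self.eps)


# Usage: update the statistics with each batch of rewards,
# then whiten them before the PPO step.
normalizer = RunningRewardNormalizer()
rewards = torch.tensor([0.2, 1.5, -0.3, 0.8])
normalizer.update(rewards)
rewards = normalizer.normalize(rewards)
```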
This PR adds support for generic optimizers. Before this PR, the PPOTrainer only supported the Adam optimizer; users are now free to use any optimizer. Also added an example that leverages 8-bit Adam, which is lighter and faster than the classic Adam optimizer. cc @lewtun @lvwerra @edbeeching
As a side note, 8-bit Adam should support DP out of the box.
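
To illustrate the feature described in this PR, below is a hedged sketch of passing a custom optimizer such as bitsandbytes' 8-bit Adam to the PPOTrainer. The class names (`PPOConfig`, `AutoModelForCausalLMWithValueHead`) and the `optimizer=` keyword follow a later trl-style API and are assumptions here; the exact constructor in the code touched by this PR may differ.

```python
# Sketch (assumptions flagged): inject any torch.optim-style optimizer into PPOTrainer.
# The trl class names and the `optimizer=` keyword below are assumptions relative
# to the exact code in this PR.
import bitsandbytes as bnb
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "gpt2"  # illustrative choice
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(model_name=model_name, learning_rate=1.41e-5)

# 8-bit Adam from bitsandbytes: lighter and faster than classic Adam.
optimizer = bnb.optim.Adam8bit(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=config.learning_rate,
)

# Before this PR the trainer built a plain Adam internally; with generic
# optimizer support, the optimizer can be injected instead.
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, optimizer=optimizer)
```

Injecting the optimizer rather than hard-coding Adam also makes it straightforward to later add learning-rate schedulers or swap in other memory-efficient optimizers, which is the follow-up discussed in the conversation above.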