
[PPOTrainer] Support generic optimizers #78

Merged

Conversation

younesbelkada
Contributor

@younesbelkada younesbelkada commented Jan 5, 2023

This PR adds support for generic optimizers. Before this PR, the PPOTrainer supported only the Adam optimizer; users are now free to use any optimizer.
Also added an example that leverages 8-bit Adam, which is lighter and faster than the classic Adam optimizer.

cc @lewtun @lvwerra @edbeeching

As a side note, 8-bit Adam should support DP out of the box.

- add generic support
- add bnb example
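The generic-optimizer pattern the PR describes can be sketched in plain Python. This is an illustrative sketch, not the actual TRL API: the class names (`Trainer`, `Adam`, `SGD`) and method signatures here are invented stand-ins. The key idea matches the PR: the trainer accepts any optimizer object, and falls back to a default Adam-style optimizer when none is given.

```python
class SGD:
    """Minimal stand-in optimizer: plain gradient descent."""
    def __init__(self, params, lr=0.1):
        self.params, self.lr = params, lr

    def step(self, grads):
        for i, g in enumerate(grads):
            self.params[i] -= self.lr * g


class Adam:
    """Minimal stand-in for the default Adam optimizer."""
    def __init__(self, params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params, self.lr = params, lr
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = [0.0] * len(params)  # first-moment estimates
        self.v = [0.0] * len(params)  # second-moment estimates
        self.t = 0                    # step counter for bias correction

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            self.params[i] -= self.lr * m_hat / (v_hat ** 0.5 + self.eps)


class Trainer:
    """Accepts any optimizer; defaults to Adam when none is passed,
    mirroring the 'generic optimizer' behavior this PR adds."""
    def __init__(self, params, optimizer=None):
        self.params = params
        self.optimizer = optimizer if optimizer is not None else Adam(params)

    def train_step(self, grads):
        self.optimizer.step(grads)
```

With the real PPOTrainer the same idea applies: construct any `torch.optim`-compatible optimizer (e.g. `bitsandbytes`' 8-bit Adam) over the model's parameters and pass it in, instead of being locked to the built-in Adam.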
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jan 5, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada
Contributor Author

Regarding 8-bit Adam, it is quite hard to make it converge. I have found that the model rapidly falls into a collapse mode: https://wandb.ai/distill-bloom/trl/runs/k7vogzao?workspace=user-younesbelkada Let me know if it still makes sense to add the example.

@younesbelkada younesbelkada requested a review from lvwerra January 5, 2023 11:47
@younesbelkada younesbelkada mentioned this pull request Jan 5, 2023
Member

@lvwerra lvwerra left a comment


Looks good to me, just a small nit. Do you want to add the scheduler here or in a new PR?

trl/trainer/ppo_trainer.py (outdated review thread, resolved)
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
@younesbelkada
Contributor Author

younesbelkada commented Jan 5, 2023

Thanks! Let's address the scheduler in a follow up PR!

@younesbelkada younesbelkada merged commit 6c5f278 into huggingface:main Jan 5, 2023
@LouisCastricato

LouisCastricato commented Jan 8, 2023

@younesbelkada FYI, 8-bit Adam converges only after you do a lot of work on reward normalization; see CarperAI/trlx#53. We also had significant issues getting it working. There was also a recent bug in computing values that we found, which I believe was carried over from TRL; I'll have to double-check with one of my engineers on this.
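The reward normalization Louis refers to is commonly implemented as whitening: shifting a batch of rewards to zero mean and scaling to unit variance before they feed into advantage estimation. The function below is a minimal illustrative sketch (the name `whiten_rewards` and its details are assumptions, not code from TRL or trlx); keeping reward magnitudes in a stable range is one way to help low-precision optimizers such as 8-bit Adam avoid collapse.

```python
def whiten_rewards(rewards, eps=1e-8):
    """Normalize a batch of rewards to zero mean and unit variance.

    `eps` guards against division by zero when all rewards are equal.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```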

@LouisCastricato

Never mind, it appears the bug is a non-issue for TRL.
