Fix Sentiment control notebook #126

Merged
merged 3 commits into main from fix-control-nb on Feb 7, 2023
Conversation

lvwerra (Member) commented Jan 31, 2023

This is an attempt to fix the instabilities in the controlled sentiment generation.

Changes so far:

  • Fix the logit computation for the reward (the pipeline natively changes the label order, which can cause the position of the logit to switch); see the sketch below.
  • Add "eos_token_id": -1 to the generate kwargs.

The last point makes sure that the model keeps generating until max_new_tokens is reached. I think what happens otherwise is that the model sometimes generates only 1-2 tokens, in which case the PPO loss spikes (I don't know why yet).
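For the first point, here is a minimal sketch of the idea, not the notebook's exact code (the reward model name and the helper function are illustrative assumptions): read the positive-class logit by label name rather than by list position, so a reordering of the pipeline output cannot change which logit is used as the reward.

```python
from transformers import pipeline

# Illustrative reward pipeline; the sentiment model name is an assumption.
sentiment_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def positive_logit(text: str) -> float:
    # return_all_scores=True yields one dict per label, e.g.
    # [{"label": "NEGATIVE", "score": ...}, {"label": "POSITIVE", "score": ...}],
    # but the order is not guaranteed, so look the label up by name instead of by index.
    scores = sentiment_pipe(text, return_all_scores=True, function_to_apply="none")[0]
    return next(s["score"] for s in scores if s["label"] == "POSITIVE")
```

(Newer transformers versions expose the same behaviour via top_k=None instead of the deprecated return_all_scores=True.)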

The last successful run is here.

HuggingFaceDocBuilderDev commented Jan 31, 2023

The documentation is not available anymore as the PR was closed or merged.

@lvwerra mentioned this pull request Feb 3, 2023
@lvwerra requested a review from @younesbelkada on February 6, 2023 18:26

younesbelkada (Contributor) left a comment

Thanks for fixing!

@@ -296,8 +296,7 @@
 {
 "data": {
 "text/html": [
- "wandb version 0.13.9 is available! To upgrade, please run:\n",
- " $ pip install wandb --upgrade"
+ "Tracking run with wandb version 0.13.9"

Contributor:

What's the right way to dev with notebooks and avoid some things like this?

@@ -309,7 +308,7 @@
 {
 "data": {
 "text/html": [
- "Tracking run with wandb version 0.13.7"
+ "Run data is saved locally in <code>/home/leandro_huggingface_co/trl/examples/sentiment/notebooks/wandb/run-20230206_125743-jpcnr7jx</code>"

Contributor:

same as above

@@ -647,7 +673,8 @@
 " \"top_p\": 1.0,\n",
 " \"do_sample\": True,\n",
 " \"pad_token_id\": gpt2_tokenizer.eos_token_id,\n",
- " \"max_new_tokens\": txt_out_len\n",
+ " \"max_new_tokens\": txt_out_len,\n",
+ " \"eos_token_id\": -1\n",

Contributor:

This is the only important line, right?

natolambert (Contributor) left a comment

Small review, asking about the best way to maintain notebooks.
Should we use the NB Review tool for GitHub that is used in huggingface/notebooks?
We could also move the notebooks there?

lvwerra (Member, Author) commented Feb 7, 2023

Thanks @natolambert - I actually installed NB Review yesterday but I think we need to open a new PR for it to show the link :) https://app.reviewnb.com/lvwerra/trl/pull/126/

@lvwerra merged commit 1ad198f into main on Feb 7, 2023
@lvwerra deleted the fix-control-nb branch on February 7, 2023 at 09:03
leoribeiro commented:

@lvwerra it seems that if we don't set "eos_token_id": -1, the model gradually learns to generate shorter outputs. I think setting "eos_token_id": -1 is not desirable, though, because it forces generation to continue until max_new_tokens is reached, which makes the model produce ungrammatical text. In that way, the model is not useful.

For example, this is an output generated by an RL-trained T5 model with "eos_token_id": -1 for sampling:
Three diaspora-based, prominent African activists discuss the current state of women's rights in the continent. In 2010, the African Union launched a decade-long initiative to promote women's empowerment. In 2010, the African Union launched a decade-long initiative to promote women's capabilities. Organizer Paul Valdais has been named "Inspirational Woman of 2012" by the UK group "Women 4 Africa"TIME: MK:'We really hope to be getting focus on justice at the bottom of the media., despite the

This is the output for the same input example from a T5 model RL-trained without "eos_token_id": -1 for sampling:
African Voices talks with three diaspora-based African activist.

Did you figure out why the T5 model learns to generate shorter outputs with PPO?
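
For concreteness, here is a small sketch of the two sampling setups being compared; the model, prompt, and lengths are illustrative and not the actual training code:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative seq2seq model and input, not the one trained above.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tokenizer("summarize: African Voices talks with three diaspora-based African activists.",
                   return_tensors="pt")

# With eos_token_id=-1 no sampled token ever matches EOS, so generation always runs
# for the full max_new_tokens (the long, ungrammatical case above).
forced_length = model.generate(**inputs, do_sample=True, top_k=0, top_p=1.0,
                               max_new_tokens=48, eos_token_id=-1,
                               pad_token_id=tokenizer.pad_token_id)

# Without it the model can emit EOS and stop early (the short, grammatical case),
# but under PPO it may then drift toward ever shorter outputs.
natural_stop = model.generate(**inputs, do_sample=True, top_k=0, top_p=1.0,
                              max_new_tokens=48,
                              pad_token_id=tokenizer.pad_token_id)

print(tokenizer.decode(forced_length[0], skip_special_tokens=True))
print(tokenizer.decode(natural_stop[0], skip_special_tokens=True))
```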

aliwalker commented:

> @lvwerra it seems that if we don't set "eos_token_id": -1, the model gradually learns to generate shorter outputs. [...] Did you figure out why the T5 model learns to generate shorter outputs with PPO?

Hi @leoribeiro! I'm struggling with this issue when training on a very long context summarization task. Did you find any workarounds?
