Fix Sentiment control notebook #126
The documentation is not available anymore as the PR was closed or merged.
Thanks for fixing!
@@ -296,8 +296,7 @@
 {
 "data": {
 "text/html": [
-"wandb version 0.13.9 is available! To upgrade, please run:\n",
-" $ pip install wandb --upgrade"
+"Tracking run with wandb version 0.13.9"
What's the right way to dev with notebooks and avoid some things like this?
@@ -309,7 +308,7 @@
 {
 "data": {
 "text/html": [
-"Tracking run with wandb version 0.13.7"
+"Run data is saved locally in <code>/home/leandro_huggingface_co/trl/examples/sentiment/notebooks/wandb/run-20230206_125743-jpcnr7jx</code>"
same as above
@@ -647,7 +673,8 @@
 " \"top_p\": 1.0,\n",
 " \"do_sample\": True,\n",
 " \"pad_token_id\": gpt2_tokenizer.eos_token_id,\n",
-" \"max_new_tokens\": txt_out_len\n",
+" \"max_new_tokens\": txt_out_len,\n",
+" \"eos_token_id\": -1\n",
This is the only important line, right?
Small review, asking about best way to maintain notebooks.
Should we use the NB Review tool for github that is used in huggingface/notebooks?
We could also move the notebooks there?
Thanks @natolambert - I actually installed NB Review yesterday but I think we need to open a new PR for it to show the link :) https://app.reviewnb.com/lvwerra/trl/pull/126/
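One common way to keep noise such as the wandb version banners out of notebook diffs is to clear cell outputs before committing. Below is a minimal sketch using only the standard library, assuming the nbformat 4 JSON layout (a top-level cells list whose code cells carry outputs and execution_count). strip_outputs is a hypothetical helper, not something that exists in this repository; tools such as nbstripout automate the same idea.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs cleared.

    Hypothetical helper for illustration; assumes the nbformat 4 layout.
    """
    cleaned = json.loads(json.dumps(notebook))  # deep copy via JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Tiny in-memory example notebook (made up for this sketch).
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
            "source": ["print('hi')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

stripped = strip_outputs(example)
```

With outputs cleared, a rerun of the notebook changes only the source cells in the diff, at the cost of losing the rendered results in the committed file.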
@lvwerra it seems that if we don't set
For example, this is an output generated by an RL-trained T5 model with
This is the output for the same input example by a T5 model RL-trained without
Did you figure out why the T5 model learns to generate shorter inputs with PPO?
Hi @leoribeiro ! I'm struggling with this issue when training on a very long context summarization task. Did you find any workarounds?
This is an attempt to fix the instabilities in the controlled sentiment generation.
Changes so far:
"eos_token_id": -1
The last point makes sure that the model keeps generating until max_new_tokens is reached. I think that otherwise the model sometimes generates only 1-2 tokens, in which case the PPO loss spikes (I don't know why yet).
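The effect of setting eos_token_id to -1 can be illustrated with a toy sampling loop. This is only a sketch, not the transformers generation code: toy_generate, the vocabulary size, and the uniform sampling are assumptions made for illustration. The loop stops early only when the sampled token equals eos_token_id; since no sampled token ID is negative, -1 never matches and every response has exactly max_new_tokens tokens.

```python
import random


def toy_generate(max_new_tokens, eos_token_id, vocab_size=5, seed=0):
    """Toy stand-in for sampling-based generation (hypothetical helper).

    Draws token IDs uniformly from range(vocab_size) and stops early
    only when the sampled ID equals eos_token_id.
    """
    rng = random.Random(seed)
    tokens = []
    for _ in range(max_new_tokens):
        token = rng.randrange(vocab_size)
        tokens.append(token)
        if token == eos_token_id:
            break  # early stop: the response can be as short as one token
    return tokens


# With a real EOS ID (0 here, an arbitrary choice), response lengths vary.
with_eos = [len(toy_generate(16, eos_token_id=0, seed=s)) for s in range(200)]

# With eos_token_id=-1 no sampled ID ever matches, so the length is fixed.
without_eos = [len(toy_generate(16, eos_token_id=-1, seed=s)) for s in range(200)]
```

In this toy setting, with_eos contains very short responses while without_eos is constant at 16, which mirrors the motivation above for ruling out 1-2 token generations.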
The last successful run is here.