Fix Sentiment control notebook #126

Merged
merged 3 commits into main from fix-control-nb on Feb 7, 2023
Conversation

lvwerra (Member) commented Jan 31, 2023

This is an attempt to fix the instabilities in the controlled sentiment generation.

Changes so far:

  • Fix the logit computation for the reward (the pipeline natively changes the label order, which can cause the position of the logit to switch); see the sketch below.
  • Add "eos_token_id": -1 to the generate kwargs.

The last point makes sure that the model keeps generating until max_new_tokens is reached. I think what happens otherwise is that the model sometimes generates only 1-2 tokens, in which case the PPO loss spikes (I don't know why yet).
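For the first point, here is a minimal sketch of the idea, not the notebook's exact code (the reward model name and the helper function are illustrative assumptions): read the positive-class logit by label name rather than by list position, so a reordering of the pipeline output cannot change which logit is used as the reward.

```python
from transformers import pipeline

# Illustrative reward pipeline; the sentiment model name is an assumption.
sentiment_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

def positive_logit(text: str) -> float:
    # return_all_scores=True yields one dict per label, e.g.
    # [{"label": "NEGATIVE", "score": ...}, {"label": "POSITIVE", "score": ...}],
    # but the order is not guaranteed, so look the label up by name instead of by index.
    scores = sentiment_pipe(text, return_all_scores=True, function_to_apply="none")[0]
    return next(s["score"] for s in scores if s["label"] == "POSITIVE")
```

(Newer transformers versions expose the same behaviour via top_k=None instead of the deprecated return_all_scores=True.)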

The last successful run is here.

HuggingFaceDocBuilderDev commented Jan 31, 2023

The documentation is not available anymore as the PR was closed or merged.

@lvwerra mentioned this pull request Feb 3, 2023
@lvwerra requested a review from @younesbelkada on February 6, 2023 18:26

younesbelkada (Contributor) left a comment

Thanks for fixing!

@@ -296,8 +296,7 @@
 {
 "data": {
 "text/html": [
- "wandb version 0.13.9 is available! To upgrade, please run:\n",
- " $ pip install wandb --upgrade"
+ "Tracking run with wandb version 0.13.9"

Contributor:

What's the right way to dev with notebooks and avoid some things like this?

@@ -309,7 +308,7 @@
 {
 "data": {
 "text/html": [
- "Tracking run with wandb version 0.13.7"
+ "Run data is saved locally in <code>/home/leandro_huggingface_co/trl/examples/sentiment/notebooks/wandb/run-20230206_125743-jpcnr7jx</code>"

Contributor:

same as above

@@ -647,7 +673,8 @@
 " \"top_p\": 1.0,\n",
 " \"do_sample\": True,\n",
 " \"pad_token_id\": gpt2_tokenizer.eos_token_id,\n",
- " \"max_new_tokens\": txt_out_len\n",
+ " \"max_new_tokens\": txt_out_len,\n",
+ " \"eos_token_id\": -1\n",

Contributor:

This is the only important line, right?

natolambert (Contributor) left a comment

Small review, asking about the best way to maintain notebooks.
Should we use the NB Review tool for GitHub that is used in huggingface/notebooks?
We could also move the notebooks there?

lvwerra (Member, Author) commented Feb 7, 2023

Thanks @natolambert - I actually installed NB Review yesterday but I think we need to open a new PR for it to show the link :) https://app.reviewnb.com/lvwerra/trl/pull/126/

@lvwerra merged commit 1ad198f into main on Feb 7, 2023
@lvwerra deleted the fix-control-nb branch on February 7, 2023 at 09:03
leoribeiro commented:

@lvwerra it seems that if we don't set "eos_token_id": -1, the model gradually learns to generate shorter outputs. I think setting "eos_token_id": -1 is not desirable, though, because it forces generation to continue until max_new_tokens is reached, which makes the model produce ungrammatical text. In that way, the model is not useful.

For example, this is an output generated by an RL-trained T5 model with "eos_token_id": -1 for sampling:
Three diaspora-based, prominent African activists discuss the current state of women's rights in the continent. In 2010, the African Union launched a decade-long initiative to promote women's empowerment. In 2010, the African Union launched a decade-long initiative to promote women's capabilities. Organizer Paul Valdais has been named "Inspirational Woman of 2012" by the UK group "Women 4 Africa"TIME: MK:'We really hope to be getting focus on justice at the bottom of the media., despite the

This is the output for the same input example from a T5 model RL-trained without "eos_token_id": -1 for sampling:
African Voices talks with three diaspora-based African activist.

Did you figure out why the T5 model learns to generate shorter outputs with PPO?
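
For concreteness, here is a small sketch of the two sampling setups being compared; the model, prompt, and lengths are illustrative and not the actual training code:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative seq2seq model and input, not the one trained above.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tokenizer("summarize: African Voices talks with three diaspora-based African activists.",
                   return_tensors="pt")

# With eos_token_id=-1 no sampled token ever matches EOS, so generation always runs
# for the full max_new_tokens (the long, ungrammatical case above).
forced_length = model.generate(**inputs, do_sample=True, top_k=0, top_p=1.0,
                               max_new_tokens=48, eos_token_id=-1,
                               pad_token_id=tokenizer.pad_token_id)

# Without it the model can emit EOS and stop early (the short, grammatical case),
# but under PPO it may then drift toward ever shorter outputs.
natural_stop = model.generate(**inputs, do_sample=True, top_k=0, top_p=1.0,
                              max_new_tokens=48,
                              pad_token_id=tokenizer.pad_token_id)

print(tokenizer.decode(forced_length[0], skip_special_tokens=True))
print(tokenizer.decode(natural_stop[0], skip_special_tokens=True))
```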

aliwalker commented:

> @lvwerra it seems that if we don't set "eos_token_id": -1, the model gradually learns to generate shorter outputs. [...] Did you figure out why the T5 model learns to generate shorter outputs with PPO?

Hi @leoribeiro! I'm struggling with this issue when training on a very long context summarization task. Did you find any workarounds?
