
Fix misleading variable "epoch" in the training loop of the PPOTrainer doc #1171

Merged: 2 commits into huggingface:main on Jan 8, 2024

Conversation

@Jfhseh (Contributor) commented Jan 2, 2024

The usage of the variable "epoch" is misleading (incorrect) in the original doc: the dataloader does not contain the data for ALL epochs, but for one epoch only. Thus
"for epoch, batch in tqdm(enumerate(ppo_trainer.dataloader))"
is misleading and does not actually store the epoch number.

The correct version comes from the TRL PPO notebook tutorial (https://github.com/huggingface/trl/blob/main/examples/notebooks/gpt2-sentiment-control.ipynb), which uses an outer loop to capture the epochs.

I also posted the question on the forum: https://discuss.huggingface.co/t/confusing-and-possibly-misleading-ppo-trainer-code-from-trl-api-doc-tutorial/67531
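For concreteness, a minimal sketch of the two versions (`ppo_trainer` is assumed to be a configured `PPOTrainer`; the loop bodies are elided):

```python
from tqdm import tqdm

# Original doc version: "epoch" is really just the batch index within a
# single pass over the dataloader.
for epoch, batch in tqdm(enumerate(ppo_trainer.dataloader)):
    ...  # generation, rewards, ppo_trainer.step, ppo_trainer.log_stats

# Notebook version: an outer loop counts the epochs; the inner loop
# iterates over the batches of one epoch.
for epoch in tqdm(range(ppo_trainer.config.ppo_epochs), "epoch: "):
    for batch_id, batch in tqdm(enumerate(ppo_trainer.dataloader)):
        ...
```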

```python
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
ppo_trainer.log_stats(stats, batch, rewards)
for epoch in tqdm(range(ppo_trainer.config.ppo_epochs), "epoch: "):
    for batch_id, batch in tqdm(enumerate(ppo_trainer.dataloader)):
```
A Member commented on the diff above:

Since we don't use the batch_id, we could remove the enumerate and batch_id altogether.

@Jfhseh (author) replied:
Got it. Yes, I removed the batch_id.
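With `enumerate` and `batch_id` dropped, as agreed above, the loop header becomes:

```python
for epoch in tqdm(range(ppo_trainer.config.ppo_epochs), "epoch: "):
    for batch in tqdm(ppo_trainer.dataloader):
```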

@lvwerra (Member) commented Jan 4, 2024

Sounds good to me. Note that the reason people referred to this part as an epoch is because it constitutes a PPO optimization epoch. But I agree that it's a bit confusing. Left a minor comment.
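To unpack the terminology in lvwerra's note (my reading, assuming TRL's `PPOConfig`; not an excerpt from the docs):

```python
# Two senses of "epoch" around this loop:
#
# 1. A dataset epoch: one full pass over ppo_trainer.dataloader, which is
#    what the outer loop in the fixed doc counts.
#
# 2. A PPO optimization epoch: each ppo_trainer.step() call internally
#    optimizes the policy for config.ppo_epochs passes over the same
#    rollout batch. This is the sense in which the old inner loop was
#    called an "epoch".
```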

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@younesbelkada (Contributor) left a review:

Thanks for clarifying the docs!

@younesbelkada merged commit ad597db into huggingface:main on Jan 8, 2024
9 checks passed
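For reference, a sketch of the doc's full training loop after this change. The `generation_kwargs`, `tokenizer`, and `reward_fn` names are illustrative placeholders (the batch field names follow TRL's sentiment examples); only the loop structure and the `step`/`log_stats` calls come from this PR:

```python
from tqdm import tqdm
import torch

for epoch in tqdm(range(ppo_trainer.config.ppo_epochs), "epoch: "):
    for batch in tqdm(ppo_trainer.dataloader):
        query_tensors = batch["input_ids"]

        # Generate responses from the policy model; generation_kwargs is a
        # placeholder dict of sampling settings.
        response_tensors = ppo_trainer.generate(query_tensors, **generation_kwargs)
        batch["response"] = tokenizer.batch_decode(response_tensors)

        # Score each (query, response) pair; reward_fn is a hypothetical
        # reward-model wrapper returning one float per text.
        texts = [q + r for q, r in zip(batch["query"], batch["response"])]
        rewards = [torch.tensor(score) for score in reward_fn(texts)]

        # PPO optimization step and logging, as in the snippet under review.
        stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
        ppo_trainer.log_stats(stats, batch, rewards)
```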
jondurbin pushed a commit to jondurbin/trl that referenced this pull request on Jan 8, 2024:
Fix misleading variable "epoch" from PPOTrainer Doc. (huggingface#1171)
lapp0 pushed a commit to lapp0/trl that referenced this pull request on May 10, 2024:
Fix misleading variable "epoch" from PPOTrainer Doc. (huggingface#1171)