Fix misleading variable "epoch" in the training loop in the PPOTrainer doc #1171
Conversation
The usage of the variable “epoch” in the original doc is misleading: the dataloader does not contain the data for ALL epochs, but for one epoch only, so "for epoch, batch in tqdm(enumerate(ppo_trainer.dataloader))" does not actually store the epoch number. The correct version comes from the TRL PPO notebook tutorial (https://github.com/huggingface/trl/blob/main/examples/notebooks/gpt2-sentiment-control.ipynb), which uses an outer loop to track the epochs. I also posted the question on the forum: https://discuss.huggingface.co/t/confusing-and-possibly-misleading-ppo-trainer-code-from-trl-api-doc-tutorial/67531
docs/source/ppo_trainer.mdx (outdated)

```python
        stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
        ppo_trainer.log_stats(stats, batch, rewards)
for epoch in tqdm(range(ppo_trainer.config.ppo_epochs), "epoch: "):
    for batch_id, batch in tqdm(enumerate(ppo_trainer.dataloader)):
```
Since we don't use the `batch_id`, we could remove the `enumerate` and `batch_id` altogether.
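The resulting loop structure can be sketched as follows. This is a minimal, dependency-free illustration, not the actual doc snippet: `ppo_epochs` and `dataloader` are stand-ins for `ppo_trainer.config.ppo_epochs` and `ppo_trainer.dataloader`, the `tqdm` progress bars are omitted, and `steps += 1` stands in for the real `ppo_trainer.step(...)` call.

```python
# Stand-in values; in the real snippet these come from PPOTrainer
# (ppo_trainer.config.ppo_epochs and ppo_trainer.dataloader).
ppo_epochs = 4
dataloader = [{"query": q} for q in ["a", "b", "c"]]

steps = 0
for epoch in range(ppo_epochs):      # outer loop tracks the epoch number
    for batch in dataloader:         # no enumerate: batch_id is unused
        steps += 1                   # stand-in for ppo_trainer.step(...)

# The dataloader yields one epoch of data, so it is traversed once per
# outer iteration: 4 epochs x 3 batches = 12 steps here.
print(steps)
```

The point of the fix: the inner loop alone only ever sees one pass over the data, so the outer `range(...)` loop is what actually provides the epoch count.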
Got it. Yes, I removed the batch_id.
Sounds good to me. Note that the reason people referred to this part as an
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for clarifying the docs!
…r Doc. (huggingface#1171) * Fix misleading variable "epoch" from PPOTrainer Doc. The usage of the variable “epoch” in the original doc is misleading: the dataloader does not contain the data for ALL epochs, but for one epoch only, so "for epoch, batch in tqdm(enumerate(ppo_trainer.dataloader))" does not actually store the epoch number. The correct version comes from the TRL PPO notebook tutorial (https://github.com/huggingface/trl/blob/main/examples/notebooks/gpt2-sentiment-control.ipynb), which uses an outer loop to track the epochs. I also posted the question on the forum: https://discuss.huggingface.co/t/confusing-and-possibly-misleading-ppo-trainer-code-from-trl-api-doc-tutorial/67531 * Remove batch_id