🧮 Fix `max_steps` calculation in `RLOOTrainer` #2433

qgallouedec · 2024-12-03T19:17:39Z

What does this PR do?

I'm not entirely sure why it works, but the problem no longer occurs across varying batch sizes, RLOO k values, world sizes, and gradient accumulation steps.
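For context, here is a rough sketch of how these four quantities can combine into a step count. It is not the diff from this PR and not TRL's API: `estimate_max_steps`, its parameter names, and the divisibility check on `rloo_k` are all assumptions made for illustration.

```python
import math


def estimate_max_steps(
    total_episodes: int,
    per_device_batch_size: int,
    gradient_accumulation_steps: int,
    world_size: int,
    rloo_k: int,
) -> int:
    """Hypothetical helper: estimate how many update steps a run will take."""
    # Samples consumed by one update, summed over all processes.
    global_batch_size = (
        per_device_batch_size * gradient_accumulation_steps * world_size
    )
    # Assumption: each prompt is sampled rloo_k times, so the global batch
    # has to split evenly into groups of rloo_k completions.
    if global_batch_size % rloo_k != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by rloo_k={rloo_k}"
        )
    # Round up so a final partial batch still counts as a step.
    return math.ceil(total_episodes / global_batch_size)
```

Varying any one argument while holding the others fixed is a quick way to check whether a reported step count scales as expected.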

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2024-12-03T19:22:09Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

kashif · 2024-12-03T19:24:15Z

Is it because we effectively have 2x bigger batches from the chosen and rejected pairs?

qgallouedec · 2024-12-03T20:31:15Z

Probably. Merging, as it seems to resolve the issue.

Update max_steps calculation in RLOOTrainer

c600a92

qgallouedec requested review from kashif and lewtun December 3, 2024 19:17

kashif approved these changes Dec 3, 2024

qgallouedec merged commit 52201d3 into main Dec 3, 2024
13 of 14 checks passed

qgallouedec deleted the fix-rloo branch December 3, 2024 20:31

dawidm mentioned this pull request Dec 23, 2024

RLOO trainer epochs/steps/episodes calculations seems not to be working properly #2515

Open

9 tasks

qgallouedec mentioned this pull request Jan 7, 2025

RLOO trainer: fix calculations of steps, episodes and epochs #2516

Open
