Added error check to RLOO, PPOv2, OnlineDPO that `ref_policy` and `policy` have different identities #2057
…y should have different identities.
As a quick reminder,
Thanks a lot for adding this sanity check, @RylanSchaeffer! If I understand correctly, this scenario can occur in any trainer that has a reference model. Would you like to expand your check to the other trainers as well?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Useful check, thanks!
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
The failing test is not related to this PR.
Thanks @RylanSchaeffer!
As discussed in Issue 2046, we add an error check to make sure that the `ref_policy` and `policy` have different identities.

This is just a quick and dirty demonstration to show what I have in mind. Feedback is very welcome!
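
For readers skimming the thread, here is a minimal sketch of the kind of identity check described above. It is not the code from this PR: `TinyModel` and `check_distinct_models` are hypothetical names used only for illustration, and a real trainer would receive actual model objects. The point is that if both arguments refer to the same Python object, any update to the policy also changes the reference, so the check compares with `is` rather than comparing weights.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a model; not a real trainer class."""

    def __init__(self):
        self.weights = [0.1, 0.2]


def check_distinct_models(policy, ref_policy):
    """Raise if both arguments are the very same object (identity, not equality)."""
    if ref_policy is policy:
        raise ValueError(
            "`policy` and `ref_policy` cannot be the same object. "
            "Pass a copy of the policy as the reference model instead."
        )


policy = TinyModel()

# A deep copy has equal weights but a different identity, so the check passes.
ref_policy = copy.deepcopy(policy)
check_distinct_models(policy, ref_policy)

# Passing the same object twice is rejected.
try:
    check_distinct_models(policy, policy)
except ValueError as err:
    print(f"Rejected: {err}")
```

Comparing identities keeps the check cheap and catches the specific mistake of passing one object for both roles. Two separately loaded models with identical weights still pass, which is the intended starting state.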