-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up DPO example #2043
Clean up DPO example #2043
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
@@ -111,7 +111,6 @@ class DPOScriptArguments: | |||
dataset_name: str = field(default=None, metadata={"help": "the dataset name"}) | |||
dataset_train_split: str = field(default="train", metadata={"help": "The dataset split to use for training"}) | |||
dataset_test_split: str = field(default="test", metadata={"help": "The dataset split to use for evaluation"}) | |||
sanity_check: bool = field(default=False, metadata={"help": "only train on 1000 samples"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type of debugging arg shouldn't live in the lib IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove them all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Done in ddf30cb
There seems to be some |
Very strange, running with the
Command to repro:
|
@@ -105,10 +95,6 @@ def prepare_dataset(row): | |||
with PartialState().local_main_process_first(): | |||
dataset = dataset.map(prepare_dataset, num_proc=training_args.dataset_num_proc) | |||
|
|||
if args.max_samples is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @kashif @qgallouedec @edbeeching we should not add this logic into the example scripts IMO - it's best solved by adding support for something like the dataset mixer we have in the handbook or H4 repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes my bad!
What does this PR do?
This PR standardises the
dpo.py
script to use theultrafeedback_binarized
dataset instead of the huge Anthropic one. I also tweaked the hparams so that they "just work" and use a more realistic SFT model.Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.