
Clean up DPO example #2043

Merged: lewtun merged 9 commits into main from clean-up-dpo on Sep 11, 2024
Conversation

lewtun (Member) commented on Sep 9, 2024:

What does this PR do?

This PR standardises the dpo.py script to use the ultrafeedback_binarized dataset instead of the huge Anthropic one. I also tweaked the hparams so that they "just work" and use a more realistic SFT model.
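For reference, the new default preference dataset can be loaded directly with datasets. This is a minimal sketch; the exact preprocessing in dpo.py may differ:

from datasets import load_dataset

# The script now defaults to the binarized UltraFeedback preference dataset,
# whose "train" and "test" splits match the script's split defaults.
dataset = load_dataset("trl-lib/ultrafeedback_binarized")
train_ds, eval_ds = dataset["train"], dataset["test"]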

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -111,7 +111,6 @@ class DPOScriptArguments:
     dataset_name: str = field(default=None, metadata={"help": "the dataset name"})
     dataset_train_split: str = field(default="train", metadata={"help": "The dataset split to use for training"})
     dataset_test_split: str = field(default="test", metadata={"help": "The dataset split to use for evaluation"})
-    sanity_check: bool = field(default=False, metadata={"help": "only train on 1000 samples"})
lewtun (Member, Author) commented:

This type of debugging arg shouldn't live in the lib IMO

A reviewer (Member) replied:
Let's remove them all

lewtun (Member, Author) replied:
Good idea! Done in ddf30cb
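For anyone who relied on sanity_check, the same quick smoke test can still be done at load time with standard datasets split slicing. A sketch, not code from this PR:

from datasets import load_dataset

# Equivalent of the removed sanity_check flag: keep only the first
# 1000 training examples via the split-slicing syntax.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:1000]")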

lewtun (Member, Author) commented on Sep 11, 2024:

There seem to be some rich-related issues with the tests now that I've removed the sanity_check arg. I'm taking a look.

lewtun (Member, Author) commented on Sep 11, 2024:

Very strange: running with the rich callback fails because the prediction progress bar is None during evaluation, even though it is correctly initialised at the start of training:

Traceback (most recent call last):
  File "/fsx/lewis/git/hf/trl/trl/commands/scripts/dpo.py", line 176, in <module>
    metrics = trainer.evaluate()
  File "/fsx/lewis/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer.py", line 3666, in evaluate
    output = eval_loop(
  File "/fsx/lewis/git/hf/trl/trl/trainer/dpo_trainer.py", line 1651, in evaluation_loop
    initial_output = super().evaluation_loop(
  File "/fsx/lewis/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer.py", line 3888, in evaluation_loop
    self.control = self.callback_handler.on_prediction_step(args, self.state, self.control)
  File "/fsx/lewis/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer_callback.py", line 503, in on_prediction_step
    return self.call_event("on_prediction_step", args, state, control)
  File "/fsx/lewis/miniconda3/envs/trl/lib/python3.10/site-packages/transformers/trainer_callback.py", line 507, in call_event
    result = getattr(callback, event)(
  File "/fsx/lewis/git/hf/trl/trl/trainer/callbacks.py", line 124, in on_prediction_step
    self.prediction_task_id = self.prediction_bar.add_task(
AttributeError: 'NoneType' object has no attribute 'add_task'
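One plausible guard for this failure mode is to bail out of on_prediction_step when the rich bar was never created. This is only a sketch with attribute names taken from the traceback above, not the fix that actually landed:

# In a rich-based trainer callback (sketch; assumes prediction_bar and
# prediction_task_id attributes as shown in the traceback above).
def on_prediction_step(self, args, state, control, eval_dataloader=None, **kwargs):
    if self.prediction_bar is None:
        # Evaluation was invoked outside the live training display;
        # skip the update rather than dereferencing None.
        return
    if self.prediction_task_id is None:
        total = len(eval_dataloader) if eval_dataloader is not None else None
        self.prediction_task_id = self.prediction_bar.add_task("Predicting", total=total)
    self.prediction_bar.update(self.prediction_task_id, advance=1)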

Command to repro:

TRL_USE_RICH=True CUDA_VISIBLE_DEVICES="" trl dpo --max_steps 1 --output_dir tmp-dpo --model_name_or_path trl-internal-testing/tiny-random-LlamaForCausalLM --dataset_name trl-lib/ultrafeedback_binarized --learning_rate 1e-4 --lr_scheduler_type cosine --dataset_num_proc 48

@@ -105,10 +95,6 @@ def prepare_dataset(row):
 with PartialState().local_main_process_first():
     dataset = dataset.map(prepare_dataset, num_proc=training_args.dataset_num_proc)

-if args.max_samples is not None:
lewtun (Member, Author) commented:
FYI @kashif @qgallouedec @edbeeching we should not add this logic into the example scripts IMO - it's best solved by adding support for something like the dataset mixer we have in the handbook or H4 repo

A reviewer (Collaborator) replied:
ah yes my bad!
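For context, a dataset mixer along the lines mentioned above might look like this sketch. The mix_datasets helper is hypothetical; only the UltraFeedback dataset name comes from this PR:

from datasets import concatenate_datasets, load_dataset

def mix_datasets(dataset_mixer, split="train", seed=42):
    """Hypothetical mixer: keep the given fraction of each dataset, then combine."""
    subsets = []
    for name, fraction in dataset_mixer.items():
        ds = load_dataset(name, split=split).shuffle(seed=seed)
        subsets.append(ds.select(range(int(fraction * len(ds)))))
    return concatenate_datasets(subsets).shuffle(seed=seed)

# e.g. train on all of UltraFeedback (fractions below 1.0 would subsample)
train_ds = mix_datasets({"trl-lib/ultrafeedback_binarized": 1.0})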

lewtun merged commit 9a6061f into main on Sep 11, 2024. 10 checks passed.
lewtun deleted the clean-up-dpo branch on September 11, 2024 at 15:45.