Standardise API for `WinRateCallback` and `LogCompletionsCallback` #2061

lewtun · 2024-09-12T15:57:09Z

What does this PR do?

This PR unifies the API of `WinRateCallback` and `LogCompletionsCallback`. Previously the two differed: for example, `LogCompletionsCallback` required the prompts to be passed manually, and its sampling was tied to the training config. Both callbacks are now used as follows:

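from transformers import GenerationConfig
from trl.trainer.callbacks import LogCompletionsCallback, WinRateCallback
from trl.trainer.judges import PairRMJudge

# `trainer` is an already-constructed TRL trainer.
judge = PairRMJudge()
generation_config = GenerationConfig(max_new_tokens=256)
win_rate_callback = WinRateCallback(
    judge=judge, trainer=trainer, generation_config=generation_config, num_prompts=32
)
log_callback = LogCompletionsCallback(trainer=trainer, generation_config=generation_config, num_prompts=8)
trainer.add_callback(win_rate_callback)
trainer.add_callback(log_callback)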
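
To make the judge's role concrete, here is a minimal, self-contained sketch of how a win rate could be computed from pairwise judgements. It is an illustration under assumptions, not the callback's actual implementation: `LongerIsBetterJudge` is a hypothetical toy stand-in for a real judge such as `PairRMJudge`, and the sketch assumes the judge returns, for each prompt, the index of the preferred completion in a `[reference, policy]` pair.

```python
from typing import List, Sequence, Tuple


class LongerIsBetterJudge:
    """Hypothetical toy judge: prefers the longer completion (ties go to index 0)."""

    def judge(self, prompts: Sequence[str], completions: Sequence[Tuple[str, str]]) -> List[int]:
        # Return the index (0 or 1) of the preferred completion in each pair.
        return [0 if len(first) >= len(second) else 1 for first, second in completions]


def win_rate(
    judge,
    prompts: Sequence[str],
    reference_completions: Sequence[str],
    policy_completions: Sequence[str],
) -> float:
    """Fraction of prompts for which the judge prefers the policy completion."""
    # Index 0 is the reference completion, index 1 is the policy completion.
    pairs = list(zip(reference_completions, policy_completions))
    winners = judge.judge(prompts, pairs)
    return sum(winner == 1 for winner in winners) / len(winners)


prompts = ["p1", "p2", "p3", "p4"]
reference = ["xx", "xx", "xx", "xx"]
policy = ["xxx", "x", "xxxx", "xx"]
print(win_rate(LongerIsBetterJudge(), prompts, reference, policy))  # 0.5
```

In this toy run the policy wins on the first and third prompts only, so the win rate is 0.5.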

Unifying the API also let me fix a bug in `LogCompletionsCallback` where generation would hang under ZeRO-3. The trick is to access the model through the trainer and unwrap it with `unwrap_model_for_generation()`.
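
The unwrapping pattern can be sketched as follows. This is a hedged illustration rather than the code in this PR: it assumes TRL's `unwrap_model_for_generation` context manager from `trl.models.utils`, called with the trainer's wrapped model and its accelerator, and `generate_unwrapped` is a hypothetical helper name introduced only for this example.

```python
def generate_unwrapped(trainer, input_ids, attention_mask, generation_config):
    """Generate with the trainer's model unwrapped from its distributed wrapper.

    Hypothetical helper for illustration; `trainer` is assumed to expose
    `model_wrapped` and `accelerator`, as `transformers.Trainer` does.
    """
    # Imported lazily so this sketch can be loaded without TRL installed.
    from trl.models.utils import unwrap_model_for_generation

    # `trainer.model_wrapped` is the DDP / DeepSpeed-wrapped model. The context
    # manager yields a model on which it is safe to call `generate`.
    with unwrap_model_for_generation(trainer.model_wrapped, trainer.accelerator) as unwrapped_model:
        return unwrapped_model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            generation_config=generation_config,
        )
```

Passing the wrapped model rather than `trainer.model` is an assumption based on the "Use wrapped model" commit in this PR.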

Scripts tested

DPO (DDP & Z3)
GKD
XPO
Online DPO

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2024-09-12T16:02:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lewtun · 2024-09-13T10:02:26Z

examples/scripts/gkd.py

@@ -17,7 +17,7 @@
 python examples/scripts/gkd.py \
    --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
    --teacher_model_name_or_path Qwen/Qwen2-1.5B-Instruct \
-    --dataset_name andito/chatbot_arena_completions \
+    --dataset_name trl-lib/chatbot_arena_conversations \


I switched to this dataset because the original doesn't have an eval split.

lewtun · 2024-09-13T14:49:41Z

trl/trainer/__init__.py

@@ -62,7 +62,7 @@
    "ddpo_config": ["DDPOConfig"],
    "gkd_trainer": ["GKDTrainer"],
    "gkd_config": ["GKDConfig"],
-    "callbacks": ["RichProgressCallback", "SyncRefModelCallback", "WinRateCallback"],
+    "callbacks": ["RichProgressCallback", "SyncRefModelCallback", "WinRateCallback", "LogCompletionsCallback"],


I unified the import path for the callbacks too.

qgallouedec · 2024-09-13T15:35:12Z

LGTM, it looks way better now!

lewtun added 5 commits September 11, 2024 20:19

Use wrapped model

8cfbdb4

Make WinRateCallback work

7d53c15

Make LogCompletions work

04c6143

Make LogCompletions work

767e574

Fix scripts

db3cdf2

Fix path

faa3b16

qgallouedec reviewed Sep 12, 2024

trl/trainer/callbacks.py (resolved)

qgallouedec reviewed Sep 12, 2024

trl/trainer/callbacks.py (outdated, resolved)

lewtun added 2 commits September 13, 2024 09:57

Refactor

eff4794

Merge branch 'main' into fix-completions-cbk

353f46d

lewtun commented Sep 13, 2024

trl/trainer/callbacks.py (outdated, resolved)

lewtun changed the title ~~[WIP] Standardise API for WinRateCallback and LogCompletionsCallback~~ Standardise API for WinRateCallback and LogCompletionsCallback Sep 13, 2024

lewtun marked this pull request as ready for review September 13, 2024 10:07

lewtun requested a review from edbeeching September 13, 2024 10:07

lewtun added 4 commits September 13, 2024 13:21

Remove padding

238727f

Merge branch 'main' into fix-completions-cbk

a90da93

Refactor

78bae95

Fix docs

41ad81d

qgallouedec reviewed Sep 13, 2024

examples/scripts/gkd.py (outdated, resolved)

trl/trainer/callbacks.py (outdated, resolved)

lewtun added 4 commits September 13, 2024 14:07

Fix scripts

1102660

Fix TLDR template

d69e034

Use explicit args

ea8de95

Fix callback import

0e1dda6

lewtun commented Sep 13, 2024

Add docstring

a79a6b9

qgallouedec approved these changes Sep 13, 2024

lewtun merged commit 88bede6 into main Sep 13, 2024
10 checks passed

lewtun deleted the fix-completions-cbk branch September 13, 2024 15:38
