-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preference dataset docs #1636
Preference dataset docs #1636
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1636
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 02ca414 with merge base 9a863c8 (): This comment was automatically generated by Dr. CI and updates every 15 minutes. |
The ground-truth in preference datasets is usually the outcome of a binary comparison between two completions for the same prompt, | ||
and where a human annotator has indicated that one completion is more preferable than the other, according to some pre-set criterion. | ||
These prompt-completion pairs could be instruct style (single-turn, optionally with a single prompt), chat style (multi-turn), or | ||
some other set of interactions between a user and model (e.g. free-form text completion). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this true, or do we only support preference chat? I guess as long as you make the transform it should work for all three, but your example below implies only chat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we support an optional system prompt it should work for all three right? That was the whole point of the preference dataset refactor to support arbitrary interactions
Context
What is the purpose of this PR? Is it to
Please link to any issues this PR addresses.
#1529
Details on expected preference dataset format, where you can use it, and how to use custom preference datasets.
I haven't covered using different preference message transforms - maybe that can go in the message docs?