-
-
Notifications
You must be signed in to change notification settings - Fork 877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a chat_template
prompt strategy for DPO
#1725
Conversation
This mimics the sft chat_template strategy such that users can: * Specify the messages field * Specify the per message role and content fields * speicfy the chosen and rejected fields * Let the tokenizer construct the raw prompt * Ensure the chosen and rejected fields don't have any prefix tokens
@@ -62,7 +62,7 @@ def process_tokens_for_rl_debug(tokens, color, tokenizer, text_only): | |||
"""Helper function to process and color tokens.""" | |||
colored_tokens = [ | |||
color_token_for_rl_debug(tokenizer.decode(token), token, color, text_only) | |||
for token in tokenizer.encode(tokens) | |||
for token in tokenizer.encode(tokens, add_special_tokens=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I added this since by default I saw that this step was including the bos token all the time. Since that's already included it seemed reasonable to not add it in a second time.
@fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece? |
Was this in reference to the change in the debugging output? If so, it's not required but I think anyone manually inspecting tokenization output (like i did) would be very surprised to see the bos token duplicated in numerous scenarios. So it's more to give confidence that we constructed the strings correctly. |
Any other changes to add before updating the branch and approving for merging? |
Description
Replicates the
chat_template
support from SFT datasets but for DPO training. Users can now specify a dataset with a list of conversation messages along with rejected and chosen columns having a single conversation message. Further, all fields can be customized.Motivation and Context
This change provides a more configurable set of datasets for DPO training.
Fixes #1708
How has this been tested?
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)
@fozziethebeat