Add example YAML file for training Mistral using DPO #2029
Conversation
Hey, thanks for the PR. I listed a few small points in this YAML that stood out.
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Adding qlora and removing role-related data (unnecessary)
I think I addressed all your concerns @NanoCode012. Let me know if there is anything else :)
Just two more tiny nitpicks, and it looks good to go!
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Cool, I committed your suggestions @NanoCode012
Thank you very much for the PR!
You're welcome! Glad to help.
* Add example YAML file for training Mistral using DPO
* chore: lint
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update mistral-dpo.yml
Adding qlora and removing role-related data (unnecessary)
* Rename mistral-dpo.yml to mistral-dpo-qlora.yml
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Description
I provided a YAML file that can be used as an example for performing DPO on a Mistral model, using open-source datasets and open-source models.
Motivation and Context
This is needed because the chat template originally used by the Mistral 7B model does not allow us to perform DPO on the model as-is. The original chat template enforces alternating between user and assistant turns; however, the chosen and rejected columns of a DPO dataset contain only an assistant turn. For reference, the original chat template can be found in this file. In the example, I use the ChatML template instead and introduce the corresponding special tokens, which makes training possible. A sketch of the relevant config keys follows.
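For context, a minimal sketch of the relevant axolotl config keys is shown below. The base model, dataset, and hyperparameter values are illustrative assumptions, not the exact contents of the merged example file.

```yaml
# Sketch of a DPO + QLoRA config for Mistral using the ChatML template.
# Values here are illustrative assumptions, not the merged file's contents.
base_model: mistralai/Mistral-7B-v0.1   # assumed base model

rl: dpo                                 # train with DPO instead of SFT
chat_template: chatml                   # swap the Mistral template for ChatML

datasets:
  - path: Intel/orca_dpo_pairs          # assumed open-source preference dataset
    type: chatml.intel                  # formats chosen/rejected pairs as ChatML
    split: train

# Register the ChatML special tokens so training can use them
special_tokens:
  eos_token: "<|im_end|>"
tokens:
  - "<|im_start|>"

# QLoRA settings (the example file was renamed to mistral-dpo-qlora.yml)
adapter: qlora
load_in_4bit: true
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
```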
How has this been tested?
I ran the YAML file locally and trained the model. I then merged the adapter and verified that the model produced outputs and was successfully using the new special tokens. The testing environment is a regular conda environment with axolotl and its dependencies. My changes do not affect other areas of the code, since this PR only adds a YAML file.
Screenshots (if appropriate)
Types of changes
None
Social Handles (Optional)