Add example YAML file for training Mistral using DPO #2029
Conversation
Hey, thanks for the PR. I listed a few small points in this YAML that stood out.
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Adding qlora and removing role-related data (unnecessary)
I think I addressed all your concerns @NanoCode012. Let me know if there is anything else :)
Just two more tiny nitpicks, and it looks good to go!
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Cool, I committed your suggestions @NanoCode012
Thank you very much for the PR!
You're welcome! Glad to help.
* Add example YAML file for training Mistral using DPO
* chore: lint
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update mistral-dpo.yml
Adding qlora and removing role-related data (unnecessary)
* Rename mistral-dpo.yml to mistral-dpo-qlora.yml
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Description
I provided a YAML file that can be used as an example for performing DPO on a Mistral model, using open-source datasets and open-source models.
Motivation and Context
This is needed because the chat template originally used by the Mistral 7B model does not allow us to perform DPO on the model as-is. The original chat template enforces alternating between user and assistant turns; however, the chosen and rejected columns of a DPO dataset contain only an assistant turn. For reference, the original chat template can be found in this file. In the example, I use the ChatML template instead and introduce the corresponding special tokens, which makes training possible. A sketch of the relevant config keys follows.
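For context, a minimal sketch of the relevant axolotl config keys is shown below. The base model, dataset, and hyperparameter values are illustrative assumptions, not the exact contents of the merged example file.

```yaml
# Sketch of a DPO + QLoRA config for Mistral using the ChatML template.
# Values here are illustrative assumptions, not the merged file's contents.
base_model: mistralai/Mistral-7B-v0.1   # assumed base model

rl: dpo                                 # train with DPO instead of SFT
chat_template: chatml                   # swap the Mistral template for ChatML

datasets:
  - path: Intel/orca_dpo_pairs          # assumed open-source preference dataset
    type: chatml.intel                  # formats chosen/rejected pairs as ChatML
    split: train

# Register the ChatML special tokens so training can use them
special_tokens:
  eos_token: "<|im_end|>"
tokens:
  - "<|im_start|>"

# QLoRA settings (the example file was renamed to mistral-dpo-qlora.yml)
adapter: qlora
load_in_4bit: true
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
```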
How has this been tested?
I ran the YAML file locally and trained the model. I then merged the adapter and verified that the model produced outputs and was successfully using the new special tokens. The testing environment is a regular conda environment with axolotl and its dependencies. My changes do not affect other areas of the code, since this PR only adds a YAML file.
Screenshots (if appropriate)
Types of changes
None
Social Handles (Optional)