FIX: TRL trainer preprocessing step was running in one process #1583

ali-mosavian · 2024-05-02T07:47:15Z

Description

We weren't passing dataset_num_proc to the TRL training config, so the initial data preprocessing steps in the TRL trainer were running in a single process.

Motivation and Context

Speeds up training start time significantly, depending on the number of logical cores you have.

How has this been tested?

Tested it with the ORPO trainer in axolotl.

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

https://www.linkedin.com/in/ali-mosavian-7a27457/
https://github.com/ali-mosavian

winglian

thanks!

ali-mosavian · 2024-05-02T15:51:18Z

I made a mistake: I sent it as trainer args in all cases, but it only applies to the CPO, KTO and ORPO trainer args. For DPO it needs to be sent directly to the trainer. My latest commit in the PR fixes that.
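To make the routing described in this comment concrete, here is a minimal sketch, not the code from this PR. The helper name `route_dataset_num_proc`, the `rl` string values, and the two returned dictionaries are hypothetical and chosen for illustration only; the only name taken from the PR is the `dataset_num_proc` key.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical set of trainer types that take dataset_num_proc via their
# training-args/config object, as described in the comment above.
ARGS_BASED_TRAINERS = {"cpo", "kto", "orpo"}


def route_dataset_num_proc(
    rl: str, num_proc: Optional[int]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (training_args_kwargs, trainer_kwargs) for the given trainer type.

    Illustrative helper only; it does not import or call TRL.
    """
    training_args_kwargs: Dict[str, Any] = {}
    trainer_kwargs: Dict[str, Any] = {}

    # Nothing to pass if the user did not configure a process count.
    if num_proc is None:
        return training_args_kwargs, trainer_kwargs

    if rl in ARGS_BASED_TRAINERS:
        # CPO, KTO and ORPO: the value goes into the training args.
        training_args_kwargs["dataset_num_proc"] = num_proc
    elif rl == "dpo":
        # DPO: the value is handed to the trainer itself.
        trainer_kwargs["dataset_num_proc"] = num_proc

    return training_args_kwargs, trainer_kwargs


args_kwargs, trainer_kwargs = route_dataset_num_proc("orpo", 8)
print(args_kwargs, trainer_kwargs)
```

Note that the final commit in this PR narrowed the change to ORPO only, so the CPO and KTO branch above reflects the intermediate state described in this comment rather than what was merged.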

* FIX: TRL trainer preprocessing step was running in one process
* FIX: Changed so that dataset_num_proc is sent to CPO, KTO and ORPO trainer args and directly to the trainer when DPO
* FIX: Changed back to only support ORPO for now, since KTO is handled in another way

---------

Co-authored-by: Ali Mosavian <ali.mosavian@kry.se>

FIX: TRL trainer preprocessing step was running in one process

1f5f563

winglian approved these changes May 2, 2024

FIX: Changed so that dataset_num_proc is sent to CPO, KTO and ORPO trainer args and directly to the trainer when DPO

7ecd94d

FIX: Changed back to only support ORPO for now, since KTO is handled in another way

b96c005

winglian merged commit b9bb169 into axolotl-ai-cloud:main May 3, 2024
7 checks passed