Add ORPO Trainer + support HF metrics directly from chunked loss functions + fixes to avoid torch compile recompilations #429

shivam15s · 2024-12-06T00:48:15Z

Summary

This PR adds support for the following:

LigerORPOTrainer: a wrapper around the HuggingFace ORPO Trainer that uses the LigerORPOLoss module.
An example that uses LigerORPOTrainer is provided in examples/alignment/run_orpo.py.
The forward function of the FusedLinearPreference base class now returns additional metrics, which aligns our implementation with the HF ORPO Trainer.
A further refactor avoids torch compile recompilations: the accumulate_chunk function now calls accumulate_helper, which is the only part that is torch compiled, and dimension 1 (seq len) of input_chunk/target_chunk/target is explicitly marked as dynamic.
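
To illustrate the pattern described in the summary (a per-chunk helper that returns metrics alongside the loss, and an outer function that accumulates both), here is a minimal pure-Python sketch. It is not the code from this PR: the names odds_ratio_chunk and accumulate_chunks, the metric keys, and the default beta are all hypothetical, it uses plain floats instead of tensors, and it covers only the odds-ratio term of ORPO as it is commonly written, not the NLL term.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def odds_ratio_chunk(chosen_logps, rejected_logps, beta):
    """Hypothetical per-chunk helper: returns (loss_sum, metric_sums) for one chunk.

    Inputs are average per-token log-probabilities, so every value must be < 0.
    """
    loss_sum = 0.0
    sums = {"chosen_rewards": 0.0, "rejected_rewards": 0.0, "log_odds": 0.0}
    for c, r in zip(chosen_logps, rejected_logps):
        # log odds(p) = log p - log(1 - p), with p = exp(logp)
        log_odds = (c - math.log1p(-math.exp(c))) - (r - math.log1p(-math.exp(r)))
        loss_sum += -beta * log_sigmoid(log_odds)
        sums["chosen_rewards"] += beta * c
        sums["rejected_rewards"] += beta * r
        sums["log_odds"] += log_odds
    return loss_sum, sums


def accumulate_chunks(chosen_logps, rejected_logps, chunk_size, beta=0.1):
    """Hypothetical outer loop: sums per-chunk results, then averages over the batch."""
    n = len(chosen_logps)
    total_loss = 0.0
    totals = {}
    for start in range(0, n, chunk_size):
        end = start + chunk_size
        chunk_loss, chunk_sums = odds_ratio_chunk(
            chosen_logps[start:end], rejected_logps[start:end], beta
        )
        total_loss += chunk_loss
        for name, value in chunk_sums.items():
            totals[name] = totals.get(name, 0.0) + value
    # Metrics are averaged over the full batch, not per chunk.
    return total_loss / n, {name: value / n for name, value in totals.items()}


# Made-up example values (average log-probs for chosen and rejected responses).
chosen = [-0.5, -1.0, -0.2, -0.8, -0.3]
rejected = [-1.5, -0.9, -2.0, -0.8, -1.1]
loss, metrics = accumulate_chunks(chosen, rejected, chunk_size=2)
```

Because the helper returns sums and the division by the batch size happens once at the end, the loss and metrics do not depend on the chunk size (up to floating-point rounding), which is the property that lets a trainer log them directly.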

ByronHsu

awesome work!!

shivam15s added 9 commits December 6, 2024 01:11

add simple training script for orpo

865085d

add metrics

f513f73

metric handling and make torch compile more robust

c82419d

test changes to accommodate metrics

3f1873c

checkstyle

ff4bb15

scratch

e5f8bfc

add accelerate config

22cfa47

orpo trainer add

8e4f190

checkstyle

3f378d1

shivam15s force-pushed the shisahni/orpo_integrate branch from 984c5b5 to 3f378d1 December 6, 2024 01:35

shivam15s added 3 commits December 6, 2024 02:00

add trl to dev

a050cc5

empty commit

1f100dc

increase tol for qwen2 vl test

18a385f

ByronHsu approved these changes Dec 6, 2024


ByronHsu merged commit 6cb0018 into main Dec 6, 2024
3 checks passed

ByronHsu deleted the shisahni/orpo_integrate branch December 6, 2024 18:09

ccdv-ai mentioned this pull request Dec 6, 2024

Support ORPO/DPO Liger losses (and LigerORPOTrainer) axolotl-ai-cloud/axolotl#2141

Open
