SFTTrainer support #682
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for the PR, it looks good! Just one question: don't we need to add trl to setup.py, add NeuronSFTTrainer to the API doc, and perhaps, if possible, have a minimal test?
I did not add
As @JingyaHuang said, it would be nice to have some unit tests before integrating this in the demo, to speed up integration by identifying issues early on.
    args = NeuronSFTConfig(output_dir=output_dir)
elif args is not None and args.__class__.__name__ == "NeuronTrainingArguments":
    args_as_dict = args.to_dict()
    # Manually copy token values as TrainingArguments.to_dict() redacts them
This comes from the original trl, but I have no idea what this means ...
Basically the SFTConfig replaces the TrainingArguments. You can still provide training args, and the SFTTrainer converts them to an SFTConfig.
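For context, this is roughly the pattern the snippet above follows (a paraphrased sketch, not the exact trl or PR code): plain training arguments are dumped to a dict and rebuilt as an SFTConfig, and because TrainingArguments.to_dict() masks fields ending in "_token" (e.g. hub_token), the real values are copied back from the live object first. The Neuron version does the same thing with NeuronTrainingArguments and NeuronSFTConfig.

# Paraphrased sketch of the conversion discussed above; not the exact trl or PR code.
from transformers import TrainingArguments
from trl import SFTConfig

def to_sft_config(args: TrainingArguments) -> SFTConfig:
    args_as_dict = args.to_dict()
    # TrainingArguments.to_dict() redacts sensitive fields ending in "_token"
    # (e.g. hub_token), so copy the real values back from the live object.
    args_as_dict.update(
        {k: getattr(args, k) for k in args_as_dict if k.endswith("_token")}
    )
    return SFTConfig(**args_as_dict)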
optimum/neuron/trainers.py
Outdated
@@ -1465,3 +1503,345 @@ class Seq2SeqNeuronTrainer(AugmentTrainerForNeuronMixin, Seq2SeqTrainer):
    """
    Seq2SeqTrainer that is suited for performing training on AWS Trainium instances.
    """


class NeuronSFTTrainer(AugmentTrainerForNeuronMixin, SFTTrainer):
Maybe add a comment here indicating how this differs from the original (i.e. what the Neuron specifics are).
Done!
I have added tests. They do not check anything specific but run a small training job with both packed and unpacked datasets. If the training job succeeds, the test passes; otherwise it fails.
tests/test_trainers.py
Outdated
output_dir = Path(tmpdir)

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
# dataset = dataset.select(range(1000))
nit: remove commented line
    args=sft_config,
)

trainer.train()
Can't we verify that the loss goes down or something?
It's a tiny random model. The SFTTrainer does not do anything related to the loss anyway; it's just Trainer with dataset preparation abilities.
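Since the packed/unpacked distinction is exactly what the two tests exercise, here is a simplified illustration of what packing means in this context (not trl's actual implementation, and the helper name is made up): formatted samples are tokenized, concatenated, and split into fixed-length blocks so that every training sequence is full.

# Simplified illustration of dataset packing; not trl's actual implementation.
from transformers import AutoTokenizer

def pack_examples(texts, tokenizer, seq_length=128):
    # Tokenize every sample and concatenate the streams, separated by EOS tokens.
    token_ids = []
    for text in texts:
        token_ids.extend(tokenizer(text, add_special_tokens=False)["input_ids"])
        token_ids.append(tokenizer.eos_token_id)
    # Split the stream into full, fixed-length blocks and drop the remainder.
    n_blocks = len(token_ids) // seq_length
    return [token_ids[i * seq_length : (i + 1) * seq_length] for i in range(n_blocks)]

With packing disabled, each sample is instead tokenized on its own and padded or truncated to the maximum sequence length.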
The new tests are failing:
Thank you for this pull request!
What does this PR do?
This PR adds two classes:
- NeuronSFTConfig
- NeuronSFTTrainer

Both of these classes achieve the same goal as their trl counterparts.
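A minimal usage sketch of the two classes together, mirroring the trl SFTTrainer API (the model id, the dataset flattening, and the hyperparameters below are illustrative assumptions, not taken from the PR):

# Illustrative sketch; model id, dataset handling, and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Flatten each dolly record into a single "text" field for supervised fine-tuning.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
dataset = dataset.map(
    lambda s: {"text": f"{s['instruction']}\n{s['context']}\n{s['response']}"},
    remove_columns=dataset.column_names,
)

sft_config = NeuronSFTConfig(
    output_dir="sft_output",
    dataset_text_field="text",
    max_seq_length=512,
    packing=True,  # set to False to train on unpacked samples
    max_steps=10,
    per_device_train_batch_size=1,
)

trainer = NeuronSFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=sft_config,
)
trainer.train()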