
SFTTrainer support #682

Merged: 14 commits merged into main from sft_trainer on Sep 5, 2024
Conversation

@michaelbenayoun (Member) commented on Aug 23, 2024:

What does this PR do?

This PR adds two classes:

  • NeuronSFTConfig
  • NeuronSFTTrainer

Both classes serve the same purpose as their trl counterparts.
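
For orientation, a minimal usage sketch of the two classes, assuming their API mirrors trl's SFTConfig/SFTTrainer; the checkpoint name, formatting function, and hyperparameters below are illustrative, not taken from this PR:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.neuron import NeuronSFTConfig, NeuronSFTTrainer

model = AutoModelForCausalLM.from_pretrained("my-small-causal-lm")  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained("my-small-causal-lm")

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def format_dolly(example):
    # Illustrative: join the Dolly fields into a single training text.
    return f"### Instruction\n{example['instruction']}\n### Answer\n{example['response']}"

sft_config = NeuronSFTConfig(
    output_dir="sft_output",
    max_steps=10,
    packing=True,  # concatenate samples into fixed-length blocks
)

trainer = NeuronSFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    formatting_func=format_dolly,
    args=sft_config,
)
trainer.train()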

@michaelbenayoun marked this pull request as ready for review on August 29, 2024 at 13:41
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@JingyaHuang (Collaborator) left a comment:

Thanks for the PR, it looks good! Don't we need to add trl to setup.py, add NeuronSFTTrainer to the API doc, and, if possible, add a minimal test?

optimum/neuron/trainers.py (review comments outdated, resolved)
@michaelbenayoun (Member, PR author) replied:

I did not add trl to the setup, just as with peft, because it is not strictly required; it is only needed for a subset of the features.
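
For context, optional dependencies like this are typically exposed as extras; a hypothetical sketch of what adding trl that way could look like (the extra names here are assumptions, not the repository's actual setup.py):

from setuptools import setup

setup(
    name="optimum-neuron",  # illustrative
    extras_require={
        # Installed only via `pip install optimum-neuron[sft]`.
        "sft": ["trl"],
        "peft": ["peft"],
    },
)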

@dacorvo (Collaborator) left a comment:

As @JingyaHuang said, it would be nice to have some unit tests before integrating this in the demo, to speed up integration by identifying issues early on.

    args = NeuronSFTConfig(output_dir=output_dir)
elif args is not None and args.__class__.__name__ == "NeuronTrainingArguments":
    args_as_dict = args.to_dict()
    # Manually copy token values as TrainingArguments.to_dict() redacts them
A collaborator commented:

This comes from the original trl, but I have no idea what this means...

@michaelbenayoun (PR author) replied:

Basically, the SFTConfig replaces the TrainingArguments. You can still provide training arguments, and the SFTTrainer converts them to an SFTConfig.
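
For illustration, a sketch of that conversion, adapted from the upstream trl logic the diff above follows; the token-copying step is the part the comment refers to:

if args is None:
    args = NeuronSFTConfig(output_dir=output_dir)
elif args.__class__.__name__ == "NeuronTrainingArguments":
    args_as_dict = args.to_dict()
    # to_dict() redacts secret values such as hub tokens, so copy them back
    # from the original arguments before building the SFT config.
    args_as_dict.update(
        {k: getattr(args, k) for k in args_as_dict.keys() if k.endswith("_token")}
    )
    args = NeuronSFTConfig(**args_as_dict)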

@@ -1465,3 +1503,345 @@ class Seq2SeqNeuronTrainer(AugmentTrainerForNeuronMixin, Seq2SeqTrainer):
"""
Seq2SeqTrainer that is suited for performing training on AWS Trainium instances.
"""


class NeuronSFTTrainer(AugmentTrainerForNeuronMixin, SFTTrainer):
A collaborator commented:

Maybe add a comment here indicating how this differs from the original (i.e., what the Neuron specifics are).

@michaelbenayoun (PR author) replied:

Done!

@michaelbenayoun (PR author) commented:

I have added tests. They do not check anything specific: they run a small training job with both packed and unpacked datasets. If the training job succeeds, the test passes; otherwise it fails.

output_dir = Path(tmpdir)

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
# dataset = dataset.select(range(1000))
A collaborator commented:

nit: remove commented line

    args=sft_config,
)

trainer.train()
A collaborator commented:

Can't we verify that the loss goes down or something?

@michaelbenayoun (PR author) replied:

It's a tiny random model. The SFTTrainer does not do anything related to the loss anyway; it's just a Trainer with dataset-preparation abilities.
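
To make "dataset preparation" concrete: with packing enabled, tokenized samples are concatenated and cut into fixed-length blocks instead of being padded one by one. A rough conceptual sketch, not the actual trl implementation (which uses a ConstantLengthDataset):

def pack_examples(token_streams, block_size):
    # Concatenate tokenized samples and yield fixed-size blocks,
    # so no compute is wasted on padding tokens.
    buffer = []
    for tokens in token_streams:
        buffer.extend(tokens)
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]

# Three short "samples" packed into blocks of 4 tokens:
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(list(pack_examples(samples, block_size=4)))  # [[1, 2, 3, 4], [5, 6, 7, 8]]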

optimum/neuron/trainers.py (review comment outdated, resolved)
@dacorvo (Collaborator) commented on Sep 4, 2024:

The new tests are failing:

FAILED tests/test_trainers.py::TestNeuronSFTTrainer::test_without_packing[dp=2] - TypeError: NeuronSFTConfig.__init__() got an unexpected keyword argument 'max_seq_length'
FAILED tests/test_trainers.py::TestNeuronSFTTrainer::test_with_packing[dp=2] - TypeError: NeuronSFTConfig.__init__() got an unexpected keyword argument 'max_seq_length'
FAILED tests/test_trainers.py::TestNeuronSFTTrainer::test_without_packing[tp=2] - TypeError: NeuronSFTConfig.__init__() got an unexpected keyword argument 'max_seq_length'
FAILED tests/test_trainers.py::TestNeuronSFTTrainer::test_with_packing[tp=2] - TypeError: NeuronSFTConfig.__init__() got an unexpected keyword argument 'max_seq_length'
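
The errors suggest that, at this point, NeuronSFTConfig did not yet accept trl's SFT-specific fields such as max_seq_length. A hypothetical sketch of one way to expose them, assuming trl's SFTConfig and this repository's NeuronTrainingArguments as bases (not necessarily the fix that landed):

from dataclasses import dataclass

from trl import SFTConfig

from optimum.neuron import NeuronTrainingArguments  # assumed import path


@dataclass
class NeuronSFTConfig(NeuronTrainingArguments, SFTConfig):
    # Inherits the Neuron-specific training arguments plus the SFT fields
    # (max_seq_length, packing, dataset_text_field, ...).
    pass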

@dacorvo (Collaborator) left a comment:

Thank you for this pull request!

@michaelbenayoun merged commit 281d9bb into main on Sep 5, 2024
7 of 11 checks passed
@michaelbenayoun deleted the sft_trainer branch on September 5, 2024 at 08:44