Add general support for generation on TRN with NxD #370
Conversation
Thank you very much for this pull request.
I fully agree that the existing NeuronGenerationMixin
needs to be replaced by a more robust and easier-to-maintain implementation, and this definitely goes in the right direction.
An additional benefit is that this new class will be compatible with the logits processors, some of which are really important (like repetition penalty).
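To illustrate the point about logits-processor compatibility, here is a minimal, self-contained sketch (not code from this PR; the toy vocabulary and scores are made up) of how the repetition penalty processor from transformers adjusts scores for tokens already present in the prompt:

```python
import torch
from transformers import RepetitionPenaltyLogitsProcessor

# Toy example: vocab of 4 tokens; the prompt already contains token ids 0 and 2.
input_ids = torch.tensor([[0, 2]])
scores = torch.tensor([[2.0, 3.0, 2.0, 3.0]])

# Penalty > 1 makes previously seen tokens less likely: positive scores are
# divided by the penalty (negative ones would be multiplied by it).
processor = RepetitionPenaltyLogitsProcessor(penalty=2.0)
penalized = processor(input_ids, scores)
print(penalized)  # tokens 0 and 2 drop from 2.0 to 1.0; tokens 1 and 3 unchanged
```

A generation mixin that delegates to the stock HuggingFace loop gets this behavior for free via the `logits_processor` argument of `generate()`.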
I have made a few comments on the implementation itself, mostly related to how things are organized (I would rather put everything at the same place, if possible inside the mixin).
My main request however is related to the tests: I am not comfortable with integrating new code without proper unit tests.
These tests will not only allow us to detect bugs or unsupported configurations early, but will also help us detect regressions when bumping the transformers version.
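A minimal sketch of the shape of regression test being requested here (all names are hypothetical, not from this PR): generate on the backend under test and assert exact token-id equality with a CPU reference. A deterministic toy "model" stands in for a real LM so the sketch is self-contained; the real tests would load a transformers checkpoint.

```python
import torch

def toy_next_token(input_ids: torch.Tensor) -> torch.Tensor:
    # Deterministic stand-in for a model: next token is (sum of ids) mod 10.
    return (input_ids.sum(dim=-1) % 10).unsqueeze(-1)

def greedy_generate(input_ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    # Append one greedily chosen token at a time, as greedy search does.
    for _ in range(max_new_tokens):
        input_ids = torch.cat([input_ids, toy_next_token(input_ids)], dim=-1)
    return input_ids

def test_generate_matches_cpu_reference():
    prompt = torch.tensor([[1, 2, 3]])
    # In the real test, one of these runs would happen on the Neuron device.
    device_out = greedy_generate(prompt.clone(), max_new_tokens=5)
    cpu_out = greedy_generate(prompt.clone(), max_new_tokens=5)
    assert torch.equal(device_out, cpu_out)
```

Parametrizing such a test over models and search/sampling strategies is what catches regressions when the transformers version is bumped.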
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you very much for this pull request, and for taking the time to fully address my comments.
Before merging, you need to:
- apply styling (`make style`),
- rebase on main (another pull request has been merged to use the latest AWS Neuron SDK, sorry about that...),
- ping us to trigger the CI tests again.
The test_runner failures in the CI are related to the fact that you don't have access to the HF internal token, and you should not worry about them.
However, two of the tests that you added are failing.
```
tests/test_cache_utils.py ......ssssssssss [ 21%]
tests/test_generate.py sssss [ 28%]
tests/test_runner.py EEEEEE [ 36%]
tests/test_trainer_callback.py ssss [ 41%]
tests/test_trainers.py sssssss [ 50%]
tests/test_utils.py . [ 52%]
tests/cli/test_neuron_cache_cli.py ssssss [ 60%]
tests/generation/test_generate.py ............FF................ [100%]
```
The tests are now passing. Could you please check whether there are any other merge blockers? If not, when can we target merging this PR?
Thank you so much for your patience and dedication!
This pull request adds general support for generative LM generation on TRN. Currently, generation can only be done separately from the training job with the `transformer_neuronx` inference feature. The limitation is that it needs to be run as a separate program from the training job. With this support added, users will be able to run model evaluation as part of the training job and choose to run it every n steps or epochs. Although we have implemented the `NeuronGenerationMixin` class, which delivers a similar feature, it has some drawbacks and functionality issues which I'll specify soon.

The generation will be done on both the Neuron device and CPU in the following way:

This enables us to support the various search/sampling methods provided by HuggingFace with minimal code changes. It has a couple of advantages compared to the existing `NeuronGenerationMixin` implementation: the new class overrides the `generate` function and calls `super().generate()` to reuse most of the HuggingFace implementation. Therefore, it will be more stable and less likely to break as HuggingFace updates the transformers package.

I've tested the functionality of text generation with GPT2 and Llama-2-7b models using this script. The output results on TRN exactly match the CPU outputs.
Reference: #108