Mixed-precision training with both torch_xla or torch.autocast #523
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
We restore that before merging.
autocast_context.__enter__()
yield
autocast_context.__exit__(*sys.exc_info())
Wouldn't it be simpler to just do:
- autocast_context.__enter__()
- yield
- autocast_context.__exit__(*sys.exc_info())
+ with torch.autocast(dtype=torch.bfloat16, device_type="cuda", **autocast_kwargs):
+     yield
Or is the linter complaining?
Also, why cuda?
- I took that part from the accelerate library. I guess it could work.
- It is as suggested by the AWS Neuron documentation.
If it comes from the doc, then there must be a reason.
Yes:
"The device type is CUDA because we are using CUDA’s list of BF16 compatible operations as mentioned above."
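For reference, a minimal sketch of the pattern being discussed, assuming a generator-based context manager around torch.autocast (the function and keyword names are illustrative, not the PR's exact code):

```python
import contextlib

import torch


@contextlib.contextmanager
def autocast_bf16(**autocast_kwargs):
    # device_type="cuda" because, as quoted above, autocast reuses CUDA's
    # list of BF16-compatible operations on the XLA device.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16, **autocast_kwargs):
        yield
```

Compared with calling __enter__ and __exit__ by hand, the with statement guarantees that the autocast context is exited even if an exception propagates through the yield.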
# It is important to set the environment variables before initializing the process group,
# otherwise they will be ignored by the Neuron compiler.
set_common_neuron_cc_flags()
if os.environ.get("ACCELERATE_USE_AMP", "false") == "true":
    set_neuron_cc_flags_for_torch_amp()
Is there a place in the code where you restore the env? If not, maybe consider having a singleton class to do that: upon instantiation it stores the original cc flags, and on deletion it restores them. Then you wrap all your changes under context calls, and on startup you get a ref to the singleton. When all contexts have returned, all refs to the singleton are released and the env is restored.
Maybe it is too involved, but I just realized that whenever you call the training code, the cc flags will be completely unusable for inference.
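A minimal sketch of the save/restore idea described here (not code from this PR; the function name is hypothetical):

```python
import contextlib
import os


@contextlib.contextmanager
def preserve_neuron_cc_flags():
    # Snapshot the current compiler flags (None if the variable is unset).
    original = os.environ.get("NEURON_CC_FLAGS")
    try:
        yield
    finally:
        # Put the original value back, or drop the variable if it was unset.
        if original is None:
            os.environ.pop("NEURON_CC_FLAGS", None)
        else:
            os.environ["NEURON_CC_FLAGS"] = original
```

As the reply below explains, restoring the variable would not change much in practice, since the Neuron compiler only picks the flags up when the process group is initialized.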
So, actually the NEURON_CC_FLAGS need to be set before initializing the process group. Once that is done, they will never change for the Neuron compiler (during this runtime).
Currently the NeuronState is only used for training, so I don't think it will be an issue. And the original environment will not be affected, only the environment for the current process.
I looked at the AWS documentation and they don't seem to care much about restoring the env variables either.
Forget about my comment.
In any case, as explained, it does not really change anything: once the process group has been initialized, we cannot change the environment for the Neuron compiler. That is also the reason why I changed some of the ways we set the flags. Any flag that is model-dependent cannot be set by optimum-neuron, because by the time we have the model, we usually have already initialized the process group.
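To illustrate the ordering constraint, here is a hedged sketch (not code from this PR): the helper body and the example flag are assumptions, and the XLA backend registration follows the usual torch_xla pattern.

```python
import os

import torch
import torch_xla.distributed.xla_backend  # registers the "xla" backend with torch.distributed


def set_common_neuron_cc_flags():
    # Illustrative stand-in for the helper shown in the diff above; the real
    # implementation may set different flags. The point is that it mutates
    # NEURON_CC_FLAGS before the process group exists.
    flags = os.environ.get("NEURON_CC_FLAGS", "")
    os.environ["NEURON_CC_FLAGS"] = f"{flags} --retry_failed_compilation".strip()


# 1. Compiler-related environment variables are set first...
set_common_neuron_cc_flags()

# 2. ...because the Neuron compiler reads them when the process group is
#    initialized; changing them afterwards has no effect for this runtime.
torch.distributed.init_process_group(backend="xla")
```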
What does this PR do?
There are two ways to cast to bfloat16:
1. The torch_xla casting system, via the environment variables XLA_DOWNCAST_BF16 or XLA_USE_BF16.
2. The torch.autocast feature.

The first approach was already supported; this PR adds support for the second approach.
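As an illustration of the two approaches (a sketch, not the PR's API; in practice you would pick one of the two, and model/batch are placeholders):

```python
import os

import torch

# Approach 1: torch_xla's automatic casting. The environment variable has to be
# set before torch_xla initializes the device, typically at process startup or
# through the launcher environment.
os.environ["XLA_DOWNCAST_BF16"] = "1"


# Approach 2: PyTorch's autocast context around the forward/loss computation.
def forward_bf16(model, batch):
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        return model(**batch)
```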
It also fixes issues related to how the NEURON_CC_FLAGS can be set. If they are set too late (e.g. after the process group initialization), they will be ignored by the compiler. This PR makes sure we set them at the right time.