[NVIDIA] Extend the custom fp8 accumulate dtype in non-jit scenarios #3827
This PR extends support for the custom fp8 dtype to non-jit scenarios. The custom fp8 dtype is used for grad accumulation, where a max op replaces the default add op when accumulating grads. It is designed for fp8 parameters whose grads are their new values, so we only want to pick the max among the grads of the sub-tensors.
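A minimal sketch of the accumulation semantics (illustration only, not this PR's implementation): for such an fp8 parameter, each "grad" is the parameter's proposed new value (e.g. an amax statistic), so combining the grads contributed by sub-tensors with the default add op is wrong, and the desired combining op is max.

```python
import jax.numpy as jnp

# Per-sub-tensor "grads" of an fp8 parameter: each one is a proposed
# new value (e.g. an amax statistic), not a true derivative.
grads = [jnp.float32(3.0), jnp.float32(2.0)]

added = sum(grads)              # default add accumulation -> 5.0 (wrong here)
maxed = jnp.stack(grads).max()  # max accumulation          -> 3.0 (desired)

print(added, maxed)  # 5.0 3.0
```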
Previously, this custom dtype had to be used inside the jit scope; this PR relaxes that restriction.
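To make the jit vs. non-jit distinction concrete, here is a hedged sketch of the two call styles; `f` and `p` are stand-ins for a computation over an fp8 parameter carrying the custom dtype (plain float32 is used here so the snippet runs on its own):

```python
import jax
import jax.numpy as jnp

def f(p):
    return jnp.sum(p * p)

p = jnp.ones((4,))

grads_jit   = jax.jit(jax.grad(f))(p)  # jit scope: previously the only supported path
grads_eager = jax.grad(f)(p)           # eager / non-jit: enabled by this PR
```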
Note: this PR depends on jax-ml/jax#20266.
cc. @nluehr @mingxu1067