Add a prototype of MX format training and inference #264
Conversation
Dr. CI: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/264. ✅ No failures as of commit 4425d0d with merge base 5b04ff0.
need to add license and fix CI
first round of feedback on docs and CI-related stuff - will do another pass
```python
from torchao.prototype.mx_formats.mx_tensor import MXTensor
from torchao.prototype.mx_formats.constants import DTYPE_FP6_E2M3, DTYPE_FP6_E3M2, DTYPE_FP4
x = torch.randn(...)
```
put a functioning snippet that people can copy paste
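For illustration, a sketch of what a copy-pasteable version might look like; the `MXTensor.to_mx(data, elem_dtype, block_size)` and `to_dtype` signatures here are assumed from the diff context rather than confirmed:

```python
import torch
from torchao.prototype.mx_formats.mx_tensor import MXTensor
from torchao.prototype.mx_formats.constants import DTYPE_FP6_E2M3

# a small random tensor to quantize
x = torch.randn(32, 32, device="cuda", dtype=torch.bfloat16)

# cast to an MX format (assumed args: high precision data, element dtype, block size)
x_mx = MXTensor.to_mx(x, DTYPE_FP6_E2M3, block_size=32)

# cast back to a high precision dtype
x_hp = x_mx.to_dtype(torch.bfloat16)
```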
```python
from torchao.prototype.mx_formats.mx_linear import swap_linear_with_mx_linear

m = Model(...)
```
same comment on a functional snippet
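Likewise, a sketch of a runnable version; the `swap_linear_with_mx_linear(model, elem_dtype, block_size)` signature is assumed from the diff:

```python
import torch
import torch.nn as nn
from torchao.prototype.mx_formats.mx_linear import swap_linear_with_mx_linear
from torchao.prototype.mx_formats.constants import DTYPE_FP6_E2M3

m = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32)).cuda()

# swap every nn.Linear for an MX linear module (assumed signature)
swap_linear_with_mx_linear(m, DTYPE_FP6_E2M3, block_size=32)

# training proceeds as usual
y = m(torch.randn(8, 32, device="cuda"))
y.sum().backward()
```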
ok, CI is green, going to address the other comments now
Stamping for now, will need a day or more to read the mx spec and don't wanna block your PR until then
I can wait for review, would rather only land once people are ok with the code.
Thanks! Really enjoyed reviewing this. Some minor nits, but we should be good to merge.
README.md (outdated):

```diff
@@ -99,6 +99,7 @@ To learn more try out our APIs, you can check out API examples in
 3. Support for lower precision [dtypes](./torchao/dtypes) such as
 - [nf4](https://github.com/pytorch/ao/blob/main/torchao/dtypes/nf4tensor.py) which was used to [implement QLoRA](https://github.com/pytorch/torchtune/blob/main/docs/source/tutorials/qlora_finetune.rst) without writing custom Triton or CUDA code
 - [uint4](https://github.com/pytorch/ao/blob/main/torchao/dtypes/uint4.py)
+- [MX](https://github.com/pytorch/ao/blob/main/torchao/prototype/mx_formats) implementing the [OCP MX spec](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf), prototype as the hardware support is not available yet
```
Worth expanding to mention that MX includes fp8/6/4 and int8 - MX is still new terminology.
```python
import torch

# This is conceptually an enum of non-core dtypes
# if someone has time to verify torch.compile compatibility, it could be made
# into an enum
```
Is the comment intending to say that torch.compile is breaking on enums, or that in the future torch.compile support can be checked AND independently this could be made into an enum?
Indeed, I feel like an enum would make this significantly easier to read because you could conceptually print every row in the spec.
The weird thing is that some of these dtypes are in core (the float8 ones) and some aren't (float6/float4/mx's spec of int8, etc). I think it would be nice to have a clean structure unifying all of that, I just haven't had the time. Definitely open for someone (or future me) to improve this.
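To make the idea concrete, here is a hypothetical sketch (not part of this PR) of a single enum covering both the core and non-core element dtypes:

```python
import enum
import torch

# hypothetical: one enum over MX element dtypes, where some values are core
# PyTorch dtypes (the float8 ones) and others are prototype-only strings
class MXElemDtype(enum.Enum):
    FP8_E4M3 = torch.float8_e4m3fn  # in core
    FP8_E5M2 = torch.float8_e5m2    # in core
    FP6_E2M3 = "fp6_e2m3"           # not in core
    FP6_E3M2 = "fp6_e3m2"           # not in core
    FP4_E2M1 = "fp4_e2m1"           # not in core
    INT8 = torch.int8               # in core

# conceptually prints every element dtype row in the MX spec
for d in MXElemDtype:
    print(d.name, d.value)
```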
```python
def compute_error(x, y):
    # signal-to-quantization-noise ratio (SQNR) between x and y, in dB
    Ps = torch.norm(x)  # noqa: TOR101
    Pn = torch.norm(x - y)  # noqa: TOR101
    return 20 * torch.log10(Ps / Pn)
```
There's already a util for this exact function in the code, somewhere in GPTQ IIRC, so can we put this in torchao/utils.py instead?
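For reference, a quick usage sketch of this helper (redefined here so the snippet stands alone; the noise level is made up for illustration):

```python
import torch

def compute_error(x, y):
    # SQNR in dB; higher means y is a closer reconstruction of x
    Ps = torch.norm(x)
    Pn = torch.norm(x - y)
    return 20 * torch.log10(Ps / Pn)

x = torch.randn(1024)
y = x + 0.01 * torch.randn(1024)  # stand-in for a quantize-dequantize round trip
print(compute_error(x, y))  # roughly 40 dB at this noise level
```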
### MXTensor

These are casts between fp32/bf16 and MX formats, implemented in native PyTorch.
Btw the spec didn't seem too prescriptive around what the source dtype should be
fp32 and bf16 are what we have today; we can make it clearer that other dtypes can be added in the future.
```python
def get_fp_scale(scale_e8m0):
    s_offset = scale_e8m0.to(torch.int16) - E8M0_EXPONENT_BIAS
    # TODO(later): it would be nice if there was a way to do the 2^x operation
```
maybe this is helpful https://pytorch.org/docs/stable/generated/torch.ldexp.html
Makes sense! I will punt this to a future person; this shouldn't be that important for e2e performance.
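For the record, a sketch of how `torch.ldexp` could replace the manual 2^x computation, assuming `E8M0_EXPONENT_BIAS = 127` as in the e8m0 scale format:

```python
import torch

E8M0_EXPONENT_BIAS = 127  # assumed bias of the e8m0 scale exponent

def get_fp_scale_ldexp(scale_e8m0: torch.Tensor) -> torch.Tensor:
    # ldexp(1.0, e) computes 1.0 * 2**e directly, avoiding an explicit pow
    s_offset = scale_e8m0.to(torch.int16) - E8M0_EXPONENT_BIAS
    return torch.ldexp(torch.ones_like(s_offset, dtype=torch.float32), s_offset)
```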
```python
    return g, None, None


@torch._dynamo.allow_in_graph
```
There's a public API, torch.compiler.allow_in_graph. Also curious why this was needed.
this is necessary for compile to fully support training, and this line is copy-pasta from float8_experimental; ideally, while these two products are in different codebases, I'm hoping for these kinds of issues to get fixed in float8_experimental first and be copied here. Once we unify, it will be easier.
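As a side note, a minimal sketch of the public spelling; the decorated helper here is hypothetical:

```python
import torch

# ask dynamo to treat this callable as an opaque graph node instead of tracing
# into it; torch.compiler.allow_in_graph is the public alias for
# torch._dynamo.allow_in_graph
@torch.compiler.allow_in_graph
def opaque_round(x: torch.Tensor) -> torch.Tensor:
    return torch.round(x * 4.0) / 4.0

@torch.compile
def f(x):
    return opaque_round(x) + 1.0

print(f(torch.randn(4)))
```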
```python
    return _f4_or_f6_unpacked_to_f32(x, DTYPE_FP6_E3M2)


if has_triton():
```
Was the inductor codegen note adequate? Wondering if we can eventually remove this.
Current codegen was slow, tracked in pytorch/pytorch#124002.
print("\n") | ||
|
||
|
||
if __name__ == "__main__": |
wdyt about renaming this file to have "spec" in the name? I quite like it, and we can recommend that people cross-reference the text spec with your code in the main README.
Summary:

The MX numerical formats are new low precision formats with recent acceptance into the OCP spec:
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf

This PR adds a reference native PyTorch implementation of training and inference primitives for using MX accelerated matrix multiplications. Currently, we use a reference layout (scale and raw data stored separately) and an emulated matrix multiplication.

Test Plan:

```
// tests
pytest -s test/prototype/mx_formats/*

// benchmarks
python torchao/prototype/mx_formats/benchmarks/bench_qdq.py
```

Reviewers:

Subscribers:

Tasks:

Tags:
@msaroufim needs a review again since I think this repo is set up to re-require reviews after changes; all of the feedback has been either addressed or I've explained why it's not addressed right now.
And thank you for the review!