[NVIDIA] Use the fast accumulation for FP8 matmul #35

kaixih · 2023-11-02T22:38:42Z

As highlighted in this issue, this PR enables fast accumulation for the fprop FP8 matmul. The change aligns with the one made by Flax in this PR and reuses the functions that change provides, so please merge this pull request only after that one has been merged.
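For readers unfamiliar with the idea, here is a minimal sketch of how fast accumulation might be limited to fprop in JAX. It is not the code from this PR or from the Flax change. It assumes that `lax.Precision.DEFAULT` is the hint that allows the backend to pick fast accumulation and that `lax.Precision.HIGHEST` requests full-precision accumulation for the gradient matmuls. The name `dot_general_fast_fprop` is hypothetical, and float32 operands stand in for FP8 ones so the sketch runs on any backend.

```python
from functools import partial

import jax
import jax.numpy as jnp
from jax import lax


# Hypothetical helper: dimension_numbers is static, so it is marked non-differentiable.
@partial(jax.custom_jvp, nondiff_argnums=(2,))
def dot_general_fast_fprop(lhs, rhs, dimension_numbers):
    # Forward pass: DEFAULT precision (assumed here to permit fast accumulation).
    return lax.dot_general(
        lhs, rhs, dimension_numbers, precision=lax.Precision.DEFAULT
    )


@dot_general_fast_fprop.defjvp
def _dot_general_fast_fprop_jvp(dimension_numbers, primals, tangents):
    lhs, rhs = primals
    lhs_dot, rhs_dot = tangents
    # Primal output uses the same DEFAULT-precision matmul as the forward pass.
    out = lax.dot_general(
        lhs, rhs, dimension_numbers, precision=lax.Precision.DEFAULT
    )
    # Tangent (and therefore gradient) matmuls request HIGHEST precision.
    tangent_out = lax.dot_general(
        lhs_dot, rhs, dimension_numbers, precision=lax.Precision.HIGHEST
    ) + lax.dot_general(
        lhs, rhs_dot, dimension_numbers, precision=lax.Precision.HIGHEST
    )
    return out, tangent_out


# Usage: contract axis 1 of x with axis 0 of w, with no batch dimensions.
x = jnp.ones((4, 8), dtype=jnp.float32)
w = jnp.ones((8, 2), dtype=jnp.float32)
dims = (((1,), (0,)), ((), ()))
y = dot_general_fast_fprop(x, w, dims)
grad_x = jax.grad(lambda a: dot_general_fast_fprop(a, w, dims).sum())(x)
```

Because the tangent rule is linear in `lhs_dot` and `rhs_dot`, JAX can transpose it for reverse-mode differentiation, so `jax.grad` picks up the higher-precision matmuls without a separate backward rule.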

cc. @wenscarl @nluehr

kaixih · 2023-11-13T18:04:05Z

@zhangqiaorjc It seems the PR has been stuck in the pull ready status for a while. Could you take a look?

kaixih · 2023-11-15T17:58:20Z

Gentle ping @zhangqiaorjc

Use the fast accumulation for FP8 matmul

2cefb21

zhangqiaorjc self-assigned this Nov 8, 2023

zhangqiaorjc added the pull ready label Nov 9, 2023

copybara-service bot merged commit e7c8561 into google:main Nov 15, 2023
3 checks passed
