CK BF16 Gemm #2617
This pull request was exported from Phabricator. Differential Revision: D57292145
Summary: Implementation of BF16 GEMM using the latest features from CK. Performance is comparable to hipBLAS, though often slightly worse. Detailed benchmarking can be found [here](https://docs.google.com/spreadsheets/d/10b9mRM6xCi1Iv-mRGkPjk37DfYQEDx3zU-EmV9t7f0s/edit?usp=sharing). For Llama shapes, this kernel likely offers little benefit. However, it may be useful for less common shapes that hipBLAS struggles with. The benchmarks even show a few cases where the CK kernel is slightly faster.

Reviewed By: jianyuh

Differential Revision: D57292145
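For anyone comparing the outputs of two BF16 GEMM kernels, a small CPU reference can help separate numerical differences from real bugs. The following is a minimal pure-Python sketch, not the CK kernel from this PR. The names `to_bf16` and `bf16_gemm_reference`, the row-major `A[M][K] x B[K][N]` layout, and the choice to accumulate in a wider type and round once at the end are all assumptions made for illustration.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    Assumption: x is finite and within float32 range; NaN/inf are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding.
    # Adding 0x7FFF plus the lowest kept bit implements round-to-nearest-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_gemm_reference(a, b):
    """Hypothetical reference GEMM: C = A @ B with BF16 inputs and output.

    a is M x K and b is K x N, both given as lists of rows.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of a and b do not match")
    n = len(b[0])

    # Quantize the inputs to BF16 first, as a BF16 kernel would receive them.
    a16 = [[to_bf16(v) for v in row] for row in a]
    b16 = [[to_bf16(v) for v in row] for row in b]

    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0  # Python float (double) stands in for a wide accumulator
            for p in range(k):
                acc += a16[i][p] * b16[p][j]
            row.append(to_bf16(acc))  # single rounding back to BF16
        out.append(row)
    return out
```

Because BF16 carries only 8 significant bits, results from two correct kernels can differ in the last bit depending on accumulation order, so comparisons against a reference like this are usually done with a tolerance rather than exact equality.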
This pull request has been merged in 7930859.