Improve performance of quantizelinear for int4 #1706
mlir/lib/Conversion/RocmlirCustomTosaToLinalg/RocmlirCustomTosaToLinalg.cpp
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #1706     +/-   ##
===========================================
- Coverage    78.88%   78.45%   -0.44%
===========================================
  Files          100      100
  Lines        28346    28405     +59
  Branches      4130     4146     +16
===========================================
- Hits         22361    22285     -76
- Misses        4368     4458     +90
- Partials      1617     1662     +45
@@ -2549,7 +2549,16 @@ struct GridwiseGemmAccelRewritePattern

// Obtain data types of inputs.
auto elementTypeA = op.getA().getType().getElementType();
auto maybeElementTypeALoad = getGemmInputElementType(op.getA());
Would be nice to have a unit test that uses getGemmInputElementType to test out that logic along with gridwisegemmtoblockwise.
Sure, do you know where unit tests are written? I've only written lit/MLIR tests (not sure what they are called).
I meant a lit test. You can also write an e2e test that verifies results against the host runner.
1748a96 to f87b60a
This PR improves the performance of quantizelinear for int4. These are the changes:
Pack scale and bias together in the same tensor (quantizelinear).
There's a PR in migraphx to also change the layout of the scale+bias tensor: ROCm/AMDMIGraphX#3718
This is the migraphx program of the layout change (int32 packing scale and bias together):
@pfultz2 pointed out we can use slice operations instead of changing quantizelinear to use one param. This simplifies this PR a lot.
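To make the packing and slicing idea concrete, here is a minimal host-side sketch in C++. It is not code from this PR or from migraphx. The layout is an assumption made for illustration only: each 32-bit word carries a 16-bit scale bit pattern in its low half and a 16-bit bias bit pattern in its high half. The names `packScaleBias` and `sliceColumn` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (an assumption, not taken from the PR): the low 16 bits
// of each word hold the scale bit pattern, the high 16 bits hold the bias.
inline std::uint32_t packScaleBias(std::uint16_t scaleBits,
                                   std::uint16_t biasBits) {
  return static_cast<std::uint32_t>(scaleBits) |
         (static_cast<std::uint32_t>(biasBits) << 16);
}

// Treat the packed tensor as an [N x 2] tensor of 16-bit values and extract
// one column, which is what a slice on the last axis would do:
// column 0 yields the scales, column 1 yields the biases.
inline std::vector<std::uint16_t>
sliceColumn(const std::vector<std::uint32_t> &packed, std::size_t column) {
  assert(column < 2 && "only two 16-bit columns per packed word");
  std::vector<std::uint16_t> out;
  out.reserve(packed.size());
  for (std::uint32_t word : packed)
    out.push_back(static_cast<std::uint16_t>(word >> (16 * column)));
  return out;
}
```

With this layout, one load of a packed word brings in both parameters, and `sliceColumn(packed, 0)` and `sliceColumn(packed, 1)` recover the scale and bias tensors separately, so the consuming op can keep its two-parameter form.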
TODO:
closes: https://github.com/ROCm/rocMLIR-internal/issues/1665