[microNPU] Tweak a layout transform matrix #10763
Conversation
Force-pushed from 23685d9 to 33525e6
Looks good @ekalda! Just some small suggestions
@@ -187,7 +187,7 @@ def conv2d_compute(
     [1, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0],
-    [0, 0, 16, 0, 1, -16],
+    [0, 0, 0, 0, 0, ofm_channels],
Just curious: any reason for not making this change in binary_elementwise and unary_elementwise? Additionally, it looks like the nhcwb16_to_nhwc and nhwc_to_nhcwb16 matrices are the same across the operators, so should we move them to a common place like util to reduce duplication?
Initially I didn't make that change, since the old transformation matrix wasn't "harmful" for these operators (it was only a problem for operators that have weights or some kind of kernel). But I think it makes sense to unify the transform across all the NPU ops: then we can define the layout transform matrices in one place and reuse them across all the TEs and in the tests.
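For reference, a minimal sketch of what such a shared helper could look like, assembled from the rows visible in the diff hunks in this review. The rows not shown in the hunks, the docstring, and the homogeneous-vector convention it describes are assumptions rather than verbatim content from the PR:

    from typing import List, Tuple

    def get_layout_transform_matrices(ofm_channels: int) -> Tuple[List[List[float]], List[List[float]]]:
        """Return the (nhwc_to_nhcwb16, nhcwb16_to_nhwc) transform matrices.

        The matrices act on shape/index vectors in homogeneous form: NHWC uses
        5 entries (n, h, w, c, 1) and NHCWB16 uses 6 (n, h, c // 16, w, 16, 1).
        """
        nhwc_to_nhcwb16 = [
            [1, 0, 0, 0, 0],
            [0, 1, 0, 0, 0],
            [0, 0, 0, 1 / 16, 0],  # c -> number of 16-channel blocks
            [0, 0, 1, 0, 0],
            [0, 0, 0, 0, 16],      # the B16 axis is always the fixed size 16
            [0, 0, 0, 0, 1],
        ]
        nhcwb16_to_nhwc = [
            [1, 0, 0, 0, 0, 0],
            [0, 1, 0, 0, 0, 0],
            [0, 0, 0, 1, 0, 0],
            # The forward transform is lossy (channels are rounded up to a
            # multiple of 16), so the original channel count is substituted
            # back here instead of being derived from the blocked axes.
            [0, 0, 0, 0, 0, ofm_channels],
            [0, 0, 0, 0, 0, 1],
        ]
        return nhwc_to_nhcwb16, nhcwb16_to_nhwc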
@@ -169,7 +169,7 @@ def pooling_compute(
     [1, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0],
-    [0, 0, 16, 0, 1, -16],
+    [0, 0, 0, 0, 0, int(ofm_channels)],
Is int(...) necessary here?
Yes, the type of ofm_channels is IntImm, so without the cast it would end up causing an error in propagator.py.
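A tiny illustration of why the cast is needed, assuming ofm_channels arrives as a tvm.tir.IntImm (e.g. taken from a tensor shape); the value 64 is hypothetical:

    import tvm

    # A shape dimension in TVM is typically an IntImm, not a native Python int.
    ofm_channels = tvm.tir.IntImm("int32", 64)

    # Putting the IntImm straight into a plain Python matrix leaves a TIR node
    # where downstream code expects a number; int() converts it explicitly.
    row = [0, 0, 0, 0, 0, int(ofm_channels)]
    assert isinstance(row[-1], int)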
Force-pushed from 33525e6 to b0d1ae6
Thanks for the reviews @lhutton1 and @NicolaLancellotti! I've addressed the comments...
Thanks @ekalda (and other reviewers)! This looks much cleaner. I've added some suggestions to improve the docs around the matrices, so it is clear why those numbers look the way they do.
from typing import Tuple, List

def get_layout_transform_matrices(ofm_channels: int) -> Tuple[List[List[float]], List[List[float]]]:
docs: let's link to this page here: https://developer.arm.com/documentation/102420/0200/Functional-description/Control-and-data-flow/Supported-memory-formats-for-feature-maps
Done
[0, 1, 0, 0, 0],
[0, 0, 0, 1 / 16, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 16],
docs: it might be worth putting a comment to indicate that the B16 axis is always going to be of a fixed size of 16.
Done
[1, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, ofm_channels],
docs: let's state that the nhwc_to_nhcwb16 conversion is lossy (because the B16 axis is fixed to 16). Therefore, to recover the original "c" of "nhwc", we need to use the original number of channels in the transform matrix.
Done
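As a worked example of that lossiness (a NumPy sketch; the 1x8x8x40 shape is hypothetical and the matrices are the ones quoted in the hunks above, with the unshown rows assumed):

    import numpy as np

    ofm_channels = 40  # deliberately not a multiple of 16

    nhwc_to_nhcwb16 = np.array([
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1 / 16, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 0, 16],
        [0, 0, 0, 0, 1],
    ])
    nhcwb16_to_nhwc = np.array([
        [1, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, ofm_channels],
        [0, 0, 0, 0, 0, 1],
    ])

    nhwc = np.array([1, 8, 8, ofm_channels, 1])      # homogeneous NHWC shape
    nhcwb16 = np.ceil(nhwc_to_nhcwb16 @ nhwc)        # -> [1, 8, 3, 8, 16, 1]
    # 40 channels became 3 blocks of 16; the original 40 is gone from the
    # blocked shape, so the reverse matrix reinserts it via ofm_channels.
    recovered = nhcwb16_to_nhwc @ nhcwb16            # -> [1, 8, 8, 40, 1]
    print(nhcwb16, recovered)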
Force-pushed from b0d1ae6 to 23ecc50
Thanks for the suggestions @manupa-arm! I added documentation for the transform ops, does it make sense?
Force-pushed from 23ecc50 to d069c27
One of the layout transforms currently causes the cascader to stripe across the B16 axis (which is not allowed), so change that and deal with the implications for get_valid_block_configs.
Change-Id: I04199f9f35fcc31618581567483cfb80d3b5aad2
* Change the nhcwb16_to_nhwc matrix for binary and unary elementwise such that it matches the other NPU ops
* Reduce the number of places where the same layout transform matrices are defined
Force-pushed from d069c27 to df32c58
LGTM!
Thanks @ekalda @NicolaLancellotti @lhutton1
* [microNPU] Fix layout transform matrix

  One of the layout transforms currently causes the cascader to stripe across the B16 axis (which is not allowed), so change that and deal with the implications for get_valid_block_configs.

  Change-Id: I04199f9f35fcc31618581567483cfb80d3b5aad2

* Reduce the duplication of layout transform matrices

  * Change the nhcwb16_to_nhwc matrix for binary and unary elementwise such that it matches the other NPU ops
  * Reduce the number of places where the same layout transform matrices are defined

* Add documentation to the layout transform matrices
One of the layout transform matrices currently causes the cascader to stripe across the B16 axis (which is not allowed), so change that and deal with the implications for get_valid_block_configs.