
[TOPI][bugfix] Fix a bug in arm_cpu int8 dotprod schedule and modernize tests #13669

Merged
merged 3 commits into apache:main from the dotprod-schedule-bugfix branch on Dec 31, 2022

Conversation

ekalda
Contributor

@ekalda ekalda commented Dec 28, 2022

topi.arm_cpu.schedule_conv2d_NHWC_quantized_native was failing to compile when the number of input channels divided by 4 was less than 4.

This was because we were splitting this axis by a factor of 4 to create an appropriate loop nest for tensorize, but tensorize then assumed that the outer axis bound was divisible by 4.

If the outer bound was less than 4, compilation failed; if it was greater than 4 but not divisible by 4, we were occasionally accessing data outside of the tensor, which luckily was padded due to alignment (I think).

So here we make sure that we explicitly pad the input axis such that the outer loop bound is always divisible by 4.
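As a rough illustration of the padding arithmetic (the helper below is hypothetical, not the code in this patch; the real fix pads the axis inside the TOPI compute), the input channel extent has to be rounded up to 4 * 4 = 16 so that the outer loop extent after the split is a multiple of 4:

```python
# Hypothetical sketch of the padding arithmetic, not the actual TVM code:
# round the input-channel extent up so that splitting it by `split_factor`
# leaves an outer loop extent divisible by `outer_multiple`.
def pad_input_channels(in_channels, split_factor=4, outer_multiple=4):
    step = split_factor * outer_multiple  # 16 for the dotprod schedule
    return ((in_channels + step - 1) // step) * step

assert pad_input_channels(3) == 16   # outer extent 4 instead of 1 (< 4 used to fail compilation)
assert pad_input_channels(20) == 32  # outer extent 8 instead of 5 (5 is not divisible by 4)
```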

There are also some refactors to test_topi_conv2d_int8.py:

  • decouple the tests using pytest.parametrize (a rough sketch of this is shown after this list)
  • extend the NHWC int8 schedules test to run against Arm targets and various schedules. When these schedules were initially added, we didn't have Arm CI, so only compilation was tested; now we can also run the workloads on Arm targets.
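A rough sketch of the parametrized shape of the test (the test name, parameter sets and target strings below are illustrative, not the exact ones in the patch):

```python
import pytest
import tvm.testing

# Illustrative sketch of a decoupled, parametrized int8 conv2d test; the real
# parameter sets and test body live in test_topi_conv2d_int8.py.
@pytest.mark.parametrize(
    "in_channels, kernel, stride, padding",
    [
        (3, 3, 1, 1),   # input channels not a multiple of 16 (the regression case)
        (32, 3, 1, 1),
        (64, 1, 1, 0),
    ],
)
@tvm.testing.parametrize_targets("llvm", "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu")
def test_conv2d_nhwc_int8(target, dev, in_channels, kernel, stride, padding):
    ...  # build the workload for `target` and compare against a reference implementation
```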

@tvm-bot
Collaborator

tvm-bot commented Dec 28, 2022

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: topi, bugfix. See #10317 for details

Generated by tvm-bot

@ekalda
Contributor Author

ekalda commented Dec 28, 2022

cc @leandron @Mousius

Member

@Mousius Mousius left a comment


Phew, this is quite an improvement to the testing 😸 if I got the logic correctly, I can understand why this happened at least - just have a few things to improve in the testing without trying to fix too much of it all at once 🙀

python/tvm/topi/nn/conv2d.py (outdated, resolved)
tests/python/topi/python/test_topi_conv2d_int8.py (outdated, resolved)
tests/python/topi/python/test_topi_conv2d_int8.py (outdated, resolved)
tests/python/topi/python/test_topi_conv2d_int8.py (outdated, resolved)
tests/python/topi/python/test_topi_conv2d_int8.py (outdated, resolved)
Comment on lines 570 to 575
if not tvm.testing.device_enabled(target):
    print("Skip because %s is not enabled" % target)
    return
if target == "cuda" and not tvm.contrib.nvcc.have_int8(dev.compute_version):
    print("Skip because int8 intrinsics are not available")
    return
Member


pytest.skip as above, can we re-use the same function by hoisting it out of the test?

Contributor Author


I added the pytest.skip. I experimented with hoisting out the functions, since all the tests in that file do something similar, but annoyingly the functions are all subtly different and depend on pretty much all the parameters passed to and defined in the test, as well as compute definitions, schedules, utility functions, etc., which would all need to be passed as arguments, so it didn't look like it was worth it.
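For illustration, the pytest.skip form of those checks looks roughly like this (the hoisted helper below is hypothetical; as noted above, in the patch the checks stay inline in each test):

```python
import pytest
import tvm.testing
from tvm.contrib import nvcc

def _skip_if_unsupported(target, dev):
    # Sketch of the skip conditions expressed with pytest.skip instead of
    # print-and-return; in the actual patch these checks remain inline.
    if not tvm.testing.device_enabled(target):
        pytest.skip(f"{target} is not enabled")
    if target == "cuda" and not nvcc.have_int8(dev.compute_version):
        pytest.skip("int8 intrinsics are not available")
```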

tests/python/topi/python/test_topi_conv2d_int8.py (outdated, resolved)
tests/python/topi/python/test_topi_conv2d_int8.py (outdated, resolved)
@ekalda ekalda force-pushed the dotprod-schedule-bugfix branch from ac5b3c4 to 2f2ecaf on December 29, 2022 at 11:39
@Mousius
Member

Mousius commented Dec 29, 2022

LGTM @ekalda, thanks for making great strides improving the tests here 😸 I'll leave it open a little longer but otherwise I think this is good to go

@Mousius Mousius merged commit 579c970 into apache:main Dec 31, 2022
@ekalda ekalda deleted the dotprod-schedule-bugfix branch January 3, 2023 07:32
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
…ze tests (apache#13669)

Anndrey24 added a commit to Anndrey24/tvm that referenced this pull request Aug 17, 2023
…ts matrix for arm_cpu NHWC quantized conv2d

Fixed an arm_cpu strategy bug that was causing tensorization errors when using the `AlterOpLayout` pass for the quantized NHWC conv2d schedules, as discovered in apache#10724. Therefore, we can now also enable `AlterOpLayout` for these schedules in order to transform the weight matrix at compile time instead of at runtime as before.
I also modified the padding in `Conv2DGemmWeightTransformRel` and `interleave_transpose_weights` to reflect the changes made in apache#13669 and updated the AlterOpLayout tests accordingly.
lhutton1 pushed a commit that referenced this pull request Aug 23, 2023
…ts matrix for arm_cpu NHWC quantized conv2d (#15584)
