[TOPI][bugfix] Fix a bug in arm_cpu int8 dotprod schedule and modernize tests #13669
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Phew, this is quite an improvement to the testing 😸 If I've got the logic right, I can at least understand why this happened. Just a few things to improve in the testing, without trying to fix too much of it all at once 🙀
```python
if not tvm.testing.device_enabled(target):
    print("Skip because %s is not enabled" % target)
    return
if target == "cuda" and not tvm.contrib.nvcc.have_int8(dev.compute_version):
    print("Skip because int8 intrinsics are not available")
    return
```
Use `pytest.skip` here. And as above, can we re-use the same function by hoisting it out of the test?
I added the `pytest.skip`. I experimented with hoisting out the functions, since all the tests in that file do something similar, but annoyingly the functions are all subtly different: they depend on pretty much all the parameters passed to and defined in each test, as well as on compute definitions, schedules, utility functions, etc., which would all need to be passed as arguments, so it didn't look like it was worth it.
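For reference, a hoisted skip helper along the lines discussed might look like the sketch below. This is a hedged illustration only: `device_enabled` and `have_int8` are simplified stand-ins for `tvm.testing.device_enabled` and `tvm.contrib.nvcc.have_int8`, and `skip_if_int8_unsupported` is a hypothetical name, not a function in the PR.

```python
import pytest


def device_enabled(target):
    # Stand-in for tvm.testing.device_enabled: pretend only "llvm" is on.
    return target == "llvm"


def have_int8(compute_version):
    # Stand-in for tvm.contrib.nvcc.have_int8: int8 dp4a needs sm_61+.
    major, minor = (int(v) for v in compute_version.split("."))
    return (major, minor) >= (6, 1)


def skip_if_int8_unsupported(target, compute_version="6.1"):
    # Hoisted version of the per-test checks, raising pytest's skip
    # outcome instead of print-and-return.
    if not device_enabled(target):
        pytest.skip("Skip because %s is not enabled" % target)
    if target == "cuda" and not have_int8(compute_version):
        pytest.skip("Skip because int8 intrinsics are not available")
```

Using `pytest.skip` makes the skip show up in the test report instead of silently passing, which is the point of the review comment above.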
…ze tests

`topi.arm_cpu.schedule_conv2d_NHWC_quantized_native` was failing compilation when the input channels divided by 4 was less than 4. This was because we were splitting this axis by a factor of 4 to create the appropriate loop nest for tensorize, but tensorize then assumed the outer axis bound was divisible by 4. If the outer bound was less than 4, compilation failed; if it was greater than 4 but not divisible by 4, we occasionally accessed data outside the tensor, which luckily was padded due to alignment (I think). So here we make sure to explicitly pad the input axis such that the outer loop will always be divisible by 4.

There are also some refactors to test_topi_conv2d_int8.py:
- decouple the tests using pytest.parametrize
- extend the NHWC int8 schedules test to run against Arm targets and various schedules. When these schedules were initially added we didn't have Arm CI, so only compilation was tested; now we can also run the workloads on Arm targets.

Change-Id: Iba7db541d8fff54736dabc310a9657f18623e556
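Schematically, the padding fix amounts to rounding the axis extent up so that, after the split by 4, the outer loop extent is itself a multiple of 4, i.e. padding the axis to a multiple of 16. The sketch below is illustrative arithmetic only, not the actual TVM schedule code, and the function name is made up:

```python
def pad_for_tensorize(extent, split_factor=4, outer_multiple=4):
    # Round the axis extent up so that the outer loop bound after the
    # split (ceil(extent / split_factor)) is divisible by outer_multiple.
    block = split_factor * outer_multiple  # 16 in this schedule
    return -(-extent // block) * block     # ceil-divide, then scale back up
```

For example, an extent of 5 previously produced an outer bound of 2, which tensorize mishandled; padding the extent to 16 first gives an outer bound of exactly 4.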
Force-pushed from `ac5b3c4` to `2f2ecaf`
LGTM @ekalda, thanks for making great strides improving the tests here 😸 I'll leave it open a little longer, but otherwise I think this is good to go.
…ts matrix for arm_cpu NHWC quantized conv2d (#15584)

Fixed an arm_cpu strategy bug which was causing tensorization errors when using the `AlterOpLayout` pass for the quantized NHWC conv2d schedules, as discovered in #10724. We can therefore now also enable `AlterOpLayout` for these schedules, transforming the weight matrix at compile time instead of at runtime as before. I also modified the padding in `Conv2DGemmWeightTransformRel` and `interleave_transpose_weights` to reflect the changes made in #13669 and updated the AlterOpLayout tests accordingly.
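As a rough illustration of the kind of padding the weight transform needs to match the schedule, the sketch below zero-pads the reduction dimension of a weight matrix up to the tiled multiple. The names, layout, and multiple are invented for the example; the real logic lives in `Conv2DGemmWeightTransformRel` and `interleave_transpose_weights`:

```python
def pad_reduction_dim(weight_rows, multiple=16):
    # weight_rows: a K x N matrix as nested lists, where K is the
    # reduction (input-channel) dimension that the gemm schedule tiles.
    # Zero-pad K up to a multiple of `multiple` so the tensorized inner
    # loops never read past the original data.
    k = len(weight_rows)
    n = len(weight_rows[0]) if weight_rows else 0
    padded_k = -(-k // multiple) * multiple  # round K up
    return weight_rows + [[0] * n for _ in range(padded_k - k)]
```

Doing this transform at compile time via `AlterOpLayout`, rather than at runtime, is exactly the benefit the follow-up PR describes.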