[TOPI][bugfix] Fix a bug in arm_cpu int8 dotprod schedule and modernize tests #13669
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Phew, this is quite an improvement to the testing 😸 If I've got the logic right, I can at least understand why this happened. Just a few things to improve in the testing, without trying to fix too much of it all at once 🙀
```python
if not tvm.testing.device_enabled(target):
    print("Skip because %s is not enabled" % target)
    return
if target == "cuda" and not tvm.contrib.nvcc.have_int8(dev.compute_version):
    print("Skip because int8 intrinsics are not available")
    return
```
Use `pytest.skip` here. And as above, can we re-use the same function by hoisting it out of the test?
I added the `pytest.skip`. I experimented with hoisting out the functions, since all the tests in that file do something similar, but annoyingly the functions are all subtly different: they depend on pretty much all the parameters passed to and defined in each test, as well as on compute definitions, schedules, utility functions, etc., which would all need to be passed as arguments, so it didn't look like it was worth it.
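For reference, a hoisted skip helper along the lines discussed might look like the sketch below. This is a hedged illustration only: `device_enabled` and `have_int8` are simplified stand-ins for `tvm.testing.device_enabled` and `tvm.contrib.nvcc.have_int8`, and `skip_if_int8_unsupported` is a hypothetical name, not a function in the PR.

```python
import pytest


def device_enabled(target):
    # Stand-in for tvm.testing.device_enabled: pretend only "llvm" is on.
    return target == "llvm"


def have_int8(compute_version):
    # Stand-in for tvm.contrib.nvcc.have_int8: int8 dp4a needs sm_61+.
    major, minor = (int(v) for v in compute_version.split("."))
    return (major, minor) >= (6, 1)


def skip_if_int8_unsupported(target, compute_version="6.1"):
    # Hoisted version of the per-test checks, raising pytest's skip
    # outcome instead of print-and-return.
    if not device_enabled(target):
        pytest.skip("Skip because %s is not enabled" % target)
    if target == "cuda" and not have_int8(compute_version):
        pytest.skip("Skip because int8 intrinsics are not available")
```

Using `pytest.skip` makes the skip show up in the test report instead of silently passing, which is the point of the review comment above.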
…ze tests

`topi.arm_cpu.schedule_conv2d_NHWC_quantized_native` was failing compilation when the input channels divided by 4 was less than 4. This was because we were splitting this axis by a factor of 4 to create the appropriate loop nest for tensorize, but tensorize then assumed the outer axis bound was divisible by 4. If the outer bound was less than 4, compilation failed; if it was greater than 4 but not divisible by 4, we occasionally accessed data outside the tensor, which luckily was padded due to alignment (I think). So here we make sure to explicitly pad the input axis such that the outer loop will always be divisible by 4.

There are also some refactors to test_topi_conv2d_int8.py:
- decouple the tests using pytest.parametrize
- extend the NHWC int8 schedules test to run against Arm targets and various schedules. When these schedules were initially added we didn't have Arm CI, so only compilation was tested; now we can also run the workloads on Arm targets.

Change-Id: Iba7db541d8fff54736dabc310a9657f18623e556
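Schematically, the padding fix amounts to rounding the axis extent up so that, after the split by 4, the outer loop extent is itself a multiple of 4, i.e. padding the axis to a multiple of 16. The sketch below is illustrative arithmetic only, not the actual TVM schedule code, and the function name is made up:

```python
def pad_for_tensorize(extent, split_factor=4, outer_multiple=4):
    # Round the axis extent up so that the outer loop bound after the
    # split (ceil(extent / split_factor)) is divisible by outer_multiple.
    block = split_factor * outer_multiple  # 16 in this schedule
    return -(-extent // block) * block     # ceil-divide, then scale back up
```

For example, an extent of 5 previously produced an outer bound of 2, which tensorize mishandled; padding the extent to 16 first gives an outer bound of exactly 4.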
Force-pushed from `ac5b3c4` to `2f2ecaf`
LGTM @ekalda, thanks for making great strides improving the tests here 😸 I'll leave it open a little longer, but otherwise I think this is good to go.
…ts matrix for arm_cpu NHWC quantized conv2d (#15584)

Fixed an arm_cpu strategy bug which was causing tensorization errors when using the `AlterOpLayout` pass for the quantized NHWC conv2d schedules, as discovered in #10724. We can therefore now also enable `AlterOpLayout` for these schedules, transforming the weight matrix at compile time instead of at runtime as before. I also modified the padding in `Conv2DGemmWeightTransformRel` and `interleave_transpose_weights` to reflect the changes made in #13669 and updated the AlterOpLayout tests accordingly.
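As a rough illustration of the kind of padding the weight transform needs to match the schedule, the sketch below zero-pads the reduction dimension of a weight matrix up to the tiled multiple. The names, layout, and multiple are invented for the example; the real logic lives in `Conv2DGemmWeightTransformRel` and `interleave_transpose_weights`:

```python
def pad_reduction_dim(weight_rows, multiple=16):
    # weight_rows: a K x N matrix as nested lists, where K is the
    # reduction (input-channel) dimension that the gemm schedule tiles.
    # Zero-pad K up to a multiple of `multiple` so the tensorized inner
    # loops never read past the original data.
    k = len(weight_rows)
    n = len(weight_rows[0]) if weight_rows else 0
    padded_k = -(-k // multiple) * multiple  # round K up
    return weight_rows + [[0] * n for _ in range(padded_k - k)]
```

Doing this transform at compile time via `AlterOpLayout`, rather than at runtime, is exactly the benefit the follow-up PR describes.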