[Hexagon] Implement fixed_point_multiply op through intrinsics. #12659

ibsidorenko · 2022-08-31T10:06:46Z

This commit adds a high-performance implementation of the fixed_point_multiply
operation based on Hexagon intrinsics for the vmpye/vmpyo instructions.

Benchmark of the fixed_point_multiply op with a (1,8,56,56,32) input
tensor on Qualcomm SM8350:

* default implementation: 10.06 ms
* optimized implementation: 1.42 ms
* speedup: ~7x (!!!)

Please note that this introduces a small round-up error in some
corner cases with a negative shift argument (the same as for ARM CPU, see
PR#5980). This is because we round twice instead of only once:

* original q_multiply_shift: round(x*y*2^-s)
* hexagon q_multiply_shift: round(round(x*y)*2^-s)
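The two formulas can be compared directly. This is a minimal sketch, not the TVM implementation: it assumes `round` means round-half-up (`floor(v + 1/2)`), treats `x*y` as an exact rational product, and treats `s` as a positive right-shift amount (the negative-shift-argument case). `round_half_up`, `single_rounding` and `double_rounding` are illustrative names, not TVM functions.

```python
from fractions import Fraction
import math


def round_half_up(v: Fraction) -> int:
    """Assumed rounding mode: round half toward +infinity."""
    return math.floor(v + Fraction(1, 2))


def single_rounding(xy: Fraction, s: int) -> int:
    """original q_multiply_shift: round(x*y*2^-s), one rounding step."""
    return round_half_up(xy / 2**s)


def double_rounding(xy: Fraction, s: int) -> int:
    """hexagon q_multiply_shift: round(round(x*y)*2^-s), two rounding steps."""
    return round_half_up(Fraction(round_half_up(xy)) / 2**s)


xy = Fraction(1, 2)  # exact product x*y = 0.5, right shift s = 1
print(single_rounding(xy, 1))  # 0  (0.25 rounds down)
print(double_rounding(xy, 1))  # 1  (0.5 -> 1, then 0.5 -> 1)
```

With round-half-up and s >= 1, the two-step result is never smaller than the single-rounding result and at most 1 larger, which is why the error shows up only as an occasional round-up.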

cc @mehrdadh


masahi · 2022-08-31T10:42:59Z

cc @kparzysz-quic

tmoreau89 · 2022-08-31T18:43:56Z

Excellent, thank you for the contribution @ibsidorenko !

kparzysz-quic · 2022-08-31T20:16:02Z

python/tvm/topi/hexagon/tensor_intrin.py

+    )
+
+    # Select depending on the shift
+    return tvm.tir.Select(shift < 0, out_negative_shift, mul_o_2)


The mul_o_2 is just round(x*y). There is no shift in it.

Hm... Maybe I misunderstood the question, but I apply the shift separately, before mul_o_2:
x = x * (1 << (shift))

Sorry, I misread it as part of the comment...
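The exchange above is about where the shift enters the computation. The following is a minimal sketch of that structure using Python integers in place of int32 vector lanes (no overflow or saturation is modeled). `mul_round_q31` is a hypothetical stand-in for the vmpye/vmpyo sequence, assumed to compute round(x*y/2^31) with round-half-up, and `y` is assumed to be a Q31 multiplier. For shift >= 0, x is scaled by `1 << shift` before the multiply, so there is a single rounding; only the shift < 0 branch rounds twice.

```python
def mul_round_q31(x: int, y: int) -> int:
    """Hypothetical stand-in for vmpye/vmpyo: round(x * y / 2**31), half up."""
    return (x * y + (1 << 30)) >> 31


def rounding_right_shift(v: int, n: int) -> int:
    """round(v / 2**n) with round-half-up, for n >= 1."""
    return (v + (1 << (n - 1))) >> n


def fixed_point_multiply_sketch(x: int, y: int, shift: int) -> int:
    # Positive shift: scale x before the multiply ("x = x * (1 << shift)"),
    # so mul_o_2 already contains the shift and is rounded only once.
    x_scaled = x * (1 << max(shift, 0))
    mul_o_2 = mul_round_q31(x_scaled, y)
    # Negative shift: a second rounding right shift of the rounded product.
    out_negative_shift = rounding_right_shift(mul_o_2, max(-shift, 1))
    # Mirrors tvm.tir.Select(shift < 0, out_negative_shift, mul_o_2).
    return out_negative_shift if shift < 0 else mul_o_2


def reference(x: int, y: int, shift: int) -> int:
    """Single rounding of x * y * 2**(shift - 31); valid for shift <= 30."""
    return rounding_right_shift(x * y, 31 - shift)


y = 1 << 30  # 0.5 in Q31
print(fixed_point_multiply_sketch(6, y, 2), reference(6, y, 2))    # 12 12
print(fixed_point_multiply_sketch(1, y, -1), reference(1, y, -1))  # 1 0
```

Under these assumptions the sketch matches the single-rounding reference exactly for every non-negative shift, and for negative shifts it can be at most 1 higher.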

kparzysz-quic

Looks great! Thanks.


kparzysz-quic reviewed Aug 31, 2022


kparzysz-quic approved these changes Sep 1, 2022


kparzysz-quic merged commit 038f15b into apache:main Sep 1, 2022

AndrewZhaoLuo mentioned this pull request Oct 4, 2022

TVM v0.10.0.rc0 Release Candidate Notes #12979

Closed

ibsidorenko deleted the hexagon-fixed-point-multiply branch March 29, 2023 06:26
