-
Notifications
You must be signed in to change notification settings - Fork 12.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ABI for __m256
and __m512
is wrong when avx
/avx512
is disabled globally, or enabled per-function
#64706
Comments
@llvm/issue-subscribers-backend-x86 |
It's unfortunate LLVM has gap with GCC. But I'd argue passing |
So the ABI is different depending on whether the target feature is enabled globally vs in the current function? Ugh, that's a pretty critical bug, isn't it? The case where the target features are missing is somewhat odd and maybe shouldn't even be accepted, but the case where the target feature is present for the current function should obviously work. |
Yes, the varying ABI is really troublesome. I dug into this with some coworkers earlier this year (for 256-bit vectors), and just recently shared our findings/thoughts here: https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi/?utm_source=share&utm_medium=web2x&context=3 TL;DR: llvm and gcc behave differently, and llvm's handling of the target attribute is only useful today if you write all of your vector code inside a giant loop with locals. The ABI issue means that it's dramatically better to continue compiling multiple versions with separate TUs (and separate flags). |
This will allow us to have separate -march= flags for Haswell and Skylake pixel-conversion loops, which is important to dodge a Clang bug (llvm/llvm-project#64706) Bug: b/310927123 Change-Id: I828f0cf591a1c4d0a6260fef786c80e95b38ead7 Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/5038351 Auto-Submit: John Stiles <johnstiles@google.com> Reviewed-by: Brian Osman <brianosman@google.com> Commit-Queue: Leon Scroggins <scroggo@google.com> Reviewed-by: Leon Scroggins <scroggo@google.com> Cr-Commit-Position: refs/heads/main@{#1226103}
This warning is triggered when one accepts or returns vectors from a function (that is not marked with [[gnu::target(...)]]) which would have been otherwise passed in register if the current translation unit had been compiled with more permissive flags wrt instruction selection (i. e. if one adds -mavx2 to cmdline). This will never be a problem for us since we (a) never use different instruction selection options across ABI boundaries; (b) most of the affected functions are actually TU-local. Moreover, even if we somehow properly annotated all of the SIMD helpers, calling them across ABI (or target) boundaries would still be very dangerous because of inconsistent and bogus handling of [[gnu::target(...)]] across compilers. See llvm/llvm-project#64706 and https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
This warning is triggered when one accepts or returns vectors from a function (that is not marked with [[gnu::target(...)]]) which would have been otherwise passed in register if the current translation unit had been compiled with more permissive flags wrt instruction selection (i. e. if one adds -mavx2 to cmdline). This will never be a problem for us since we (a) never use different instruction selection options across ABI boundaries; (b) most of the affected functions are actually TU-local. Moreover, even if we somehow properly annotated all of the SIMD helpers, calling them across ABI (or target) boundaries would still be very dangerous because of inconsistent and bogus handling of [[gnu::target(...)]] across compilers. See llvm/llvm-project#64706 and https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
This warning is triggered when one accepts or returns vectors from a function (that is not marked with [[gnu::target(...)]]) which would have been otherwise passed in register if the current translation unit had been compiled with more permissive flags wrt instruction selection (i. e. if one adds -mavx2 to cmdline). This will never be a problem for us since we (a) never use different instruction selection options across ABI boundaries; (b) most of the affected functions are actually TU-local. Moreover, even if we somehow properly annotated all of the SIMD helpers, calling them across ABI (or target) boundaries would still be very dangerous because of inconsistent and bogus handling of [[gnu::target(...)]] across compilers. See llvm/llvm-project#64706 and https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
This warning is triggered when one accepts or returns vectors from a function (that is not marked with [[gnu::target(...)]]) which would have been otherwise passed in register if the current translation unit had been compiled with more permissive flags wrt instruction selection (i. e. if one adds -mavx2 to cmdline). This will never be a problem for us since we (a) never use different instruction selection options across ABI boundaries; (b) most of the affected functions are actually TU-local. Moreover, even if we somehow properly annotated all of the SIMD helpers, calling them across ABI (or target) boundaries would still be very dangerous because of inconsistent and bogus handling of [[gnu::target(...)]] across compilers. See llvm/llvm-project#64706 and https://www.reddit.com/r/cpp/comments/17qowl2/comment/k8j2odi .
Based on this discussion: https://groups.google.com/g/x86-64-abi/c/FMhl2vDl1D8
Currently, llvm passes
__m256
and__m512
parameters/return values when it cannot use ymm/zmm registers as follows:xmm
registers.Further, when the avx/avx512f features are enabled at the function level (not globally, using
__attribute__((target))
), it passes parameters/return values:ymm
/zmm
register.In contrast the behaviour of gcc (which is apparantly the correct behaviour in both cases) is:
When ymm/zmm registers are unavailable:
When ymm/zmm registers are available at the function level (using
__attribute__((target))
), it passes and returns values as it does when the feature is available globally via a-m
flag.The difference in behaviour can be demonstrated by https://godbolt.org/z/8sYcn6654.
Based on a short discussion on the x86-64 psABI mailing list, this appears to be entirely incorrect on behalf of llvm: When returning w/o the registers available, it must return in memory as the ABI requires it to place the 2nd SSEUP eightbyte in the 3rd eightbyte of
xmm0
, which fails, and sends the entire value to memory. In the locally-enabled case, the registers are available, so it should be passing fully inymm1
and returning fully inymm0
(llvm seems to think that it is available given that it does return inymm0
).The text was updated successfully, but these errors were encountered: