-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable _tzcnt_u32/_tzcnt_u64 for ARM64EC #6257
Conversation
ARM64EC doesn't support AVX intrinsics including _tzcnt_u32/_tzcnt_u64, which are mapped to AVX2 instruction tzcnt. Therefore, neuter such intrinsics for ARM64EC compilation.
Would it make sense to change the logic so that this code only happens for the right processors instead of trying to flag all of the ones where it shouldn't, which is more fragile? If so, then we could also add sections for the ARM64 platforms with MSVC to use the right intrinsics there (https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170). |
Arm/arm64 doesn’t have same similar intrinsics now, so it’s not possible to check arm64 cpu capability and use intrinsics. |
ah, right, we only look for trailing zeros, not leading. The other part is valid though, we can change:
to check for And similar for the other one ... |
But now I also see that both x86_64 and arm64 can be defined at once in https://techcommunity.microsoft.com/t5/windows-kernel-internals-blog/getting-to-know-arm64ec-defines-and-intrinsic-functions/ba-p/2957235 ... It just seems like we can make this a bit less fragile here. |
Right |
@@ -50,7 +50,7 @@ Revision History: | |||
|
|||
#if defined(__GNUC__) | |||
#define _trailing_zeros32(X) __builtin_ctz(X) | |||
#elif defined(_WINDOWS) && !defined(_M_ARM) && !defined(_M_ARM64) && !defined(__MINGW32__) | |||
#elif defined(_WINDOWS) && !defined(_M_ARM) && !defined(_M_ARM64) && !defined(_M_ARM64EC) && !defined(__MINGW32__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be defined(_WINDOWS) && (defined(_X86/64) || ...)
@@ -62,7 +62,7 @@ static uint32_t _trailing_zeros32(uint32_t x) { | |||
} | |||
#endif | |||
|
|||
#if (defined(__LP64__) || defined(_WIN64)) && !defined(_M_ARM) && !defined(_M_ARM64) | |||
#if (defined(__LP64__) || defined(_WIN64)) && !defined(_M_ARM) && !defined(_M_ARM64) && !defined(_M_ARM64EC) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also I would then set the architecture conservatively.
I think I should set the architecture conservatively to handle this bug/pull |
Note that in ARM64EC compilation, both the _M_X64 and _M_ARM64EC macros are defined. To really only limit the scope of the tzcnt intrinsics, you need to check (defined(_M_X86) || defined(_M_X64) && !defined(_M_ARM64EC)) if you want a more conservative change. Thanks. |
ARM64EC doesn't support AVX intrinsics including _tzcnt_u32/_tzcnt_u64, which are mapped to AVX2 instruction tzcnt. Therefore, neuter such intrinsics for ARM64EC compilation.