Feat: #143 Inline ASM for detecting CPU features on ARM #196
Use the functionality from SimSIMD to detect CPU features on ARM. On macOS, user-space code cannot use the `mrs`/`msr` instructions to access the system registers, so we stick to the `sysctl` interface to probe for supported SIMD features.
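The title refers to the inline-assembly path used where reading the ID registers is permitted. As a minimal sketch, assuming Linux on AArch64 (where the kernel emulates user-space `mrs` reads of the `ID_AA64*_EL1` registers and advertises this through the `HWCAP_CPUID` bit of `AT_HWCAP`), SVE detection could look like this. `read_id_aa64pfr0_el1` and `has_sve` are hypothetical helpers, not StringZilla or SimSIMD functions.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> /* getauxval, AT_HWCAP; glibc and musl also pull in the HWCAP_* bits */
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11) /* Linux arm64 hwcap: EL0 reads of ID registers are emulated */
#endif
#endif

/* Hypothetical helper: returns ID_AA64PFR0_EL1, or 0 when it cannot be read safely. */
static uint64_t read_id_aa64pfr0_el1(void) {
#if defined(__aarch64__) && defined(__linux__)
    /* Without kernel emulation, `mrs` on an ID register traps and raises SIGILL. */
    if ((getauxval(AT_HWCAP) & HWCAP_CPUID) == 0) return 0;
    uint64_t value;
    __asm__ __volatile__("mrs %0, ID_AA64PFR0_EL1" : "=r"(value));
    return value;
#else
    /* macOS and other targets: query the OS instead (e.g. `sysctl` on Apple platforms). */
    return 0;
#endif
}

/* Hypothetical helper: the SVE field occupies bits [35:32]; any value >= 1 means SVE. */
static int has_sve(void) { return ((read_id_aa64pfr0_el1() >> 32) & 0xF) >= 1; }
```

On non-AArch64 or non-Linux builds the sketch compiles to a stub that reports "no SVE", which keeps the dispatch code portable.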
include/stringzilla/stringzilla.h
```c
sz_cap_sve2_k = 1 << 18,     ///< ARM SVE2 capability
sz_cap_sve2p1_k = 1 << 19,   ///< ARM SVE2p1 capability

sz_cap_x86_avx2_k = 1 << 21, ///< x86 AVX2 capability
```
Changing the old enum values is a breaking change that we should avoid.
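To make this concrete: a new capability can take a previously unused bit instead of renumbering existing members, and compile-time checks can pin the published values. A C11 sketch; the three existing values are copied from the diff excerpt above purely for illustration, while the type name `sz_capability_sketch_t` and the member `sz_cap_new_feature_k` (including its bit position) are hypothetical.

```c
#include <assert.h> /* static_assert (C11) */

typedef enum {
    sz_cap_sve2_k = 1 << 18,     ///< ARM SVE2 capability
    sz_cap_sve2p1_k = 1 << 19,   ///< ARM SVE2p1 capability
    sz_cap_x86_avx2_k = 1 << 21, ///< x86 AVX2 capability
    // A new capability is appended in an unused bit rather than shifting existing members,
    // so binaries compiled against the old header still decode capability masks correctly.
    sz_cap_new_feature_k = 1 << 28, ///< hypothetical new capability
} sz_capability_sketch_t;

// If an existing member is ever renumbered, the build fails here instead of silently
// changing the meaning of masks already stored or passed across the ABI.
static_assert(sz_cap_sve2_k == (1 << 18), "sz_cap_sve2_k changed value");
static_assert(sz_cap_sve2p1_k == (1 << 19), "sz_cap_sve2p1_k changed value");
static_assert(sz_cap_x86_avx2_k == (1 << 21), "sz_cap_x86_avx2_k changed value");
static_assert((sz_cap_new_feature_k & (sz_cap_sve2_k | sz_cap_sve2p1_k | sz_cap_x86_avx2_k)) == 0,
              "new capability must not overlap existing bits");
```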
c/lib.c
```c
// This is how the `arm-cpusysregs` library does it:
//
// int ID_AA64ISAR1_EL1_BF16() const { return (int)(_aa64isar1 >> 44) & 0x0F; }
// int ID_AA64ZFR0_EL1_BF16() const { return (int)(_aa64zfr0 >> 20) & 0x0F; }
// int ID_AA64PFR0_EL1_FP() const { return (int)(_aa64pfr0 >> 16) & 0x0F; }
// int ID_AA64ISAR0_EL1_DP() const { return (int)(_aa64isar0 >> 44) & 0x0F; }
// int ID_AA64PFR0_EL1_SVE() const { return (int)(_aa64pfr0 >> 32) & 0x0F; }
// int ID_AA64ZFR0_EL1_SVEver() const { return (int)(_aa64zfr0) & 0x0F; }
// bool FEAT_BF16() const { return ID_AA64ISAR1_EL1_BF16() >= 1 || ID_AA64ZFR0_EL1_BF16() >= 1; }
// bool FEAT_FP16() const { return ID_AA64PFR0_EL1_FP() >= 1 && ID_AA64PFR0_EL1_FP() < 15; }
// bool FEAT_DotProd() const { return ID_AA64ISAR0_EL1_DP() >= 1; }
// bool FEAT_SVE() const { return ID_AA64PFR0_EL1_SVE() >= 1; }
// bool FEAT_SVE2() const { return ID_AA64ZFR0_EL1_SVEver() >= 1; }
// bool FEAT_I8MM() const { return ID_AA64ZFR0_EL1_I8MM() >= 1; }
//
// https://github.com/lelegard/arm-cpusysregs/tree/4837c62e619a5e5f12bf41b16a1ee1e71d62c76d
```
Let's avoid this noise. I've removed it on the `main-elementwise` branch of SimSIMD as well 🤗
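For reference, the quoted `arm-cpusysregs` accessors all reduce to extracting a 4-bit field from an ID register. A minimal sketch restricted to the three features the review settles on (NEON, SVE, SVE2), assuming the register values have already been read (for example, via `mrs` on Linux); `id_field`, `feat_neon`, `feat_sve`, and `feat_sve2` are hypothetical names. The NEON check uses the `AdvSIMD` field in bits [23:20] of `ID_AA64PFR0_EL1`, where `0xF` means "not implemented".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extracts the 4-bit field that starts at bit `shift` of an AArch64 ID register. */
static unsigned id_field(uint64_t reg, unsigned shift) {
    assert(shift <= 60 && shift % 4 == 0); /* ID register fields are 4 bits wide and aligned */
    return (unsigned)((reg >> shift) & 0xF);
}

/* ID_AA64PFR0_EL1.AdvSIMD, bits [23:20]: 0xF means not implemented, other values mean present. */
static bool feat_neon(uint64_t id_aa64pfr0) { return id_field(id_aa64pfr0, 20) != 0xF; }

/* ID_AA64PFR0_EL1.SVE, bits [35:32]: any value >= 1 means SVE is implemented. */
static bool feat_sve(uint64_t id_aa64pfr0) { return id_field(id_aa64pfr0, 32) >= 1; }

/* ID_AA64ZFR0_EL1.SVEver, bits [3:0]: >= 1 means SVE2. ZFR0 is only meaningful when SVE
 * itself is present, so both registers are checked. */
static bool feat_sve2(uint64_t id_aa64pfr0, uint64_t id_aa64zfr0) {
    return feat_sve(id_aa64pfr0) && id_field(id_aa64zfr0, 0) >= 1;
}
```

Unlike the quoted `FEAT_SVE2()`, this version also requires the SVE field, since `ID_AA64ZFR0_EL1` reads as zero on cores without SVE.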
include/stringzilla/stringzilla.h
```c
sz_cap_neon_f16_k = 1 << 11,  ///< ARM NEON `f16` capability
sz_cap_neon_bf16_k = 1 << 12, ///< ARM NEON `bf16` capability
sz_cap_neon_i8_k = 1 << 13,   ///< ARM NEON `i8` capability
sz_cap_sve_k = 1 << 14,       ///< ARM SVE capability TODO: Not yet supported or used
sz_cap_sve_f16_k = 1 << 15,   ///< ARM SVE `f16` capability
sz_cap_sve_bf16_k = 1 << 16,  ///< ARM SVE `bf16` capability
sz_cap_sve_i8_k = 1 << 17,    ///< ARM SVE `i8` capability
```
`f16`, `bf16`, and `i8mm` will just pollute the space. Let's stick to NEON, SVE, and SVE2.
Makes sense; I'll remove the capability checks for `f16`/`bf16` and `i8mm`.
c/lib.c
```c
// On Apple Silicon, `mrs` is not allowed in user-space, so we need to use the `sysctl` API.
uint32_t supports_neon = 0, supports_fp16 = 0, supports_bf16 = 0, supports_i8mm = 0;
size_t size = sizeof(supports_neon);
if (sysctlbyname("hw.optional.neon", &supports_neon, &size, NULL, 0) != 0) supports_neon = 0;
if (sysctlbyname("hw.optional.arm.FEAT_FP16", &supports_fp16, &size, NULL, 0) != 0) supports_fp16 = 0;
if (sysctlbyname("hw.optional.arm.FEAT_BF16", &supports_bf16, &size, NULL, 0) != 0) supports_bf16 = 0;
if (sysctlbyname("hw.optional.arm.FEAT_I8MM", &supports_i8mm, &size, NULL, 0) != 0) supports_i8mm = 0;
```
The new M4 Macs should support SVE and SVE2, but I don't know how to check for this. Do you have recent Apple hardware?
Sadly, I don't have access to the M4s yet :( I can check with friends or colleagues to see if they do and can get the right `sysctl` property names for the SVE/SVE2 checks.
I have an M4 iPad. We can probably try porting the SwiftSemanticSearch demo to iPad, integrating SimSIMD, and testing on the device. I can try next week.
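One detail in the `sysctlbyname` probes quoted in `c/lib.c`: the length argument is in/out, so sharing a single `size` across calls relies on every key returning exactly four bytes. A small wrapper that resets it on each call keeps the probes independent and makes it cheap to add further `hw.optional.*` keys once the right SVE/SVE2 names are confirmed. This is a sketch; `probe_sysctl_flag` is a hypothetical name, and the non-Apple branch simply reports 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h> /* sysctlbyname */
#endif

/* Hypothetical helper: returns 1 if the boolean sysctl `name` exists and is non-zero, else 0. */
static int probe_sysctl_flag(char const *name) {
    assert(name != NULL);
#if defined(__APPLE__)
    uint32_t value = 0;
    size_t size = sizeof(value); /* in/out argument: re-initialized before every call */
    if (sysctlbyname(name, &value, &size, NULL, 0) != 0) return 0; /* missing key or error */
    return value != 0;
#else
    (void)name; /* this sketch only consults the `hw.optional.*` tree on Apple platforms */
    return 0;
#endif
}
```

For example, `probe_sysctl_flag("hw.optional.neon")` replaces the first probe in the snippet, and keys that older macOS releases lack simply read as 0.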
- Do not break backward compatibility when adding capability checks to the enum.
- Do not check for `bf16`, `fp16`, or `i8` support, as StringZilla doesn't use them yet.
- Remove the documentation of how the MSRs are accessed; instead, link to the detailed documentation in SimSIMD.
I am not sure whether checking for `f16` or `bf16` support is useful for strings yet. Nevertheless, it didn't seem harmful either, and it might not be a bad idea to expose capabilities similar to SimSIMD's for checking CPU architecture features.