Feat: #143 Inline ASM for detecting CPU features on ARM #196

Merged


@GoWind commented Nov 22, 2024

Use the functionality from SimSIMD to detect CPU features on ARM. On macOS, user-space code cannot use `mrs`/`msr` to access the system registers, so we stick to the `sysctl` interface to probe for supported SIMD features.

I am not sure whether checking for f16 or bf16 support is useful for strings yet. Nevertheless, it doesn't seem harmful either, and it might not be a bad idea to expose capability checks similar to SimSIMD's for probing CPU architecture features.
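As a minimal sketch of the Linux side of this approach, assuming GCC/Clang inline assembly on AArch64 Linux, reading an ID register with `mrs` might look like the block below. The function name `read_id_aa64pfr0` is hypothetical, not part of StringZilla or SimSIMD. Linux only emulates these user-space reads on kernels that advertise `HWCAP_CPUID`, so the sketch checks that first; macOS and other targets return 0 here and are expected to use the `sysctl` path instead.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> /* getauxval, AT_HWCAP */
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1 << 11) /* Linux arm64 hwcap bit for EL0 ID-register access */
#endif
#endif

/* Hypothetical helper: returns the raw ID_AA64PFR0_EL1 value, or 0 when it
 * cannot be read safely from user space. */
static uint64_t read_id_aa64pfr0(void) {
#if defined(__aarch64__) && defined(__linux__)
    /* Without kernel emulation (HWCAP_CPUID), an EL0 `mrs` of an ID register raises SIGILL. */
    if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) return 0;
    uint64_t value = 0;
    __asm__ __volatile__("mrs %0, ID_AA64PFR0_EL1" : "=r"(value));
    return value;
#else
    /* macOS forbids `mrs` in user space; non-ARM targets have no such register. */
    return 0;
#endif
}
```

The returned value can then be decoded field by field; for example, bits [35:32] hold the SVE field that `ID_AA64PFR0_EL1_SVE()` extracts in the `arm-cpusysregs` excerpt.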

sz_cap_sve2_k = 1 << 18, ///< ARM SVE2 capability
sz_cap_sve2p1_k = 1 << 19, ///< ARM SVE2p1 capability

sz_cap_x86_avx2_k = 1 << 21, ///< x86 AVX2 capability
Owner

Changing the old enum values is a breaking change that we should avoid.
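One way to honour this, sketched with hypothetical names and bit positions (not StringZilla's actual values): keep every released enumerator frozen, give new capabilities previously unused bits, and let C11 `static_assert` fail the build on accidental renumbering or overlap.

```c
#include <assert.h> /* static_assert (C11) */

/* Hypothetical capability flags; only the pattern matters, not the values. */
typedef enum {
    example_cap_serial_k = 1 << 0, /* released: value frozen */
    example_cap_neon_k = 1 << 10,  /* released: value frozen */
    example_cap_avx2_k = 1 << 21,  /* released: value frozen */
    example_cap_sve_k = 1 << 22,   /* new: appended at an unused bit */
    example_cap_sve2_k = 1 << 23,  /* new: appended at an unused bit */
} example_capability_t;

#define EXAMPLE_RELEASED_CAPS (example_cap_serial_k | example_cap_neon_k | example_cap_avx2_k)

/* Compile-time guards: released values stay put, new flags never reuse a taken bit. */
static_assert(example_cap_avx2_k == (1 << 21), "released values must not change");
static_assert((example_cap_sve_k & EXAMPLE_RELEASED_CAPS) == 0, "SVE bit must be unused");
static_assert((example_cap_sve2_k & (EXAMPLE_RELEASED_CAPS | example_cap_sve_k)) == 0,
              "SVE2 bit must be unused");
```

Because callers may persist or compare these masks, appending keeps existing binaries and serialized values meaningful, while the assertions document the invariant next to the enum.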

c/lib.c Outdated
Comment on lines 70 to 85
// This is how the `arm-cpusysregs` library does it:
//
// int ID_AA64ISAR1_EL1_BF16() const { return (int)(_aa64isar1 >> 44) & 0x0F; }
// int ID_AA64ZFR0_EL1_BF16() const { return (int)(_aa64zfr0 >> 20) & 0x0F; }
// int ID_AA64PFR0_EL1_FP() const { return (int)(_aa64pfr0 >> 16) & 0x0F; }
// int ID_AA64ISAR0_EL1_DP() const { return (int)(_aa64isar0 >> 44) & 0x0F; }
// int ID_AA64PFR0_EL1_SVE() const { return (int)(_aa64pfr0 >> 32) & 0x0F; }
// int ID_AA64ZFR0_EL1_SVEver() const { return (int)(_aa64zfr0) & 0x0F; }
// int ID_AA64ZFR0_EL1_I8MM() const { return (int)(_aa64zfr0 >> 44) & 0x0F; }
// bool FEAT_BF16() const { return ID_AA64ISAR1_EL1_BF16() >= 1 || ID_AA64ZFR0_EL1_BF16() >= 1; }
// bool FEAT_FP16() const { return ID_AA64PFR0_EL1_FP() >= 1 && ID_AA64PFR0_EL1_FP() < 15; }
// bool FEAT_DotProd() const { return ID_AA64ISAR0_EL1_DP() >= 1; }
// bool FEAT_SVE() const { return ID_AA64PFR0_EL1_SVE() >= 1; }
// bool FEAT_SVE2() const { return ID_AA64ZFR0_EL1_SVEver() >= 1; }
// bool FEAT_I8MM() const { return ID_AA64ZFR0_EL1_I8MM() >= 1; }
//
// https://github.com/lelegard/arm-cpusysregs/tree/4837c62e619a5e5f12bf41b16a1ee1e71d62c76d
Owner

Let's avoid this noise. I've removed it from the main-elementwise branch of SimSIMD as well 🤗
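For reference, the decoding done by the quoted helpers reduces to pulling 4-bit fields out of raw register values. Below is a minimal sketch restricted to the NEON/SVE/SVE2 set, assuming the register values were already obtained (for example via `mrs` on Linux); `arm_features`, `id_field`, and `arm_features_from_id_regs` are hypothetical names. The SVE and SVEver positions match the excerpt; the AdvSIMD field position [23:20] comes from the Arm architecture reference, not from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a StringZilla API. */
typedef struct {
    int neon, sve, sve2;
} arm_features;

/* Extracts the 4-bit ID register field whose lowest bit is `shift`. */
static unsigned id_field(uint64_t reg, unsigned shift) {
    assert(shift <= 60 && shift % 4 == 0); /* ID fields are 4-bit aligned */
    return (unsigned)((reg >> shift) & 0xF);
}

/* Decodes NEON/SVE/SVE2 from raw ID_AA64PFR0_EL1 and ID_AA64ZFR0_EL1 values. */
static arm_features arm_features_from_id_regs(uint64_t aa64pfr0, uint64_t aa64zfr0) {
    arm_features f;
    f.neon = id_field(aa64pfr0, 20) != 0xF; /* AdvSIMD [23:20]: 0xF means not implemented */
    f.sve = id_field(aa64pfr0, 32) >= 1;    /* SVE [35:32], as in ID_AA64PFR0_EL1_SVE() */
    /* ZFR0 is only meaningful when SVE exists; SVEver [3:0] >= 1 means SVE2. */
    f.sve2 = f.sve && id_field(aa64zfr0, 0) >= 1;
    return f;
}
```

Since the function takes plain integers, it can be exercised with synthetic values on any host: `arm_features_from_id_regs((uint64_t)1 << 32, 1)` reports NEON, SVE, and SVE2, while an AdvSIMD field of `0xF` reports no NEON.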

Comment on lines 262 to 268
sz_cap_neon_f16_k = 1 << 11, ///< ARM NEON `f16` capability
sz_cap_neon_bf16_k = 1 << 12, ///< ARM NEON `bf16` capability
sz_cap_neon_i8_k = 1 << 13, ///< ARM NEON `i8` capability
sz_cap_sve_k = 1 << 14, ///< ARM SVE capability TODO: Not yet supported or used
sz_cap_sve_f16_k = 1 << 15, ///< ARM SVE `f16` capability
sz_cap_sve_bf16_k = 1 << 16, ///< ARM SVE `bf16` capability
sz_cap_sve_i8_k = 1 << 17, ///< ARM SVE `i8` capability
Owner

f16, bf16 and i8mm will just pollute the space. Let's stick to NEON, SVE, and SVE2.

Author

Makes sense, I will remove the capability checks for f16/bf16 and i8mm.

c/lib.c Outdated
Comment on lines 54 to 60
// On Apple Silicon, `mrs` is not allowed in user-space, so we need to use the `sysctl` API.
uint32_t supports_neon = 0, supports_fp16 = 0, supports_bf16 = 0, supports_i8mm = 0;
size_t size = sizeof(supports_neon);
if (sysctlbyname("hw.optional.neon", &supports_neon, &size, NULL, 0) != 0) supports_neon = 0;
if (sysctlbyname("hw.optional.arm.FEAT_FP16", &supports_fp16, &size, NULL, 0) != 0) supports_fp16 = 0;
if (sysctlbyname("hw.optional.arm.FEAT_BF16", &supports_bf16, &size, NULL, 0) != 0) supports_bf16 = 0;
if (sysctlbyname("hw.optional.arm.FEAT_I8MM", &supports_i8mm, &size, NULL, 0) != 0) supports_i8mm = 0;
Owner

The new M4 Macs should support SVE and SVE2, but I don't know how to check this. Do you have recent Apple hardware?

Author

Sadly, I don't have access to the M4s yet :( I can check with friends or colleagues to see whether they do, and get the right sysctl property names for the SVE/SVE2 checks.

Owner

I have an M4 iPad; we can probably try porting the SwiftSemanticSearch demo to iPad, integrating SimSIMD, and testing on device. I can try next week.
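The `sysctlbyname` probes quoted in this thread share one `size` variable across calls. Since `sysctlbyname` writes back the number of bytes it actually returned, a helper that resets it per query is slightly more robust. A sketch, with the hypothetical name `sysctl_flag`, assuming each `hw.optional.*` entry is a 32-bit integer flag and treating a missing entry as "not supported":

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__APPLE__)
#include <sys/sysctl.h> /* sysctlbyname */
#endif

/* Hypothetical helper: 1 if the boolean sysctl `name` exists and is non-zero, else 0. */
static int sysctl_flag(const char *name) {
#if defined(__APPLE__)
    uint32_t value = 0;
    size_t size = sizeof(value); /* reset on every call: sysctlbyname overwrites it */
    if (sysctlbyname(name, &value, &size, NULL, 0) != 0) return 0; /* e.g. ENOENT */
    return value != 0;
#else
    (void)name;
    return 0; /* not macOS: no sysctl feature flags to query */
#endif
}
```

Usage mirrors the original probes, e.g. `sysctl_flag("hw.optional.neon")`; once the SVE/SVE2 property names are confirmed on M4 hardware, they can be queried the same way.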

Do not break backward compatibility when adding capability checks in the enum
Do not check for bf16, fp16 or i8 support as we don't use them yet in stringzilla
Remove documentation for how the MSR are accessed, instead just link to the detailed documentation in SimSIMD
@ashvardanian changed the base branch from main to main-dev December 7, 2024 11:02
@ashvardanian merged commit 0ee549a into ashvardanian:main-dev Dec 7, 2024
@ashvardanian mentioned this pull request Dec 8, 2024