Incorrect avx2 detection #2957
Comments
That function only checks whether the operating system has disabled AVX2 support (which could happen in a VM, for example), not whether the CPU itself is capable. Can you provide more details for both failure cases, please? (For the user reports, model names or ideally /proc/cpuinfo contents would be necessary. I guess Intel may have marketed some dies with broken or otherwise disabled AVX2 units under a Haswell-like cpuid that OpenBLAS erroneously lumps in with the real thing.)
See the Arch Linux forum here: https://aur.archlinux.org/packages/openblas-lapack/ There are two CPUs which show HAVE_AVX2=1, even though they shouldn't support it. I really think testing ebx & (1 << 5) is the right thing to do; that's what other projects do, see here for instance: https://github.com/google/highwayhash/blob/master/highwayhash/instruction_sets.cc#L105. I've asked the users to test this fix on their CPUs.
Fairly certain we need to test both the CPU and OS capability bits. (Pretty sure there was an old ticket for this which led to the addition of that check, but too tired to search now.)
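To make the "both bits" idea concrete, here is a minimal sketch in C, not the OpenBLAS implementation. It assumes GCC or Clang on x86 (for `<cpuid.h>` and inline assembly); the helper names `avx2_usable` and `detect_avx2` are hypothetical. The CPU side is CPUID leaf 7 EBX bit 5 (AVX2) plus leaf 1 ECX bit 28 (AVX); the OS side is leaf 1 ECX bit 27 (OSXSAVE) plus XCR0 bits 1 and 2, which say the OS saves XMM and YMM state.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Bit positions from Intel's published CPUID and XCR0 layout. */
#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28) /* CPU implements AVX */
#define CPUID7_EBX_AVX2    (1u << 5)  /* CPU implements AVX2 */
#define XCR0_SSE_AVX_STATE 0x6ull     /* bits 1 and 2: XMM and YMM state */

/* Hypothetical helper: pure decision logic on raw register values. */
int avx2_usable(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint64_t xcr0)
{
    if (!(leaf7_ebx & CPUID7_EBX_AVX2)) return 0;     /* CPU capability */
    if (!(leaf1_ecx & CPUID1_ECX_AVX)) return 0;
    if (!(leaf1_ecx & CPUID1_ECX_OSXSAVE)) return 0;  /* OS capability */
    return (xcr0 & XCR0_SSE_AVX_STATE) == XCR0_SSE_AVX_STATE;
}

/* Hypothetical wrapper that reads the registers (x86, GCC or Clang only). */
int detect_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t leaf1_ecx, leaf7_ebx;
    uint32_t lo = 0, hi = 0;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 0;
    leaf1_ecx = ecx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return 0;
    leaf7_ebx = ebx;

    /* XGETBV faults unless the OS has set OSXSAVE, so check that first. */
    if (leaf1_ecx & CPUID1_ECX_OSXSAVE)
        __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));

    return avx2_usable(leaf1_ecx, leaf7_ebx, ((uint64_t)hi << 32) | lo);
#else
    return 0; /* not an x86 build */
#endif
}
```

Keeping the decision in a function that takes plain register values makes it easy to replay values reported by users without having their hardware.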
From what I can tell, maybe you're referring to this commit when you talk about os support?
But I can't figure out what the check ebx & (1<<7) is doing in
This is your standard sandybridge processor, but this is the output of ./getarch 0:
note that HAVE_AVX2 is set to 1, which makes no sense, since this CPU doesn't have the AVX2 instructions, OS support or no OS support. Because of our old friend here: https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.x86_64#L69, compilation then fails. If I check ebx against 1<<5 instead, everything works: support_avx2 returns 0 and everything compiles. I've even found a CPU with the opposite situation. It's a Ryzen CPU running in VirtualBox, which lets through the avx2 flag. See the /proc/cpuinfo below:
However, ./getarch 0 doesn't detect AVX2, which again is fixed by checking bit 5. I'm not sure what the universal intrinsics really need, but VirtualBox, for instance, doesn't let through the bmi and bmi2 flags, which are also part of the instructions (that's what 1<<3 and 1<<8 test), so detection fails if we add those.
And one more data point: https://github.com/xianyi/OpenBLAS/blob/develop/cpuid_x86.c#L238 This line actually checks ebx against 32 = 1<<5, not 1<<7. So I really think 1<<7 is incorrect.
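The difference between the two masks can be shown with a small sketch. In Intel's published layout of CPUID leaf 7 (subleaf 0) EBX, bit 5 is AVX2 and bit 7 is SMEP, a supervisor-mode protection feature unrelated to vector instructions. The function names and the two sample EBX values below are hypothetical and chosen only to mirror the two reported situations; they are not captured from real machines.

```c
#include <stdint.h>

/* CPUID leaf 7, subleaf 0, EBX bit positions (Intel's published layout). */
enum {
    LEAF7_EBX_BMI1 = 1u << 3,
    LEAF7_EBX_AVX2 = 1u << 5,  /* == 32 */
    LEAF7_EBX_SMEP = 1u << 7,  /* == 128, not a vector feature */
    LEAF7_EBX_BMI2 = 1u << 8
};

/* Illustrative values only (assumptions, not real cpuid dumps). */
#define SAMPLE_EBX_SMEP_NO_AVX2 ((uint32_t)LEAF7_EBX_SMEP) /* CPU without AVX2 */
#define SAMPLE_EBX_AVX2_NO_SMEP ((uint32_t)LEAF7_EBX_AVX2) /* guest hiding SMEP/BMI */

/* The check questioned in this issue: tests bit 7. */
int check_bit7(uint32_t ebx) { return (ebx & LEAF7_EBX_SMEP) != 0; }

/* The proposed check: tests bit 5, the AVX2 feature flag. */
int check_bit5(uint32_t ebx) { return (ebx & LEAF7_EBX_AVX2) != 0; }
```

With these sample values, the bit 7 test answers 1 for the CPU without AVX2 and 0 for the guest that does expose it, which matches both failure reports; the bit 5 test gives the expected answer in both cases.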
#1949 it was - apparently I did not see the full picture back then (nor did the others). The check was meant to disable AVX2 if necessary after it was already inferred from the cpuid - which for family 6, exmodel 3, model A it certainly is not. (And it seems I had actually started out with a CPU capability check and changed it to the OS flag when that was pointed out to me - the way this
This is a follow-up to issue #2933. I'm still getting reports from users who cannot compile after the fix that went into 0.3.12. Their CPU doesn't support AVX2, yet HAVE_AVX2=1 is generated in Makefile.conf. I have the opposite situation: my CPU has the avx2 flag, yet ./getarch 0 sets HAVE_AVX2 to 0. I'm starting to wonder whether the support_avx2 function defined here is correct. Based on the Intel docs https://software.intel.com/content/www/us/en/develop/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family.html, I would expect to test something like ebx & (1 << 3) & (1 << 5) & (1 << 8) instead of ebx & (1 << 7). gcc also has some intrinsics for testing CPU features which might be easier to use.
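A minimal sketch of the two ideas in this report, under stated assumptions. Read literally, chaining `&` across distinct single-bit masks always yields 0, so a combined test for BMI1, AVX2 and BMI2 would OR the masks together and compare the result against the full mask. The compiler route uses `__builtin_cpu_init` and `__builtin_cpu_supports`, which GCC and Clang provide on x86. The names `has_all_bits` and `compiler_reports_avx2` are hypothetical, and whether OpenBLAS should require the BMI bits at all is left open here, as it is in the discussion.

```c
#include <stdint.h>

#define EBX_BMI1 (1u << 3)
#define EBX_AVX2 (1u << 5)
#define EBX_BMI2 (1u << 8)
/* All three flags tested in Intel's Haswell detection article. */
#define EBX_HASWELL_SET (EBX_BMI1 | EBX_AVX2 | EBX_BMI2)

/* Returns 1 only if every bit in mask is set in reg. */
int has_all_bits(uint32_t reg, uint32_t mask)
{
    return (reg & mask) == mask;
}

/* Asks the compiler's runtime CPU detection (GCC or Clang, x86 only). */
int compiler_reports_avx2(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0; /* builtin not available on this target */
#endif
}
```

For example, `has_all_bits(ebx, EBX_AVX2)` tests AVX2 alone, while `has_all_bits(ebx, EBX_HASWELL_SET)` would also reject a guest that hides the BMI flags.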