Possible incorrect AVX512 detection #1947
Oh, I just realized that I didn't put a description of the issue here, so to summarize it (so that people don't have to read through the Julia issue thread): the OpenBLAS AVX512 detection code seems to be based purely on the processor part number. While this is probably necessary for choosing the performance model, it is not enough to guarantee support for the AVX512 feature. As with AVX, both the CPUID feature bits and the ZMM state bits in XCR0 need to be checked to make sure that both the processor and the OS support AVX512. If I'm reading my code correctly, the relevant bits are the 0x10000 and 0x20000 bits of EBX for CPUID(EAX=7, ECX=0), and the 0xe0 bits of XCR0.
I'm happy to test this on my machine with AVX512.
If I am reading this correctly, it would affect "only" VM setups where the virtual hardware claims to be the host CPU but does not provide all its features? At least as far as I know, there are no physical processors around that identify as SkylakeX and do not support AVX512 (?)
Nope. 735ca38
Tentative fix in #1949. Took me a while to realize that the implementation of cpuid was not zeroing ECX as required.
@martin-frbg I have the hardware for this and would be happy to test, but am unclear on the process. Let me know if you think that'd help.
Generally one can download a PR as a conventional diff file (to be applied with the `patch` utility).
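A sketch of that workflow (the repository path is an assumption based on the project; the PR number is the one from this thread — GitHub serves any PR as a plain diff by appending `.diff` to its URL):

```shell
# Download the PR as a plain diff and apply it to a source checkout:
curl -L -o 1949.diff https://github.com/xianyi/OpenBLAS/pull/1949.diff
cd OpenBLAS                  # root of the source tree
patch -p1 < ../1949.diff     # -p1 strips the a/ b/ path prefixes
make clean && make
```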
I would still appreciate it if one of you could check out the current `develop` branch.
Happy to do this next week. Do you have instructions for what I should best run? I guess another test would be to run Julia in Docker against this PR, but I'm not sure how hard that is.
As I understand it, you would probably need to build OpenBLAS on OSX (outside Docker) and then replace the stock Julia libopenblas.so in the Docker container with it.
Unfortunately I'm running into build errors trying to build the develop branch at 21c0f2a. You can see the full build output, as well as just the errors, at https://gist.github.com/mk/b3ac286cbcbbaae81aeb811fb753224b. Please let me know what else I could try. Does it have to be built on my system, or could I also download a binary of OpenBLAS to test this?
That build error ("register %xmm16 is only available with AVX512", despite already compiling with -march=skylake-avx512) looks like it could be LLVM bug 36202: http://lists.llvm.org/pipermail/llvm-bugs/2018-February/062338.html - if that is the case, you could try adding -mcpu=skx in system.cmake or Makefile.x86_64. A binary built on a different system would probably work as well (but I know next to nothing about OSX).
Can you tell me how to add that option?

```diff
diff --git i/Makefile.x86_64 w/Makefile.x86_64
index 1b7fe3ef..c88c3057 100644
--- i/Makefile.x86_64
+++ w/Makefile.x86_64
@@ -1,4 +1,4 @@
-# CCOMMON_OPT += -DFASTCPU
+CCOMMON_OPT += -mcpu=skx
 ifeq ($(OSNAME), SunOS)
 ifdef BINARY64
```

I tried setting it with the diff above, but then I get:
Strange, I copied that suggestion straight from the LLVM bug. Indeed the current clang documentation seems to know `-mavx512vl`.
Still get the same error:

```diff
diff --git i/Makefile.x86_64 w/Makefile.x86_64
index 1b7fe3ef..166926f0 100644
--- i/Makefile.x86_64
+++ w/Makefile.x86_64
@@ -1,4 +1,4 @@
-# CCOMMON_OPT += -DFASTCPU
+CCOMMON_OPT += -mavx512vl
 ifeq ($(OSNAME), SunOS)
 ifdef BINARY64
```
Could be that it wants
Same result. I'll try to build it on another OSX machine without Skylake and copy the binary to my system. The CPU feature detection should still work since it's a runtime feature, correct? Will have to continue on this tomorrow.
Actually I still had time to give this a try, and it seems to work:
Not sure about the warnings though.
That last message looks as if Julia expects OpenBLAS to be built with the INTERFACE64=1 option (with a "64" suffix added to all function symbols accordingly), so it actually failed to load the library. (And with "haswell" in the name, it was not built for runtime detection, a.k.a. DYNAMIC_ARCH=1, anyway.)
I tried building it with the options Julia uses, but I still couldn't get it to work. @yuyichao can you maybe help? Does Julia maybe have a canary build against the OpenBLAS `develop` branch?
Unfortunately our CI build on OSX has been falling over an "invalid % escape in inline assembly" error citing the "%{1to8%}" in the SkylakeX DGEMM microkernel ever since I updated its Xcode environment, else it would perhaps be possible to make it put the generated library somewhere. (The update became necessary when Homebrew stopped providing gcc builds for the old OSX version; most likely the old Xcode was not capable of AVX512 at all.) I can only build Linux libraries locally, which are probably useless on OSX.
I'm not sure about all the options needed to build OpenBLAS in the exact same configuration used by the Julia binary. OTOH, I feel like it would be easier to test with a C/Fortran repro?
(And I have no idea about the assembler error or how to make clang happy etc.)
I have "fixed" the clang problem by updating the Travis config to use an even more recent version of Xcode, but have not looked into whether/how it is possible to store the output of the CI run.
To test this, it is probably sufficient to build OpenBLAS with DYNAMIC_ARCH=1 on OSX (with Xcode 9.4 or later, according to the CI results) and copy the compiled openblas_utest to the Docker container.
FYI, several Julia users also ran into this bug, including myself.
@musm can you check if this is actually fixed on the develop branch, as it should be?
@martin-frbg finally coming back to this, sorry for the long silence. I can't build OpenBLAS on OS X and run it in Docker; that gives me:

```
./openblas_utest
bash: ./openblas_utest: cannot execute binary file: Exec format error
```

But that's expected, right? Compiling and running the utest util in Docker gives me:
Thanks, so hopefully this is fixed with the just-released 0.3.6.
Ref JuliaLang/julia#29652 (comment); see my version of the code in Julia (which is mostly copied and updated from LLVM).
Disclaimer: I do not have the hardware to test this, so I cannot tell whether it is already fixed (judging by the code, I assume not, unless it is handled in a very different way than AVX support). That's why I had hoped that the two people who actually experienced this issue would report it instead. I'm reporting it now since I just noticed that neither of them seems to have done so. Feel free to close if I'm just missing the issue in my search somehow. I won't be able to help test any patch, but I'm happy to explain the code that does the detection in Julia (I feel like it's fairly simple and very similar to the AVX one, though).