MAINT: update OpenBLAS to 0.3.19 #20660
Not sure what the usual timeline on updating this is. Running the CI on this PR is one way to narrow down the Cygwin problems addressed in PR #20654. If this build also has four failures, then OpenBLAS is probably the problem. If this build uses the new OpenBLAS with no problems, then I need to see what else changed in the Cygwin packages around the time the builds started failing.
Apparently the pre-built OpenBLAS downloads aren't in the expected place yet, or they changed their naming scheme. |
close/reopen to trigger CI now that 0.3.19 was built |
@martin-frbg it seems something changed in 0.3.19 or the way we are building it that causes it to fail NumPy tests. This was first noticed in the cygwin builds, and is now visible in the default build. Ideas? |
unfortunately I have no idea what could have gone wrong in the conda build. 0.3.19 is not supposed to bring major differences for x86_64 in normal operation, and my builds passed the LAPACK testsuite. |
It's not the conda build, it's the standalone build of OpenBLAS from https://github.com/MacPython/openblas-libs that is used in our CI and PyPI wheels.
To be concrete, there are 5 test failures on the "smoke test" CI job (TravisCI failures are unrelated) for Pytest output is pretty awful/unreadable, but here it is:
|
On my machine 0.3.19 passes. Testing shows
where on the failing smoke test I see
So maybe the difference is AVX512 intrinsics, the failing machine has |
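One way to compare the two machines is to list the AVX512-related CPU flags each one reports. The following is a minimal sketch, assuming a Linux-style `/proc/cpuinfo` with a `flags` line; the helper names `parse_cpu_flags` and `avx512_flags` are hypothetical and are not part of NumPy or OpenBLAS.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text.

    Assumption: the text has a line of the form "flags : fpu sse2 ...".
    Only the first such line is used (every core normally reports the same set).
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_flags(flags):
    """Return the sorted subset of flags whose name starts with 'avx512'."""
    return sorted(flag for flag in flags if flag.startswith("avx512"))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # On systems without /proc/cpuinfo this simply prints an empty list.
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(avx512_flags(parse_cpu_flags(text)))
```

Running this on both the passing and the failing machine and comparing the output would show whether only one of them advertises AVX512 support.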
There is one elusive issue with AVX512 (a gcc 7.5 miscompilation?), probably related to the 16x2 DGEMM kernel, but the relevant changes go back to 0.3.16. What could be different is that, assuming your MacPython build is gmake with DYNAMIC_ARCH, 0.3.19 will use the earlier 4x8 DGEMM kernel from 0.3.8 as a workaround. Unfortunately I cannot reproduce the problem on my SkylakeX; I only saw it as occasional CI failures with Intel SDE in an Ubuntu-based container. |
@mattip I tried to test the way you suggested, but it doesn't quite work (I only have conda installs lying around), so I'm not going to be all that helpful today: Downloaded My
Build log output:
Not all that surprising I guess, the |
Are you sure you want the i686 version? You are using an x86_64 compiler, so maybe you want openblas-v0.3.19-manylinux2014_x86_64.tar.gz or openblas-v0.3.19-manylinux1_x86_64.tar.gz instead |
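To avoid picking a tarball for the wrong architecture, the file name can be derived from the machine type that Python reports. This is a minimal sketch, assuming the naming pattern seen in the two file names above (`openblas-v<version>-<platform>_<arch>.tar.gz`) also holds for other architectures; the function `openblas_tarball` and its alias table are hypothetical, not part of the openblas-libs tooling.

```python
import platform

# Assumed aliases: map names returned by platform.machine() to the
# architecture suffix used in the tarball names quoted in this thread.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "AMD64": "x86_64",
    "i686": "i686",
    "i386": "i686",
}


def openblas_tarball(version, machine=None, plat="manylinux2014"):
    """Build a tarball name such as openblas-v0.3.19-manylinux2014_x86_64.tar.gz.

    `machine` defaults to the machine type of the running interpreter.
    Raises ValueError for machine types missing from the assumed alias table.
    """
    machine = machine or platform.machine()
    arch = _ARCH_ALIASES.get(machine)
    if arch is None:
        raise ValueError(f"no known tarball architecture for {machine!r}")
    return f"openblas-v{version}-{plat}_{arch}.tar.gz"


if __name__ == "__main__":
    print(openblas_tarball("0.3.19", machine="x86_64"))
```

Note that `platform.machine()` describes the running interpreter's host, so it only matches the compiler target when the two agree.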
I suggested PyPI change its table layout with clearly labeled
to prevent this kind of confusion, maybe anaconda.org should use one as well. |
No I wasn't, just being dumb. That said, the
Pretty clear what's going on there .... this is why we bundle a
Etc. - still no luck. This is going to be too much trouble, I'll wait on conda-forge/openblas-feedstock#130. |
OK, sorry for the mess. |
close/reopen |
Should this be closed? The cygwin test is now passing in main. @mattip Do you recall what changed? |
The problematic tests are skipped on cygwin after PR #20654. OpenBLAS 0.3.19 is still failing to work on AVX512_SKX, but I don't have access to a machine with that architecture to debug it. |
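For readers unfamiliar with platform-conditional skips, the idea can be sketched with the standard library's `unittest`. This is an illustration only, not the change made in PR #20654: NumPy's test suite uses pytest, and the test class, test body, and skip reason below are hypothetical.

```python
import sys
import unittest

# sys.platform is "cygwin" when Python runs under Cygwin.
IS_CYGWIN = sys.platform == "cygwin"


class TestLinalgSmoke(unittest.TestCase):
    @unittest.skipIf(IS_CYGWIN, "hypothetical: known BLAS failure on Cygwin")
    def test_matmul_identity(self):
        # Stand-in for a BLAS-backed test: multiplying by the identity
        # matrix must return the original matrix.
        a = [[1.0, 2.0], [3.0, 4.0]]
        identity = [[1.0, 0.0], [0.0, 1.0]]
        product = [
            [sum(a[i][k] * identity[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)
        ]
        self.assertEqual(product, a)
```

A skip like this keeps CI green on the affected platform while leaving the test active everywhere else, at the cost of hiding the underlying failure until the skip is removed.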
The AVX512_SKX failure is probably fixed by OpenMathLib/OpenBLAS#3510 and 3511. Unfortunately there is another open issue involving AVX512 that I cannot reproduce on my (small) machine, and the original reporter has gone quiet. |
Trying with those fixes, maybe we will have some luck |
From the comment at the top of the PR
The build with OpenBLAS 0.3.19 had failures. After merging the post-0.3.19 fixes from OpenBLAS, the tests pass. I think that since we are at the beginning of a development cycle we should use this 0.3.19-dev version of OpenBLAS so it gets some exposure. Any thoughts? |
Sounds good to me. |
Thanks @DWesl |
This was removed a few months ago to avoid failures with a previous version (numpy#20654, numpy#20660), necessitating the use of the Netlib reference BLAS instead (numpy#20669). The Cygwin OpenBLAS implementation has updated to 0.3.20 since then (https://cygwin.com/cgi-bin2/package-grep.cgi?grep=openblas&arch=x86_64), so using OpenBLAS again should be fine.