elevated lapack test failures on Arm Neoverse v1 #4335
Comments
This is a known problem with the testsuite, which is increasingly worthless for anything other than testing the reference implementation of LAPACK when built against the Reference BLAS. To be fair, that is/was its original purpose, long before CPUs became capable of FMA and compilers learned to perform aggressive optimization. See for example Reference-LAPACK/lapack#732, and also the earlier easybuild issue easybuilders/easybuild-easyconfigs#16380 |
The tree-vectorize pass emits wildly bloated code that often overflows and underflows. |
@martin-frbg @brada4 |
It miscompiles, e.g. by reading or writing past either end of the fixed-size arrays that BLAS has everywhere. |
The tree-vectorizer pass is already disabled for all of LAPACK (and any tests that use Fortran code). On the C (BLAS) side this has been handled on an individual file basis where miscompilations were found (I do not think it is helpful to claim that it miscompiles "everywhere" in general). However, I was under the impression that easybuild already disabled it in general, or is this only on x86_64? It should be noted that this option became a default only in recent versions of gcc, as its developers assumed it was now stable enough. |
If you paste e.g. daxpy into godbolt.org, you see that no-tree.... generates more or less logical code (vector loads, muls, adds). If you enable the tree vectorizer, it generates 10 screens of jump-laden code; around the 2nd screen it does lea arg % 8, for example. |
It's good to know that tree-vectorize generates weird code. Strangely enough, I only notice tree-vectorize causing numerical errors for test #5 for the various flavors of GGEV and GGEV3. They are unlikely to be the only functions that require an exact match of output results. If we had individual control of compiler flags for Fortran source code, similar to what exists for the C functions, we could selectively disable tree-vectorize for just a few subroutines. |
I agree that some of its choices are a bit counter-intuitive, but I do not remember seeing anything actually wrong except in the (few) files that earned a |
Unfortunately I am not aware of there being any gfortran directive matching the gcc "optimize" pragma (the only remotely similar would be |
There are concerns that some performance may be lost due to |
Do I read that correctly that you are still at 0.3.21 from a year ago? I just did a test run on Graviton3 with the current |
Yes. The ask (https://github.com/EESSI/software-layer/blob/77c88452405cc92ddd7bdc1a37af5f8aeb972c2f/eessi-2023.06-known-issues.yml#L16) from EasyBuild is to figure out why there are elevated failures with 0.3.21 and GCC 12. For 0.3.21, tree-vectorize is on. In the latest version, tree-vectorize is turned off. |
Because it is only selectively disabled in current versions, for old versions of OpenBLAS you need to disable tree-vectorize yourself when using a recent GCC. |
tree-vectorize was actually turned off in response to an easybuild issue, but a lot has changed since 0.3.21 (including some of the LAPACK tests) |
An EasyBuild maintainer reported an elevated number of lapack-test numerical errors on Arm Neoverse-V1 (344 in total with GCC 12).
https://github.com/EESSI/software-layer/blob/77c88452405cc92ddd7bdc1a37af5f8aeb972c2f/eessi-2023.06-known-issues.yml#L16
The number of numerical errors can be reduced to 14 by turning off tree-vectorize with the flag -fno-tree-vectorize. Since there are also concerns that this might affect BLAS/LAPACK library performance, we took a closer look at the actual numerical differences to confirm the correctness of the tests when compiling with tree-vectorize left on. Initial results:
234 of the 344 failures are due to small differences between two runs: one that solves the full matrix (line 738) and one that solves only part (the upper or lower portion) of the matrix (line 770). The test program expects the results to match exactly (line 780 of https://netlib.org/lapack/explore-html/d5/d99/group__single__eig_gae07927c6321c12cd0d92450eaa21ea9c.html); otherwise the return code is set to the inverse of ulp (a large number) to indicate a mismatch.
The functions under test are SGGEV/DGGEV/CGGEV/ZGGEV and SGGEV3/DGGEV3/CGGEV3/ZGGEV3. I have validated that the differences are indeed very small for these tests (see below for one example); however, the results do not match exactly because the total amount of work is not the same.
96 of the remaining 110 failures are due to a problem with the test driver clavhe.f. I found by accident that adding a statement to print N at line 408 (https://netlib.org/lapack/explore-html/d3/d9a/group__complex__lin_ga25d4e26307cae0c5c897051ce64e2e91.html) makes these 96 errors go away. This confirms that there is no issue with the library itself and that there may be something wrong with the test driver.
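To illustrate why an exact-match check is fragile, here is a minimal Python sketch (not the LAPACK test code, which is Fortran). It adds the same three numbers in two different orders, as a stand-in for two code paths that do different amounts of work, and then compares the results both exactly and with a small ulp-based tolerance. The function names, the sample values, and the 4-ulp tolerance are all hypothetical choices for illustration, and `sys.float_info.epsilon` is only an assumed stand-in for the ulp value the test driver uses.

```python
import math
import sys


def sum_forward(values):
    """Accumulate left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_reverse(values):
    """Accumulate right to left: same mathematical sum, different rounding."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


def exact_match_result(a, b):
    """Mimic the exact-match style of check: 0 on a bitwise-equal result,
    otherwise 1/ulp (a huge number) to flag a mismatch.
    Assumption: machine epsilon stands in for the driver's ulp."""
    return 0.0 if a == b else 1.0 / sys.float_info.epsilon


def within_ulps(a, b, max_ulps=4):
    """Tolerant alternative: accept results that differ by at most
    max_ulps units in the last place (math.ulp needs Python 3.9+)."""
    return abs(a - b) <= max_ulps * math.ulp(max(abs(a), abs(b)))


values = [0.1, 0.2, 0.3]
a = sum_forward(values)  # (0.1 + 0.2) + 0.3
b = sum_reverse(values)  # (0.3 + 0.2) + 0.1

print("difference:", abs(a - b))
print("exact-match result:", exact_match_result(a, b))
print("within 4 ulps:", within_ulps(a, b))
```

The two sums differ only in the last bit, so the exact-match check reports a failure while the tolerance-based check accepts the result. This is the same kind of discrepancy, at much smaller scale, as the one between the full and partial solves described above.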