enable running of LAPACK tests for recent OpenBLAS easyconfigs + add patch to fix failing LAPACK tests due to use of -ftree-vectorize #16406

Merged

boegel
Member

@boegel boegel commented Oct 14, 2022

(created using eb --new-pr)

cfr. #16380

requires: easybuilders/easybuild-easyblocks#2801

Disabling the use of -ftree-vectorize reduces the number of failing LAPACK tests from:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1294329         1226    (0.095%)        0       (0.000%)
DOUBLE PRECISION        1302917         1197    (0.092%)        0       (0.000%)
COMPLEX                 756180          1208    (0.160%)        0       (0.000%)
COMPLEX16               761792          450     (0.059%)        0       (0.000%)

--> ALL PRECISIONS      4115218         4081    (0.099%)        0       (0.000%)

to

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1308369         1       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1316957         0       (0.000%)        0       (0.000%)
COMPLEX                 768036          38      (0.005%)        0       (0.000%)
COMPLEX16               768848          94      (0.012%)        0       (0.000%)

--> ALL PRECISIONS      4162210         133     (0.003%)        0       (0.000%)

(results collected on an AMD Rome system)

edit: thanks to patch from @bartoldeman (added in bb0269a) + patched GCC (see #16411), the number of failing LAPACK tests is now down to a handful...
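
The two summaries above can be checked against a failure limit programmatically. The following is a minimal Python sketch, not the actual EasyBuild easyblock code: `parse_lapack_summary` and `within_limit` are hypothetical helper names, and the default limit of 150 is simply the value mentioned in the commit message for this PR.

```python
import re

# One data row of the "LAPACK TESTING SUMMARY" table, for example:
# "REAL   1294329   1226   (0.095%)   0   (0.000%)"
ROW = re.compile(
    r"^(?:-->\s*)?(?P<name>.+?)\s+(?P<run>\d+)\s+"
    r"(?P<num>\d+)\s+\([\d.]+%\)\s+(?P<other>\d+)\s+\([\d.]+%\)$"
)


def parse_lapack_summary(text):
    """Return {row name: (tests run, numerical errors, other errors)}."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows[match["name"]] = (
                int(match["run"]),
                int(match["num"]),
                int(match["other"]),
            )
    return rows


def within_limit(rows, max_numerical_errors=150):
    """Accept a run if it has no 'other' errors and few enough numerical errors."""
    _, numerical, other = rows["ALL PRECISIONS"]
    return other == 0 and numerical <= max_numerical_errors


# Totals copied from the two summaries above (AMD Rome).
before = "--> ALL PRECISIONS      4115218         4081    (0.099%)        0       (0.000%)"
after = "--> ALL PRECISIONS      4162210         133     (0.003%)        0       (0.000%)"

print(within_limit(parse_lapack_summary(before)))  # False: 4081 > 150
print(within_limit(parse_lapack_summary(after)))   # True: 133 <= 150
```

Only the "ALL PRECISIONS" total is checked in this sketch; a per-precision limit would work the same way on the other rows.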

…max. number of failing tests due to numerical errors to 150
@boegel boegel changed the title from "enable running of LAPACK tests for recent OpenBLAS easyconfigs + set max. number of failing tests due to numerical errors to 150 (cfr. https://github.com/easybuilders/easybuild-easyconfigs/issues/16380) requires: * https://github.com/easybuilders/easybuild-easyblocks/pull/2801" to "enable running of LAPACK tests for recent OpenBLAS easyconfigs + set max. number of failing tests due to numerical errors to 150" Oct 14, 2022
@boegel boegel force-pushed the 20221014113137_new_pr_OpenBLAS0317 branch from 0bff8b7 to 8694579 Compare October 14, 2022 09:34
@boegel boegel added this to the next release (4.6.2?) milestone Oct 14, 2022
@boegel
Member Author

boegel commented Oct 14, 2022

@boegelbot please test @ generoso
EB_ARGS="--include-easyblocks-from-pr 2801 --installpath /tmp/$USER/pr16406"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=16406 EB_ARGS="--include-easyblocks-from-pr 2801 --installpath /tmp/$USER/pr16406" /opt/software/slurm/bin/sbatch --job-name test_PR_16406 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9278

Test results coming soon (I hope)...

- notification for comment with ID 1278747855 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

boegelbot commented Oct 14, 2022

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2801
FAILED
Build succeeded for 0 out of 4 (4 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/91961365aaf0073676efa9353f93417d for a full test report.

edit: failed due to:

(cd ./lapack-netlib; ./lapack_testing.py -r -b TESTING)
/usr/bin/env: python: No such file or directory

@boegel
Member Author

boegel commented Oct 14, 2022

@boegelbot please test @ generoso
EB_ARGS="--include-easyblocks-from-pr 2801 --installpath /tmp/$USER/pr16406"

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=16406 EB_ARGS="--include-easyblocks-from-pr 2801 --installpath /tmp/$USER/pr16406" /opt/software/slurm/bin/sbatch --job-name test_PR_16406 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9284

Test results coming soon (I hope)...

- notification for comment with ID 1279082958 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member Author

boegel commented Oct 14, 2022

LAPACK test results from Intel Skylake X:

  • with -ftree-vectorize enabled:
                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1294329         1206    (0.093%)        0       (0.000%)
DOUBLE PRECISION        1302917         1203    (0.092%)        0       (0.000%)
COMPLEX                 754416          1174    (0.156%)        0       (0.000%)
COMPLEX16               763556          436     (0.057%)        0       (0.000%)

--> ALL PRECISIONS      4115218         4019    (0.098%)        0       (0.000%)
  • with -ftree-vectorize disabled:
                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1308369         1       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1315193         2       (0.000%)        0       (0.000%)
COMPLEX                 776316          0       (0.000%)        0       (0.000%)
COMPLEX16               777128          0       (0.000%)        0       (0.000%)

--> ALL PRECISIONS      4177006         3       (0.000%)        0       (0.000%)

@boegelbot
Collaborator

Test report by @boegelbot
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2801
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
cns2 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/de93a0808c36bf8b7d7f13a2280dae57 for a full test report.

@boegel
Member Author

boegel commented Oct 14, 2022

Test report by @boegel
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2801
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3145.skitty.os - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/7f2a2a9399f975c3e931fee36b360ac1 for a full test report.

@casparvl
Contributor

Why not use a patch based on OpenMathLib/OpenBLAS#3786 instead? That will only disable the vectorization for the lapack part, meaning that the OpenBLAS code itself will still be vectorized.

Also, thinking about the future: are we going to backport the GCC patch to the GCCcore EasyConfigs? Because to be fair, Lapack might not be the only Fortran code that has been miscompiled due to this compiler bug. If we decide to backport the GCC patch, are we then changing the OpenBLAS EasyConfigs again to turn the vectorization back on (also for the Lapack part)?

Don't get me wrong: I'm all for making sure that at least a fix makes it into the next EB release, and turning off vectorization is probably the quickest and 'surest' way to do that for now. I'm just wondering how we'll treat this in the longer run, and want to stress that if this happens to LAPACK, it may have happened to other codes too.

@bartoldeman
Contributor

#16411
adds the patch to GCC.

@bartoldeman
Contributor

I just tested OpenBLAS-0.3.20-GCC-11.3.0.eb with vectorization enabled and a patched GCC 11.3, and it gives me this:

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1308369         1       (0.000%)        0       (0.000%)        
DOUBLE PRECISION        1316957         0       (0.000%)        0       (0.000%)        
COMPLEX                 758916          229     (0.030%)        0       (0.000%)        
COMPLEX16               765740          446     (0.058%)        0       (0.000%)        

--> ALL PRECISIONS      4149982         676     (0.016%)        0       (0.000%)       

(avx2 target, most are close eigenvalues). Will check what it gives with vect disabled as well.

@bartoldeman
Contributor

I could bring it down to

                        -->   LAPACK TESTING SUMMARY  <--
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1308369         1       (0.000%)        0       (0.000%)        
DOUBLE PRECISION        1316957         0       (0.000%)        0       (0.000%)        
COMPLEX                 768540          1       (0.000%)        0       (0.000%)        
COMPLEX16               777128          0       (0.000%)        0       (0.000%)        

--> ALL PRECISIONS      4170994         2       (0.000%)        0       (0.000%)        

with -ftree-vectorize enabled, patching lapack as follows:

--- OpenBLAS-0.3.21/lapack-netlib/SRC/clahqr.f.orig     2022-08-07 20:36:26.000000000 +0000
+++ OpenBLAS-0.3.21/lapack-netlib/SRC/clahqr.f  2022-10-15 02:56:05.646906448 +0000
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = CONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = CONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*CONJG( V2 )
    90       CONTINUE
--- OpenBLAS-0.3.21/lapack-netlib/SRC/zlahqr.f.orig     2022-08-07 20:36:26.000000000 +0000
+++ OpenBLAS-0.3.21/lapack-netlib/SRC/zlahqr.f  2022-10-15 02:11:54.111966029 +0000
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = DCONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = DCONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*DCONJG( V2 )
    90       CONTINUE

What's going on here? In the loops above, I1 and I2 take different values depending on whether eigenvectors are computed or not. Hence the vectorization acts differently, and so does its use of FMA instructions. The parentheses force an evaluation order that is compatible with non-FMA instructions here, so the eigenvalues end up with exactly the same binary value in the tests.

@boegel: Let me know what you think. The hard work of figuring out why is done; now it's just admin and implementing patches in the right places.

@boegel
Member Author

boegel commented Oct 15, 2022

@bartoldeman Should that last patch be contributed to OpenBLAS, and maybe also to https://github.com/Reference-LAPACK/lapack/?

Looks good though, we should apply this to these easyconfigs for sure, that seems like a better approach than disabling the use of -ftree-vectorize...

@bartoldeman
Contributor

Yes, it should go to Reference-LAPACK in some shape or form (or the tests should be made looser). Mathematically it is of course completely innocent, but it also relies on brackets disabling FMA, and I'm not at all sure that is guaranteed in the future.

Something for later this week. We can certainly have it in EB I think.

@boegel
Member Author

boegel commented Oct 16, 2022

@martin-frbg Any thoughts on the LAPACK patch that @bartoldeman put together?

@martin-frbg

Yes please create upstream PRs for this if it is not too much bother @bartoldeman (weird that it should need the extra parentheses, wonder if that is another, more subtle gfortran bug)

@easybuilders easybuilders deleted a comment from boegelbot Oct 16, 2022
@bartoldeman
Contributor

@martin-frbg
I don't think there's a bug per se, just that gfortran with FMA has some freedom with the expression
a*b+c*d
which can be either
fma(a,b,mul(c,d))
or
fma(c,d,mul(a,b))
or of course without fma
add(mul(a,b),mul(c,d))
and it seems that the vectorized loop uses a different variation than the unvectorized one, and the number of iterations is different between computing or not computing eigenvectors.

Now, with parentheses you can force one of the above (e.g. the first one with a*b+(c*d), and the last one with (a*b)+(c*d)); this can be checked with e.g. godbolt.org, but I'm not sure how standard that is. Also, the above uses complex numbers, which of course give rise to more complicated expressions.
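
The three evaluation orders listed above can give three different results for the same inputs. The following Python sketch illustrates this in double precision with real numbers; it is only an illustration, not what gfortran generates for the complex-valued loops in clahqr.f/zlahqr.f. The `fma` helper is a hypothetical emulation (exact rational arithmetic followed by a single rounding), and the input values are chosen by hand so that both products lose their low-order bits when rounded.

```python
from fractions import Fraction


def fma(a, b, c):
    """Emulate a fused multiply-add: a*b + c computed exactly, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# a*b = 1 - 2**-60 exactly, which rounds to 1.0 as a double
a, b = 1.0 + 2.0**-30, 1.0 - 2.0**-30
# c*d = -(1 - 2**-58) exactly, which rounds to -1.0 as a double
c, d = -(1.0 + 2.0**-29), 1.0 - 2.0**-29

no_fma = a * b + c * d     # add(mul(a,b), mul(c,d)): both products rounded
fma_ab = fma(a, b, c * d)  # fma(a,b,mul(c,d)): only c*d rounded
fma_cd = fma(c, d, a * b)  # fma(c,d,mul(a,b)): only a*b rounded

print(no_fma)  # 0.0
print(fma_ab == -2.0**-60, fma_cd == 2.0**-58)  # True True
```

All three results are tiny compared to the inputs, but they are not bitwise identical, which is all that an exact-equality check needs in order to fail.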

However: this change only applies to smaller matrices, with n <= 15. A change was made in the tests (not in OpenBLAS 0.3.20, but in OpenBLAS 0.3.21) so it also includes a test with n = 20. For those a different blocked algorithm is used that calls cgemm, and the optimized cgemm also uses FMA instructions.

Can you actually double check your own OpenBLAS installation? I have failing tests here (even without any special optimizations):

$ EIG/xeigtstc < ced.in  |grep seed
 N=   20, IWK= 1, seed=3252,3952,3382,2953, type 11, test( 5)= 0.839E+07
 N=   20, IWK= 2, seed=3252,3952,3382,2953, type 11, test( 5)= 0.839E+07
 N=   20, IWK= 1, seed=3707,2862,2871,2329, type 12, test( 5)= 0.839E+07
 N=   20, IWK= 2, seed=3707,2862,2871,2329, type 12, test( 5)= 0.839E+07
 N=   20, IWK= 1, seed=1365,3950,1356,1993, type 19, test( 5)= 0.839E+07
 N=   20, IWK= 2, seed=1365,3950,1356,1993, type 19, test( 5)= 0.839E+07
 N=   20, IWK= 1, seed=1717,3723,1475,3401, type 20, test( 5)= 0.839E+07
 N=   20, IWK= 2, seed=1717,3723,1475,3401, type 20, test( 5)= 0.839E+07
 N=   20, IWK= 1, seed=3735,1543,2047, 713, type 21, test( 5)= 0.839E+07
 N=   20, IWK= 2, seed=3735,1543,2047, 713, type 21, test( 5)= 0.839E+07

Do you get those too, or do you have 0 failures?

I think in the end it is a false expectation of the tests to get exactly the same eigenvalues with or without eigenvectors, and the tests need adjusting, rather than random tricks to avoid FMA?

@bartoldeman
Contributor

Note: I must stress that those failures look large but aren't in reality; the test simply sets the result to 1/ULP if the eigenvalues are not exactly the same:

for N= 20, IWK= 2, seed=3252,3952,3382,2953, type 11, test( 5)= 0.839E+07

 Eigenvalue  2 different:          (-0.215385541,0.111216456) vs.          (-0.215385392,0.111216530)
 Eigenvalue  3 different:          (-0.137095153,0.194372609) vs.          (-0.137095302,0.194372416)
 Eigenvalue  4 different:          (0.148889706,-0.192565531) vs.          (0.148889646,-0.192565277)
 Eigenvalue  5 different:         (-0.119180769,-0.203644738) vs.         (-0.119180419,-0.203644574)
 Eigenvalue  6 different:      (-0.230950281,4.016282503E-03) vs.      (-0.230950370,4.015875049E-03)
 Eigenvalue  7 different:     (-2.082111686E-02,-0.242263168) vs.     (-2.082153223E-02,-0.242263198)
 Eigenvalue  8 different:          (0.209737733,-0.104841843) vs.          (0.209737808,-0.104841903)
 Eigenvalue  9 different:         (-0.169165567,-0.134739131) vs.         (-0.169165969,-0.134739578)
 Eigenvalue 10 different:     (-0.200623572,-7.666894794E-02) vs.     (-0.200623393,-7.666870207E-02)
 Eigenvalue 11 different:      (6.524411589E-02,-0.221647575) vs.      (6.524408609E-02,-0.221647590)
 Eigenvalue 12 different:           (0.132817283,0.163589850) vs.           (0.132817283,0.163589880)
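
To put these numbers in perspective: the reported value 0.839E+07 matches 2^23, which is 1/ULP for single precision if ULP is taken to be 2^-23 (an assumption here, inferred from the printed value), while the two eigenvalue pairs used below differ by only a few ULPs in relative terms. A small Python sketch, where `diff_in_ulps` is a hypothetical helper and the digits are copied from the list above:

```python
ULP_SINGLE = 2.0 ** -23  # assumed single-precision ULP; 1/ULP = 8388608


def diff_in_ulps(z1, z2):
    """Relative difference between two complex numbers, in units of ULP_SINGLE."""
    return abs(z1 - z2) / max(abs(z1), abs(z2)) / ULP_SINGLE


# Eigenvalues 2 and 12 from the list above (with vs. without eigenvectors).
pairs = {
    2: (complex(-0.215385541, 0.111216456), complex(-0.215385392, 0.111216530)),
    12: (complex(0.132817283, 0.163589850), complex(0.132817283, 0.163589880)),
}

for index, (z1, z2) in pairs.items():
    print(f"eigenvalue {index}: {diff_in_ulps(z1, z2):.1f} ULP")

print(f"1/ULP = {1 / ULP_SINGLE:.3E}")
```

A check that tolerates a small multiple of ULP would accept both pairs, whereas an exact-equality check reports the maximum possible error for each of them.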

… max. number of failing LAPACK tests due to numerical errors to 10 + don't disable vectorize toolchain option
@boegel boegel changed the title from "enable running of LAPACK tests for recent OpenBLAS easyconfigs + set max. number of failing tests due to numerical errors to 150" to "enable running of LAPACK tests for recent OpenBLAS easyconfigs + add patch to fix failing LAPACK tests due to use of -ftree-vectorize" Oct 17, 2022
@easybuilders easybuilders deleted a comment from boegelbot Oct 17, 2022
@bartoldeman
Contributor

Yes, it's always the same eigenvalue test.
We have two options then:

  1. Apply a patch to make that test a little looser, so that we can have a limit of 10.
  2. Raise the limit to 100.

I can work on that patch; it isn't a lot of work.

…100 for recent OpenBLAS easyconfigs, due to eigenvalue test being a bit too strict
@boegel
Member Author

boegel commented Oct 17, 2022

@boegelbot please test @ generoso

@boegel
Member Author

boegel commented Oct 17, 2022

yes it's always the same eigenvalue test. We have two options then:

1. Apply a patch to make that test a little looser, that way we can have a limit of 10.

2. Raise the limit to 100.
   I can work on that patch, isn't a lot of work.

I want to get this PR merged soon, so I'll go with option 2 for now (see c474ce7); we can follow up with a separate PR that adds an extra patch + lowers the max. failing test count back to 10.

@bartoldeman
Contributor

bartoldeman commented Oct 17, 2022

I don't see c474ce7 yet but ok with me :)
Will have the patch in an hour or so.

@boegel
Member Author

boegel commented Oct 17, 2022

I don't see c474ce7 yet but ok with me :) Will have the patch in an hour or so.

Sorry, forgot to actually push it 🤦‍♂️

@easybuilders easybuilders deleted a comment from boegelbot Oct 17, 2022
@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=16406 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_16406 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9304

Test results coming soon (I hope)...

- notification for comment with ID 1281127449 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 0 out of 4 (4 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/d4db115156734fc91201d891d180eb03 for a full test report.

@boegel
Member Author

boegel commented Oct 17, 2022

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
fair-mastodon-c6g-2xlarge-0001 - Linux Rocky Linux 8.5, AArch64, ARM UNKNOWN (graviton2), Python 3.6.8
See https://gist.github.com/a2123620813dcd3603aa4875f771db4b for a full test report.

@martin-frbg

@bartoldeman right, the lapack testsuite is (unfortunately) very much centered on itself although Reference-LAPACK supports building with an alternate BLAS implementation (as already evidenced by the STFSM test - I see you noticed that issue ticket).
Plus there are/were a few inaccuracies in the testsuite itself that may "encourage" deviations, such as mixups between single and double precision (assumed fixed in their master branch, but I need to merge the relevant post-release PRs in OpenBLAS). And I had totally forgotten about that 1/ULP thing to signal "WRONG RESULT".

@boegel
Member Author

boegel commented Oct 17, 2022

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3116.skitty.os - Linux RHEL 8.4, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/2648194f406ecc7d7064e2a3c336713b for a full test report.

@bartoldeman
Contributor

@boegel seems your build on cns1 is still not good, maybe raise to 200, or the original 150 again?
The eigenvalue checks are still a little strange, I won't have that patch today in the end.

…it more to 150 for recent OpenBLAS easyconfigs
@boegel
Member Author

boegel commented Oct 17, 2022

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=16406 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_16406 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9305

Test results coming soon (I hope)...

- notification for comment with ID 1281456729 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member Author

boegel commented Oct 17, 2022

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node3588.doduo.os - Linux RHEL 8.4, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/f476299dbda19de5036244455d396b8a for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
cns1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/f2b1a755c07c805be3edd4d8e3b64052 for a full test report.

Contributor

@bartoldeman bartoldeman left a comment

lgtm

@bartoldeman
Contributor

Going to merge this. I'll do some further investigations tomorrow but this looks good to go now.

@bartoldeman bartoldeman merged commit c41a988 into easybuilders:develop Oct 18, 2022
@boegel
Member Author

boegel commented Oct 18, 2022

Test report by @boegel
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
easybuild2.novalocal - Linux CentOS Stream 8, POWER, IBM pSeries (emulated by qemu) (power9le), Python 3.6.8
See https://gist.github.com/19a8e9daf35babc25df4e412470793f0 for a full test report.

@boegel boegel deleted the 20221014113137_new_pr_OpenBLAS0317 branch October 18, 2022 06:41
@akesandgren
Contributor

Note: on KNL I get 243 errors in the LAPACK tests of OpenBLAS

@bartoldeman
Contributor

@akesandgren can you also attach lapack-netlib/TESTING/testing_results.txt from your build directory on KNL?
Just verifying if it's more of the same...

@akesandgren
Contributor

@bartoldeman
Contributor

Thanks @akesandgren. This seems to be caused by (probably FMAs in) asm code in the core OpenBLAS, as for TARGET=HASWELL I also get 134 failures even with toolchainopts = {'vectorize': False}

None of them are worrying though, it's all the same strict eigenvalue mismatches.

@bartoldeman
Contributor

I submitted a patch OpenMathLib/OpenBLAS#3795 that improves the situation on Haswell (using FMA throughout, instead of only in the C code because of our use of -march=xxx).

This should also improve the situation on KNL, though of course OpenBLAS isn't optimal there since it won't use AVX512 on KNL (you'll need to use MKL or BLIS there).

For Skylake, perhaps accidentally, OpenBLAS uses a generic C kernel for cscal and zscal, which is why you get almost no test failures there. I'll submit an OpenBLAS Skylake microkernel for those some time next week.

@martin-frbg

oops yes the skx cscal/zscal looks accidental - change slipped in with an unrelated PR. thx. KNL is treated like haswell as in its prime time there was no competent avx512 dev/contributor around. the SKX kernels use avx512 extensions not present on KNL
