Accuracy issue in SVD API "SGESDD" #45

Open
srvasanth opened this issue Dec 7, 2020 · 4 comments

@srvasanth

srvasanth commented Dec 7, 2020

Hi,
We are observing a few failures in one of our customer applications that uses libFLAME and BLIS for the SVD API “SGESDD”. The singular values S and the orthogonal matrix U differ from the expected output. The same tests pass when the API is provided by OpenBLAS or MKL.

Input Matrix A Size : 9 x 100
Input values: {1} -> All 1s
Parameters:
JobZ : ‘O’
M : 9
N : 100
LDA : 9
LDU : M
LDVT : 1
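A minimal sketch for running the same input through whichever LAPACK numpy is linked against and checking properties that any valid SVD must satisfy, independent of sign conventions. Assumptions: `numpy.linalg.svd` on a float32 array dispatches to the linked `sgesdd`, and since it has no JOBZ argument the sketch uses `full_matrices=True` rather than JOBZ = 'O'. The checks are illustrative, not the customer test.

```python
import numpy as np

# Input from the report: A is 9 x 100, every entry 1, single precision (the "S" in SGESDD).
m, n = 9, 100
a = np.ones((m, n), dtype=np.float32)
k = min(m, n)

# Assumption: numpy.linalg.svd calls the linked LAPACK's ?gesdd. It has no JOBZ argument,
# so full_matrices=True is used here instead of JOBZ='O'.
u, s, vt = np.linalg.svd(a, full_matrices=True)
eps = float(np.finfo(np.float32).eps)

# 1. Singular values must be non-negative and sorted in descending order.
non_negative = bool(np.all(s >= 0))
descending = bool(np.all(np.diff(s) <= 0))
# Diagnostic only: count entries carrying a sign bit (the libflame output shows -0.0000e+00).
negative_zeros = int(np.count_nonzero(np.signbit(s)))

# 2. A is rank 1 with Frobenius norm sqrt(9 * 100) = 30, so s[0] should be close to 30
#    and all other singular values should be at rounding-error level (~eps * 30).
leading_err = abs(float(s[0]) - 30.0)
tail_max = float(np.max(s[1:]))

# 3. The leading left singular vector is the normalised ones vector, up to sign.
u0_ok = bool(np.allclose(np.abs(u[:, 0]), 1.0 / 3.0, atol=1e-4))

# 4. U must be orthogonal, and U[:, :k] diag(s) V^T[:k, :] must reproduce A.
orth_err = float(np.linalg.norm(u.T @ u - np.eye(m)))
recon_err = float(np.linalg.norm(a - (u[:, :k] * s) @ vt[:k, :]) / np.linalg.norm(a))

print(f"s = {s}")
print(f"non_negative={non_negative} descending={descending} negative_zeros={negative_zeros}")
print(f"leading_err={leading_err:.2e} tail_max={tail_max:.2e} u0_ok={u0_ok}")
print(f"orth_err={orth_err:.2e} recon_err={recon_err:.2e} (eps={eps:.2e})")
```

Because A has rank 1, only s[0] and the first column of U are determined by the data; the remaining singular values should be tiny positive numbers or zero, and the remaining columns of U only need to be orthonormal.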

Outputs from libflame+BLIS

Singular values (S)
3.0000e+01
4.6447e-06
5.2638e-13
1.2358e-19
0.0000e+00
-0.0000e+00
-0.0000e+00
-0.0000e+00
-0.0000e+00

Orthogonal matrix (U)
3.3333e-01 -9.4281e-01 -0.0000e+00 -0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.3333e-01 1.1785e-01 -9.3541e-01 -1.2102e-07 7.0755e-15 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.3333e-01 1.1785e-01 1.3363e-01 -9.2582e-01 1.3064e-08 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 9.1287e-01 0.0000e+00 0.0000e+00 0.0000e+00 0.0000e+00
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 4.4721e-01 4.4721e-01 4.4721e-01 4.4721e-01
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 -8.6180e-01 1.3820e-01 1.3820e-01 1.3820e-01
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 1.3820e-01 -8.6180e-01 1.3820e-01 1.3820e-01
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 1.3820e-01 1.3820e-01 -8.6180e-01 1.3820e-01
3.3333e-01 1.1785e-01 1.3363e-01 1.5430e-01 -1.8257e-01 1.3820e-01 1.3820e-01 1.3820e-01 -8.6180e-01

Expected output:

Singular values (S)
3.0000e+01
4.4731e-06
2.8951e-12
3.2130e-18
2.3120e-24
1.2895e-30
1.3683e-36
1.6802e-42
2.9427e-44

Orthogonal matrix (U)
-3.3333e-01 9.4281e-01 6.4572e-07 9.9341e-09 9.9341e-09 -1.9868e-08 -1.9868e-08 -1.9868e-08 0.0000e+00
-3.3333e-01 -1.1785e-01 9.3541e-01 -1.0867e-06 -7.6012e-09 -1.5052e-08 1.5202e-08 5.1177e-09 0.0000e+00
-3.3333e-01 -1.1785e-01 -1.3363e-01 9.2582e-01 7.9038e-07 -1.7868e-08 3.0130e-10 -2.3329e-09 0.0000e+00
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 9.1287e-01 -5.9798e-07 2.1286e-08 -1.2825e-08 0.0000e+00
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 8.9443e-01 -1.0503e-06 -1.6157e-08 0.0000e+00
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 -2.2361e-01 8.6603e-01 1.3324e-06 -2.2352e-08
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 -2.2361e-01 -2.8868e-01 8.1635e-01 -1.5403e-02
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 -2.2361e-01 -2.8867e-01 -4.2152e-01 -6.9928e-01
-3.3333e-01 -1.1785e-01 -1.3363e-01 -1.5430e-01 -1.8257e-01 -2.2361e-01 -2.8867e-01 -3.9484e-01 7.1468e-01

Any analysis or help regarding this will be highly appreciated.
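One thing worth checking when comparing the two U matrices: each singular vector is only determined up to sign (negating a column of U together with the matching row of V^T leaves the product unchanged), so an element-wise comparison can flag a mismatch between two valid factorizations. A minimal sketch that compares the first two columns of both U matrices above, element-wise and up to sign, using the printed values; `same_up_to_sign` is a hypothetical helper, not a library function. As a general property of the SVD, columns belonging to (numerically) zero singular values are not unique even up to sign and can only be compared as a subspace, which this sketch does not attempt.

```python
import numpy as np

# Columns 1 and 2 of U exactly as printed in the report.
flame_u = np.array([
    [3.3333e-01] * 9,                   # column 1, libflame + BLIS
    [-9.4281e-01] + [1.1785e-01] * 8,   # column 2, libflame + BLIS
])
expected_u = np.array([
    [-3.3333e-01] * 9,                  # column 1, expected output
    [9.4281e-01] + [-1.1785e-01] * 8,   # column 2, expected output
])


def same_up_to_sign(x: np.ndarray, y: np.ndarray, tol: float = 1e-4) -> bool:
    """Hypothetical helper: True if x and y agree after normalisation, up to a global sign."""
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    return abs(abs(float(x @ y)) - 1.0) <= tol


# For each column: (equal element-wise?, equal up to sign?)
results = [(np.allclose(x, y), same_up_to_sign(x, y)) for x, y in zip(flame_u, expected_u)]
for j, (elementwise, up_to_sign) in enumerate(results, start=1):
    print(f"column {j}: element-wise={elementwise} up-to-sign={up_to_sign}")
```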

@boegel

boegel commented Mar 12, 2021

We saw a couple of failing numpy tests across a variety of CPUs (Haswell, Skylake, Zen2) when trying to build numpy 1.19.4 on top of the latest BLIS (0.8.0) and libFLAME (5.2.0) with GCC 10.2; an example is below.

We're not seeing those failing tests when using OpenBLAS (0.3.12) with the LAPACK it ships, or with Intel MKL (2020 update 4).

If we use BLIS 0.8.0 + reference LAPACK 3.9.0, then there are no failing tests, so the culprit must be libFLAME...

_________________________________________________________________________________________________________________ TestRandomDist.test_multivariate_normal[svd] _________________________________________________________________________________________________________________

self = <numpy.random.tests.test_generator_mt19937.TestRandomDist object at 0x14eac29cce50>, method = 'svd'

    @pytest.mark.parametrize("method", ["svd", "eigh", "cholesky"])
    def test_multivariate_normal(self, method):
        random = Generator(MT19937(self.seed))
        mean = (.123456789, 10)
        cov = [[1, 0], [0, 1]]
        size = (3, 2)
        actual = random.multivariate_normal(mean, cov, size, method=method)
        desired = np.array([[[-1.747478062846581,  11.25613495182354  ],
                             [-0.9967333370066214, 10.342002097029821 ]],
                            [[ 0.7850019631242964, 11.181113712443013 ],
                             [ 0.8901349653255224,  8.873825399642492 ]],
                            [[ 0.7130260107430003,  9.551628690083056 ],
                             [ 0.7127098726541128, 11.991709234143173 ]]])

>       assert_array_almost_equal(actual, desired, decimal=15)
E       AssertionError:
E       Arrays are not almost equal to 15 decimals
E
E       Mismatched elements: 12 / 12 (100%)
E       Max absolute difference: 3.98341847
E       Max relative difference: 2.2477228
E        x: array([[[ 1.994391640846581,  8.74386504817646 ],
E               [ 1.243646915006621,  9.657997902970179]],
E       ...
E        y: array([[[-1.747478062846581, 11.25613495182354 ],
E               [-0.996733337006621, 10.342002097029821]],
E       ...

actual     = array([[[ 1.99439164,  8.74386505],
        [ 1.24364692,  9.6579979 ]],

       [[-0.53808839,  8.81888629],
        [-0.64322139, 11.1261746 ]],

       [[-0.46611243, 10.44837131],
        [-0.46579629,  8.00829077]]])
cov        = [[1, 0], [0, 1]]
desired    = array([[[-1.74747806, 11.25613495],
        [-0.99673334, 10.3420021 ]],

       [[ 0.78500196, 11.18111371],
        [ 0.89013497,  8.8738254 ]],

       [[ 0.71302601,  9.55162869],
        [ 0.71270987, 11.99170923]]])
mean       = (0.123456789, 10)
method     = 'svd'
random     = Generator(MT19937) at 0x14EAC29CE040
self       = <numpy.random.tests.test_generator_mt19937.TestRandomDist object at 0x14eac29cce50>
size       = (3, 2)
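Comparing `actual` and `desired` above suggests the mismatch is not random noise: every `actual` sample appears to be the corresponding `desired` sample reflected through `mean`, i.e. `actual = 2*mean - desired`. A short check of that arithmetic relationship using the rounded values printed in the traceback (it does not show where in the SVD path the sign comes from):

```python
import numpy as np

mean = np.array([0.123456789, 10.0])

# Values as printed (8 decimals) in the pytest locals above.
actual = np.array([[[ 1.99439164,  8.74386505], [ 1.24364692,  9.6579979 ]],
                   [[-0.53808839,  8.81888629], [-0.64322139, 11.1261746 ]],
                   [[-0.46611243, 10.44837131], [-0.46579629,  8.00829077]]])
desired = np.array([[[-1.74747806, 11.25613495], [-0.99673334, 10.3420021 ]],
                    [[ 0.78500196, 11.18111371], [ 0.89013497,  8.8738254 ]],
                    [[ 0.71302601,  9.55162869], [ 0.71270987, 11.99170923]]])

# If every sample is reflected through the mean, actual - mean == -(desired - mean).
reflected = np.allclose(actual - mean, -(desired - mean), atol=1e-7)
print("reflected through mean:", reflected)

# The largest deviation is then 2 * max|desired - mean|, cf. "Max absolute difference".
max_abs_diff = float(np.max(np.abs(actual - desired)))
print("max abs difference:", max_abs_diff)
```

With `cov` equal to the identity, a reflection like this is what one would expect if the SVD returned singular vectors with the opposite sign to those used to generate the test's reference values; that reading is an assumption, not something the traceback proves.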

@fgvanzee
Member

Unfortunately, SVD in libflame is not easy to debug because it uses a completely different algorithm from the one in LAPACK. So for now, I'll encourage you both to use netlib LAPACK + BLIS as your workaround. Apologies for the inconvenience.

@boegel

boegel commented Mar 14, 2021

@fgvanzee That's indeed the alternative approach we're going forward with; we've also seen some other problems with libFLAME (like #46).

@boegel

boegel commented Mar 15, 2021

Perhaps related to this: AMD has just released a new version of their libFLAME fork, see https://github.com/amd/libflame/releases/tag/3.0, which mentions in the release notes "Several bug fixes including handling denormal numbers in SVD functions".

I'm not sure those fixes are related to this issue, but it seems like they could be...

I'll try and find time to take that new AMD-libFLAME version for a spin, and see if I'm still running into problems with the numpy test suite.
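The release note's mention of denormal numbers may be relevant to the first report: in its expected output, the two smallest singular values (1.6802e-42 and 2.9427e-44) lie below the smallest normal single-precision value, so they can only be represented as subnormals in float32. A quick numpy check that classifies the reported numbers; whether the AMD fix touches the code path used here is not established in this thread.

```python
import numpy as np

# Singular values from the "Expected output" (OpenBLAS/MKL) in the first comment.
expected_s = np.array([3.0000e+01, 4.4731e-06, 2.8951e-12, 3.2130e-18, 2.3120e-24,
                       1.2895e-30, 1.3683e-36, 1.6802e-42, 2.9427e-44], dtype=np.float32)

# finfo.tiny is the smallest positive *normal* float32 (about 1.18e-38); non-zero
# values with a smaller magnitude can only be stored as subnormals (denormals).
tiny = np.finfo(np.float32).tiny
is_subnormal = (expected_s != 0) & (np.abs(expected_s) < tiny)

print("subnormal singular values:", expected_s[is_subnormal])
```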
