-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
memory confusion calling dlasd4 #2340
Comments
I think the problem is OpenBLAS. If I use the version from vecLib I get the correct answer. I also did a test in Fortran where OpenBLAS returns NaN. |
cc @xianyi |
odd that openblas would mess up the largest singular value and not the rest |
@andreasnoackjensen Could you post your Fortran code in a gist? It would probably help @xianyi narrow it down. |
I am confused. I have tried to test this on Ubuntu as well and maybe you should ask the LAPACK author about this. Please have a look at https://gist.github.com/andreasnoackjensen/4976070. Maybe the common error factor is a development version of LAPACK, but I don't know how to get the exact snapshot dates for the LAPACK code. |
Actually I checked with Ming Gu who told me that Ren Cang Li was the author. I am enclosing what he said was the correct code, and he said there was a bug fix at some point. It's possible that I'm one of the first customers of this code :-)
|
Your version from 3.4.0 is identical to the version in 3.4.2 (the latest and the one in Ubuntu 13.04) except for MAXIT. However, the version of dlasd4.f in 3.4.1 (which is the version in Ubuntu 12.10) is significantly different from the two others. The versions in 3.4.0 and 3.4.2 give the wrong answer when linked dynamically but not statically. The version in 3.4.1 is correct when it returns |
What do you make of all this? |
Hi @andreasnoackjensen , Do you mean you can get the correct result with the static library? Xianyi |
Hi @xianyi Yes exactly. I have updated the gist with an example where I link statically and dynamically to a fresh build of OpenBLAS-develop. |
This is rather bizarre! |
I have tried the code again on Ubuntu. Last time was on Mac. On Ubuntu I can also get the wrong result with static linking. It is just kind of random and can be triggered by different things. In particular I can change the result by commenting out the write statements as shown in the gist. Hence I guess the subroutine in LAPACK 3.4.0 and 3.4.2 has some kind of error, but not the subroutine in 3.4.1. |
@alanedelman Could you point the authors to this issue page? |
@alanedelman Correction. I assumed that the version you posted was from LAPACK 3.4.0 since that is what the header says. However, that is not the case. Your version is the same as dlasd4.f in LAPACK 3.4.2 except that in yours Update: Sorry for spamming. The old bug might be related the following which is from julia with LAPACK 3.4.1 and reference BLAS julia> v=sort(randn(10).^2);z=randn(10);
julia> svdmax(v,z)
ERROR: LapackException(-1099511627776)
in lasd4! at /home/andreasnoackjensen/Desktop/dlasd4.jl:31
in svdmax at /home/andreasnoackjensen/Desktop/dlasd4.jl:37
julia> svdmax(v,z)
4.571419253335635 |
This is now submitted as a LAPACK bug so I think this issue can be closed. |
We can leave it open until Julia updates to a fixed version. |
I have added an upstream tag to this issue. We should leave it open until upstream fixes it, and we use it. Could you also provide pointers to the communication with the LAPACK team for the record here? |
The bug is registered here http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=4250 |
@andreasnoackjensen Should we just replace the |
I think we should also find the matrices for which the svd is incorrect due to this bug and add it to our testsuite. Since we are going to release 0.2 at some point, we should certainly make sure that our svd is not broken. I am tagging this as 0.2. Also see discussion in OpenMathLib/OpenBLAS#225. |
I am not sure about this one. While I have experienced weird errors while calling |
@alanedelman Can you reach out to someone in the LAPACK team about |
This is clearly not a 0.2 issue. Updated issue to have no milestone. |
There is a patch for this now. See http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=13&t=4250. The development version of LAPACK also has a fix and both solutions seem to solve the problem on my machine. |
Perhaps we should apply this patch after untarring openblas. |
openblas no longer depends on LAPACK download, as it is included within.
@staticfloat We will need to apply this patch for your openblas 2.8 debs too. Waiting to verify that this problem is actually fixed with this patch, before closing. |
gesdd's RWORK size was recently changed to match the netlib header. However, the minimum size calculated results in a segfault. Both Octave and Numpy use a different minimum size, and testing verified that anything smaller than this leads to a segfault. Numpy: https://github.com/numpy/numpy/blob/master/numpy/linalg/umath_linalg.c.src#L2922 Octave: http://hg.savannah.gnu.org/hgweb/octave/file/2f1729cae08f/liboctave/numeric/CmplxSVD.cc#l184
Hi @ViralBShah , Could you create a pull request to OpenBLAS, too? Xianyi |
Will do. |
Can this be closed now that this is patched? |
@ViralBShah bump. I'm moving this to 0.3, since the remaining issue (having you make a pull request to OpenBLAS), doesn't seem to be necessary for 0.2 release |
Patch LAPACK XLASD4.f as discussed in JuliaLang/julia#2340
I've gone ahead and patched the openblas packages on my PPA with this, and On Wed, Aug 21, 2013 at 12:06 AM, Jameson Nash notifications@git.luolix.topwrote:
|
Openblas develop branch includes this patch now. |
I'm now progressing from #2267 writing my fourth lapack call,
and I need to compute the singular values of a broken arrow matrix efficiently
If d and z are vectors of the same size, I want
svd( [diagm(d) z]) efficiently
I'm plugging in the little known dlasd4! and I get all the singular values correctly
except the largest one, which just returns d[n]
Sorry, if I'm just not getting pointers right or if I need to copy data, or somehow
my workspace is too small, but can anyone help?
The problem is likely to be pilot error on my part -- my fault as all these pointers
mystify me. But I haven't ruled out julia or lapack yet.
If it is lapack (which I doubt) the author of this code is in town tomorrow
(Monday 2/18) and i can consult with him.
Bug example
The answer should be .858... but instead pulling the memory from v[end]
On occasion it does give the right answer.
The text was updated successfully, but these errors were encountered: