
lapack/netlib: add Dpbtrf and Dpbtrs #63

Merged: 1 commit merged into master from lapack/dpbtrf on Mar 17, 2020

Conversation

vladimir-ch
Member

The tests already fail for a 1x1 matrix with a positive element because LAPACKE says the matrix is not positive definite. I think this is because the column-major conversion routine LAPACKE_dgb_trans is completely wrong. You can take a look, but it makes no sense to merge this until LAPACKE is fixed, so this PR is here only as a reminder.

@kortschak
Member

It might be worth filing an issue with OpenBLAS; @martin-frbg is vastly more responsive than the NETLIB mob.

@martin-frbg

Thanks for the pointer, but I do not even see an issue filed with the "NETLIB mob" yet, so please cut them some slack ... it looks to me like that project is reduced to a husband-and-wife team nowadays, so I kind of feel their pain. (And at least I would probably need a C/Fortran reproducer anyway to understand the issue; not sure about them.)
Seeing that LAPACKE_dgb_trans has not changed in years, this might also be something else, like the elusive C-to-FORTRAN ABI bug involving single-character arguments that got exposed by more enthusiastic optimizing in recent GCC (Reference-LAPACK/lapack#339). In that case compiling with -fno-optimize-sibling-calls would be expected to help.

@vladimir-ch
Member Author

This is a (minimal) reproducer:

#include <stdio.h>
#include <lapacke.h>

int main(void) {
        lapack_int n = 1;
        lapack_int kd = 1;
        lapack_int ldab = kd + 1;
        double a[4] = {2, -1, -1, -1};
        lapack_int info = LAPACKE_dpbtrf_work(LAPACK_ROW_MAJOR, 'U', n, kd, a, ldab);
        printf("info = %d\n", (int)info); // 1 with ROW_MAJOR, correct 0 with COL_MAJOR
        return 0;
}
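
(Depending on how LAPACKE is installed, this can be built with something like cc repro.c -llapacke -llapack, or cc repro.c -lopenblas when using OpenBLAS's bundled LAPACKE; the file name and link flags here are illustrative.)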

Valgrind reports use of uninitialized memory allocated by LAPACKE_dpbtrf_work.

But I think a reproducer is not really necessary; it doesn't demonstrate much anyway. For a 1x1 matrix the storage order doesn't matter, yet here row major gives an error. The only non-trivial operation that LAPACKE does for row major is the row->col->row conversion. And just from looking at LAPACKE_dgb_trans it's clear it doesn't do what it should (unless my/Gonum's idea of how row-major band storage works is completely off). The use of ldin and ldout in the for-loop bounds alone indicates something fishy.

For sure, I'll also report it to the netlib reference.

@kortschak
Member

kortschak commented Jul 27, 2019

@martin-frbg "mob" as used in the Australian context (i.e. by me) is a non-derogatory word meaning a group or association of people related by family, work or interest ties - I forget that others don't necessarily know the meanings of our common words. The comment on responsiveness is a reflection of past issues filed with NETLIB. Your point about the size of the team is well taken, though.

@martin-frbg

Agreed - I do not see how lapacke_?gb_trans is expected to work (and it seems the code does not know either). What is a bit strange is that this has survived unchanged from the original 2010 LAPACKE code (except that matrix_order got renamed to matrix_layout at some point) without anybody noticing.

@kortschak
Member

The persistence of incorrect transpose handling code in LAPACKE is not unprecedented. We have found at least one other case where the transpose was incorrectly handled leading to a segfault.

@vladimir-ch
Member Author

vladimir-ch commented Jul 29, 2019

Can anyone please take a look at https://software.intel.com/en-us/node/520871#BAND_STORAGE and confidently explain to me how a band matrix is laid out in memory? I find the information there contradictory.

It says:

Row major layout: matrix A is packed in an m-by-ldab matrix AB row-wise so that the diagonals of A become columns of AB. ... The number of columns of AB is ldab ≥ kl + ku + 1, and the number of rows of AB is m.

but later:

Converting the Packed Matrix to a One-Dimensional Array: Element a(i, j) is stored as array element a[k(i, j)] where the mapping k(i, j) is defined as
row major layout: k(i, j) = (i - j)*ldab + kl + j - 1; 1 ≤ i ≤ m, max(1, i - kl) ≤ j ≤ min(n, i + ku)

That means that, according to the mapping definition, the diagonals are stored contiguously in memory, that is, they become rows of AB, contradicting what is written above! An alternative view is that the matrix is packed row-wise and the resulting full matrix is stored column-wise, but that does not make any sense at all and is not how BLAS stores band matrices (https://software.intel.com/en-us/mkl-developer-reference-c-matrix-storage-schemes-for-blas-routines):

Band storage: a band matrix is stored compactly in a two-dimensional array. For column-major layout, columns of the matrix are stored in the corresponding columns of the array, and diagonals of the matrix are stored in a specific row of the array. For row-major layout, rows of the matrix are stored in the corresponding rows of the array, and diagonals of the matrix are stored in a specific column of the array.

Apparently, LAPACKE_dgb_trans operates according to the mapping definition and fixing it would mean breaking it for those (few) who currently use LAPACKE with band matrices in row-major.

@vladimir-ch
Member Author

The senseless mapping-definition view is apparently accepted and used by others; see for example https://stackoverflow.com/questions/42750671/writing-a-banded-matrix-in-a-row-major-layout-for-lapack-solver-dgbsv and the accepted answer there (and the confusion about the different storage order between CBLAS and LAPACKE expressed in the OP's comments).

@vladimir-ch
Member Author

I've been pondering this further and I have an unverifiable hypothesis. The Intel documentation, with all its row-wise packing and diagonals-becoming-columns, should be ignored except (surprisingly) for the mapping k. I think that the packed matrix looks the same independent of the storage order (diagonals in rows, columns in columns), and ROW_MAJOR and COL_MAJOR determine only how this packed matrix is stored. That would mean that ldab must be at least n with ROW_MAJOR, not kd+1, and this would actually match the check in LAPACKE_dpbtrf_work. It would also mean that LAPACKE_dgb_trans is basically correct in how it does the transpose (the for-loop bounds are still wrong).

However, it would also mean that Gonum's band matrices are unimplementable without doing the same transpose either for CBLAS or for LAPACKE, because these two interfaces assume different band storage formats. This could happen even if CBLAS or LAPACKE are not involved, that is, with our own pure Go gonum implementation. Imagine a lapack routine that takes a band matrix and at some point passes it to blas. We will probably never have it but this is what happens in DGBRFS.

It's either this or something else. I wonder whether anyone actually knows for sure.
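
For what it's worth, here is a minimal sketch of the two index mappings under this hypothesis, for a 4x4 tridiagonal matrix with 0-based indices (the LAPACKE mapping is the hypothesized one, not documented behavior):

package main

import "fmt"

// CBLAS/Gonum row-major band storage: row i of A lands in row i of the
// compact array AB, with lda >= kl+ku+1.
func idxCBLAS(i, j, kl, lda int) int { return i*lda + (j - i + kl) }

// Hypothesized LAPACKE row-major band storage: diagonal i-j of A lands in
// row ku+i-j of AB, columns of A stay in columns of AB, with ldab >= n.
func idxLAPACKE(i, j, ku, ldab int) int { return (ku+i-j)*ldab + j }

func main() {
	// 4x4 tridiagonal matrix, kl = ku = 1 (* marks unused padding).
	//
	// CBLAS (lda = 3):      hypothesized LAPACKE (ldab = 4):
	//   [  *  a00 a01 ]       [  *  a01 a12 a23 ]  <- superdiagonal
	//   [ a10 a11 a12 ]       [ a00 a11 a22 a33 ]  <- main diagonal
	//   [ a21 a22 a23 ]       [ a10 a21 a32  *  ]  <- subdiagonal
	//   [ a32 a33  *  ]
	fmt.Println(idxCBLAS(1, 2, 1, 3))   // a12: index 5 in the CBLAS array
	fmt.Println(idxLAPACKE(1, 2, 1, 4)) // a12: index 2 in the LAPACKE array
}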

@martin-frbg

The only other (related) documentation I found is from the ESSL at
https://www.ibm.com/support/knowledgecenter/en/SSFHY8_6.1/reference/am5gr_hsgbtrf.html#am5gr_hsgbtrf - for me it only added to the confusion. (That documentation also goes into some detail on the differences between what it calls "General Band Storage Mode" and "BLAS General Band Storage Mode"...)
The LAPACKE examples distributed with MKL cleverly omit anything involving banded matrices as far as I can tell.

@kortschak
Member

@vladimir-ch We have formal tests for what we consider to be the correct band storage. These are in blas/blas64/conv.go. I implemented them from the docs in blas/gonum/doc.go and (from memory) this document.

@vladimir-ch
Member Author

Yes, we use the BLAS storage; the fact that the blas/netlib tests pass hopefully confirms that. The problem is that LAPACKE most likely assumes a different storage. We can live with that, but our lapack/netlib calls would have to do the BLAS->lapacke->BLAS conversion. And if we did that, we could just as well convert directly to column major and so avoid the buggy row-major part of lapacke (at least for band matrices).

@vladimir-ch
Member Author

I've created Reference-LAPACK/lapack#348

@vladimir-ch
Member Author

I've spent some time today playing with various conversions and testing them with OpenBLAS and MKL. My conclusion, based on the behavior of MKL and the LAPACKE source, is that the band storage for LAPACKE is indeed the FORTRAN variant (diagonals in rows, columns in columns), only stored in row-major layout. This means that it is not the CBLAS format that we use in Gonum. Anyone using LAPACKE and CBLAS from C for band matrices in row-major should be aware of this.

I'm not able to reproduce the valgrind warning about uninitialized memory. One possibility is that the binary was using OpenBLAS 0.3.6 from the system, which was probably not compiled with the -fno-optimize-sibling-calls flag, and that was causing trouble. Who knows. The implementation of LAPACKE_dgb_trans does not inspire confidence, but maybe by coincidence it does the right thing.

I consider this resolved, so now this PR has to wait until the lapacke code generation is updated to do the conversion directly from and to column major on our side.

@vladimir-ch
Member Author

I've rebased this but need some time to recall the status or what it still needs.

@vladimir-ch
Member Author

I'm afraid we will have to convert in lapack/netlib from the BLAS banded format (rows in rows, diagonals in columns, row-major storage) to the LAPACKE format (columns in columns, diagonals in rows, row-major storage). LAPACKE will then internally convert to the column-major storage for the Fortran code. It's horrible, but I see no other reasonable way. What do you think, @kortschak?
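
For illustration, a minimal sketch of that first conversion under the layouts described above (the function name and signature are hypothetical, not the actual lapack/netlib code):

// bandToLapacke copies an n×n banded matrix with kl subdiagonals and ku
// superdiagonals from the Gonum/CBLAS row-major format (a, with
// lda >= kl+ku+1) into the LAPACKE row-major format (ab, a
// (kl+ku+1)×ldab array with ldab >= n). Hypothetical sketch only.
func bandToLapacke(n, kl, ku int, a []float64, lda int, ab []float64, ldab int) {
	for i := 0; i < n; i++ {
		jmin := i - kl
		if jmin < 0 {
			jmin = 0
		}
		jmax := i + ku
		if jmax > n-1 {
			jmax = n - 1
		}
		for j := jmin; j <= jmax; j++ {
			// Gonum/CBLAS: a(i,j) is stored at a[i*lda + (j-i+kl)].
			// LAPACKE row-major: a(i,j) is stored at ab[(ku+i-j)*ldab + j].
			ab[(ku+i-j)*ldab+j] = a[i*lda+(j-i+kl)]
		}
	}
}

LAPACKE then performs the second conversion to the FORTRAN column-major banded format internally, and for Dpbtrf the inverse of this mapping has to be applied to the factored output.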

@kortschak
Member

That seems OK.

@vladimir-ch force-pushed the lapack/dpbtrf branch 2 times, most recently from c723ba7 to 0d69fdf on March 17, 2020 at 10:01
Also move the banded matrix conversion code to the lapack/netlib package
because that's where any matrix conversion should be done. The lapacke
package should accept exactly what LAPACKE accepts, which means that
unfortunately for banded matrices there will be the inevitable overhead
of two conversions: one from the BLAS (Gonum) row-major format to the
LAPACKE row-major format for banded matrices, and one inside LAPACKE to
the FORTRAN column-major banded format. In the case of Dpbtrf the
inverse conversion must be performed for the factored matrix.

@vladimir-ch
Member Author

PTAL

@codecov-io

codecov-io commented Mar 17, 2020

Codecov Report

Merging #63 into master will decrease coverage by 0.01%.
The diff coverage is 50.00%.


@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
- Coverage   30.42%   30.40%   -0.02%     
==========================================
  Files           4        3       -1     
  Lines        6420     6456      +36     
==========================================
+ Hits         1953     1963      +10     
- Misses       4037     4060      +23     
- Partials      430      433       +3     
Impacted Files Coverage Δ
lapack/netlib/lapack.go 32.90% <40.90%> (+0.21%) ⬆️
lapack/netlib/conv.go 100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@vladimir-ch merged commit c5a04cf into master on March 17, 2020
@vladimir-ch deleted the lapack/dpbtrf branch on March 17, 2020 at 12:01