
lapack/netlib: add Dpbtrf and Dpbtrs #63

Merged: 1 commit merged into master from lapack/dpbtrf on Mar 17, 2020

Conversation

vladimir-ch
Member

The tests already fail for a 1x1 matrix with a positive element because LAPACKE says the matrix is not positive definite. I think this is because the column-major conversion routine LAPACKE_dgb_trans is completely wrong. You can take a look, but it makes no sense to merge this until LAPACKE is fixed, so this PR is here only as a reminder.

@kortschak
Member

It might be worth filing an issue with OpenBLAS; @martin-frbg is vastly more responsive than the NETLIB mob.

@martin-frbg

Thanks for the pointer, but I do not even see an issue filed with the "NETLIB mob" yet, so please cut them some slack ... it looks to me like that project is reduced to a husband-and-wife team nowadays, so I kind of feel their pain. (And at least I would probably need a C/Fortran reproducer anyway to understand the issue; not sure about them.)
Seeing that LAPACKE_dgb_trans has not changed in years, this might also be something else, like the elusive C-to-FORTRAN ABI bug involving single-character arguments that got exposed by more enthusiastic optimizing in recent GCC (Reference-LAPACK/lapack#339). In that case compiling with -fno-optimize-sibling-calls would be expected to help.

@vladimir-ch
Member Author

This is a (minimal) reproducer:

#include <stdio.h>
#include <lapacke.h>

int main(void) {
        lapack_int n = 1;
        lapack_int kd = 1;
        lapack_int ldab = kd + 1;
        double a[4] = {2, -1, -1, -1};
        lapack_int info = LAPACKE_dpbtrf_work(LAPACK_ROW_MAJOR, 'U', n, kd, a, ldab);
        printf("info = %d\n", (int)info); // 1 with ROW_MAJOR, correct 0 with COL_MAJOR
        return 0;
}
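
(Depending on how LAPACKE is installed, this can be built with something like cc repro.c -llapacke -llapack, or cc repro.c -lopenblas when using OpenBLAS's bundled LAPACKE; the file name and link flags here are illustrative.)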

Valgrind reports use of uninitialized memory allocated by LAPACKE_dpbtrf_work.

But I think a reproducer is not really necessary; it doesn't demonstrate much anyway. For a 1x1 matrix the storage order doesn't matter, yet here row major gives an error. The only non-trivial operation that LAPACKE does for row major is the row->col->row conversion. And just from looking at LAPACKE_dgb_trans it's clear it doesn't do what it should (unless my/Gonum's idea of how row-major band storage works is completely off). The use of ldin and ldout in the for-loop bounds alone indicates something fishy.

For sure, I'll also report it to the netlib reference.

@kortschak
Member

kortschak commented Jul 27, 2019

@martin-frbg "mob" as used in the Australian context (i.e. by me) is a non-derogatory word meaning a group or association of people related by family, work or interest ties - I forget that others don't necessarily know the meanings of our common words. The comment on responsiveness is a reflection of past issues filed with NETLIB. Your point about the size of the team is well taken, though.

@martin-frbg

Agreed - I do not see how lapacke_?gb_trans is expected to work (and it seems the code does not know either). What is a bit strange is that this has survived unchanged from the original 2010 LAPACKE code (except that matrix_order got renamed to matrix_layout at some point) without anybody noticing.

@kortschak
Member

The persistence of incorrect transpose handling code in LAPACKE is not unprecedented. We have found at least one other case where the transpose was incorrectly handled leading to a segfault.

@vladimir-ch
Member Author

vladimir-ch commented Jul 29, 2019

Can anyone please take a look at https://software.intel.com/en-us/node/520871#BAND_STORAGE and confidently explain to me how a band matrix is laid out in memory? I find the information there contradictory.

It says:

Row major layout: matrix A is packed in an m-by-ldab matrix AB row-wise so that the diagonals of A become columns of AB. ... The number of columns of AB is ldab ≥ kl + ku + 1, and the number of rows of AB is m.

but later:

Converting the Packed Matrix to a One-Dimensional Array: Element a(i, j) is stored as array element a[k(i, j)] where the mapping k(i, j) is defined as
row major layout: k(i, j) = (i - j)*ldab + kl + j - 1; 1 ≤ i ≤ m, max(1, i - kl) ≤ j ≤ min(n, i + ku)

That means that, according to the mapping definition, the diagonals are stored contiguously in memory, that is, they become rows of AB, contradicting what is written above! An alternative view is that the matrix is packed row-wise and the resulting full matrix is stored column-wise, but that does not make any sense at all and is not how BLAS stores band matrices (https://software.intel.com/en-us/mkl-developer-reference-c-matrix-storage-schemes-for-blas-routines):

Band storage: a band matrix is stored compactly in a two-dimensional array. For column-major layout, columns of the matrix are stored in the corresponding columns of the array, and diagonals of the matrix are stored in a specific row of the array. For row-major layout, rows of the matrix are stored in the corresponding rows of the array, and diagonals of the matrix are stored in a specific column of the array.

Apparently, LAPACKE_dgb_trans operates according to the mapping definition and fixing it would mean breaking it for those (few) who currently use LAPACKE with band matrices in row-major.

@vladimir-ch
Member Author

The senseless mapping-definition view is apparently accepted and used by others; see for example https://stackoverflow.com/questions/42750671/writing-a-banded-matrix-in-a-row-major-layout-for-lapack-solver-dgbsv and the accepted answer there (and the confusion about the different storage order between CBLAS and LAPACKE expressed in the OP's comments).

@vladimir-ch
Member Author

I've been pondering this further and I have an unverifiable hypothesis. The Intel documentation, with all its row-wise packing and diagonals-becoming-columns, should be ignored except (surprisingly) for the mapping k. I think that the packed matrix looks the same independent of the storage order (diagonals in rows, columns in columns), and ROW_MAJOR and COL_MAJOR determine only how this packed matrix is stored. That would mean that ldab must be at least n with ROW_MAJOR, not kd+1, and this would actually match the check in LAPACKE_dpbtrf_work. It would also mean that LAPACKE_dgb_trans is basically correct in how it does the transpose (the for-loop bounds are still wrong).

However, it would also mean that Gonum's band matrices are unimplementable without doing the same transpose either for CBLAS or for LAPACKE, because these two interfaces assume different band storage formats. This could happen even if CBLAS or LAPACKE are not involved, that is, with our own pure Go gonum implementation. Imagine a lapack routine that takes a band matrix and at some point passes it to blas. We will probably never have it but this is what happens in DGBRFS.

It's either this or something else. I wonder whether anyone actually knows for sure.
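
For what it's worth, here is a minimal sketch of the two index mappings under this hypothesis, for a 4x4 tridiagonal matrix with 0-based indices (the LAPACKE mapping is the hypothesized one, not documented behavior):

package main

import "fmt"

// CBLAS/Gonum row-major band storage: row i of A lands in row i of the
// compact array AB, with lda >= kl+ku+1.
func idxCBLAS(i, j, kl, lda int) int { return i*lda + (j - i + kl) }

// Hypothesized LAPACKE row-major band storage: diagonal i-j of A lands in
// row ku+i-j of AB, columns of A stay in columns of AB, with ldab >= n.
func idxLAPACKE(i, j, ku, ldab int) int { return (ku+i-j)*ldab + j }

func main() {
	// 4x4 tridiagonal matrix, kl = ku = 1 (* marks unused padding).
	//
	// CBLAS (lda = 3):      hypothesized LAPACKE (ldab = 4):
	//   [  *  a00 a01 ]       [  *  a01 a12 a23 ]  <- superdiagonal
	//   [ a10 a11 a12 ]       [ a00 a11 a22 a33 ]  <- main diagonal
	//   [ a21 a22 a23 ]       [ a10 a21 a32  *  ]  <- subdiagonal
	//   [ a32 a33  *  ]
	fmt.Println(idxCBLAS(1, 2, 1, 3))   // a12: index 5 in the CBLAS array
	fmt.Println(idxLAPACKE(1, 2, 1, 4)) // a12: index 2 in the LAPACKE array
}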

@martin-frbg

The only other (related) documentation I found is from the ESSL at
https://www.ibm.com/support/knowledgecenter/en/SSFHY8_6.1/reference/am5gr_hsgbtrf.html#am5gr_hsgbtrf - for me it only added to the confusion. (That documentation also goes into some detail on the differences between what it calls "General Band Storage Mode" and "BLAS General Band Storage Mode"...)
The LAPACKE examples distributed with MKL cleverly omit anything involving banded matrices as far as I can tell.

@kortschak
Member

@vladimir-ch We have formal tests for what we consider to be the correct band storage. These are in blas/blas64/conv.go. I implemented them from the docs in blas/gonum/doc.go and (from memory) this document.

@vladimir-ch
Member Author

Yes, we use the BLAS storage; the fact that the blas/netlib tests pass hopefully confirms that. The problem is that LAPACKE most likely assumes a different storage. We can live with that, but our lapack/netlib calls would have to do the BLAS->lapacke->BLAS conversion. And if we did that, we could just as well convert directly to column major and so avoid the buggy row-major part of lapacke (at least for band matrices).

@vladimir-ch
Member Author

I've created Reference-LAPACK/lapack#348

@vladimir-ch
Member Author

I've spent some time today playing with various conversions and testing them with OpenBLAS and MKL. My conclusion, based on the behavior of MKL and the LAPACKE source, is that the band storage for LAPACKE is indeed the FORTRAN variant (diagonals in rows, columns in columns), only stored in row-major layout. This means that it is not the CBLAS format that we use in Gonum. Anyone using LAPACKE and CBLAS from C for band matrices in row-major should be aware of this.

I'm not able to reproduce the valgrind warning about uninitialized memory. One possibility is that the binary was using OpenBLAS 0.3.6 from the system, which was probably not compiled with the -fno-optimize-sibling-calls flag, and that was causing trouble. Who knows. The implementation of LAPACKE_dgb_trans does not inspire confidence, but maybe by coincidence it does the right thing.

I consider this resolved, so now this PR has to wait until the lapacke code generation is updated to do the conversion directly from and to column major on our side.

@vladimir-ch
Member Author

I've rebased this but need some time to recall the status or what it still needs.

@vladimir-ch
Member Author

I'm afraid we will have to convert in lapack/netlib from the BLAS banded format (rows in rows, diagonals in columns, row-major storage) to the LAPACKE format (columns in columns, diagonals in rows, row-major storage). LAPACKE will then internally convert to the column-major storage for the Fortran code. It's horrible, but I see no other reasonable way. What do you think, @kortschak?
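
For illustration, a minimal sketch of that first conversion under the layouts described above (the function name and signature are hypothetical, not the actual lapack/netlib code):

// bandToLapacke copies an n×n banded matrix with kl subdiagonals and ku
// superdiagonals from the Gonum/CBLAS row-major format (a, with
// lda >= kl+ku+1) into the LAPACKE row-major format (ab, a
// (kl+ku+1)×ldab array with ldab >= n). Hypothetical sketch only.
func bandToLapacke(n, kl, ku int, a []float64, lda int, ab []float64, ldab int) {
	for i := 0; i < n; i++ {
		jmin := i - kl
		if jmin < 0 {
			jmin = 0
		}
		jmax := i + ku
		if jmax > n-1 {
			jmax = n - 1
		}
		for j := jmin; j <= jmax; j++ {
			// Gonum/CBLAS: a(i,j) is stored at a[i*lda + (j-i+kl)].
			// LAPACKE row-major: a(i,j) is stored at ab[(ku+i-j)*ldab + j].
			ab[(ku+i-j)*ldab+j] = a[i*lda+(j-i+kl)]
		}
	}
}

LAPACKE then performs the second conversion to the FORTRAN column-major banded format internally, and for Dpbtrf the inverse of this mapping has to be applied to the factored output.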

@kortschak
Member

That seems OK.

@vladimir-ch force-pushed the lapack/dpbtrf branch 2 times, most recently from c723ba7 to 0d69fdf on March 17, 2020 at 10:01
Also move the banded matrix conversion code to the lapack/netlib package
because that's where any matrix conversion should be done. The lapacke
package should accept exactly what LAPACKE accepts, which means that
unfortunately for banded matrices there will be the inevitable overhead
of two conversions: one from the BLAS (Gonum) row-major format to the
LAPACKE row-major format for banded matrices, and one inside LAPACKE to
the FORTRAN column-major banded format. In the case of Dpbtrf the
inverse conversion must be performed for the factored matrix.

@vladimir-ch
Member Author

PTAL

@codecov-io

codecov-io commented Mar 17, 2020

Codecov Report

Merging #63 into master will decrease coverage by 0.01%.
The diff coverage is 50.00%.


@@            Coverage Diff             @@
##           master      #63      +/-   ##
==========================================
- Coverage   30.42%   30.40%   -0.02%     
==========================================
  Files           4        3       -1     
  Lines        6420     6456      +36     
==========================================
+ Hits         1953     1963      +10     
- Misses       4037     4060      +23     
- Partials      430      433       +3     
Impacted Files Coverage Δ
lapack/netlib/lapack.go 32.90% <40.90%> (+0.21%) ⬆️
lapack/netlib/conv.go 100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@vladimir-ch merged commit c5a04cf into master on March 17, 2020
@vladimir-ch deleted the lapack/dpbtrf branch on March 17, 2020 at 12:01