Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New BLAS for some platforms have issues with upper-triangular inverses #254

Closed
mattfidler opened this issue Apr 22, 2019 · 52 comments
Closed

Comments

@mattfidler
Copy link

mattfidler commented Apr 22, 2019

I'm unsure if this is the correct place to post, or if I need to post in the armadillo package itself; I have had some problems with the newer RcppArmadillo BLAS routines on some platforms:

nlmixrdevelopment/RxODE#84

I tracked it down to:

inv(trimatu(cholMat))

The code seems to say that the matrix isn't upper triangular. On other platforms/ older versions it correctly identifies the upper triangular matrix and takes the inverse more quickly than if it was not identified. These tests used to pass in CRAN, but no longer do so...

For now, I have changed the code to:

success = inv(retMat, trimatu(cholMat))

and wrapped it in try

@eddelbuettel
Copy link
Member

I suggest you close this but open a fully reproducible example at the upstream repo on GitLab: https://gitlab.com/conradsnicta/armadillo-code/issues

I would be happy to check what you claim against the two or three BLAS backends. It may well be an 'espilon' precision issue. @conradsnicta will know more.

@mattfidler
Copy link
Author

Ok.

@eddelbuettel
Copy link
Member

Also: when you say "newer", which version(s) do you refer to? Before bugging @coatless can you test the 9.400-rc1 version I added on the weekend?

@mattfidler
Copy link
Author

Ok. I can test. Did you upload to this to CRAN or github?

I need to get the information about the version of RcppArmadillo they were using.

I personally haven't reproduced it yet, but it is clear that CRAN is having problems with this with on ubuntu, so under certain circumstances it is reproducible.

https://cran.r-project.org/web/checks/check_results_RxODE.html

I passed the travis checks on R development without my work-around. (I thought travis blas was in sync with CRAN, sigh). I sure I'm we are going to get a de-listing email from CRAN soon about this.

@mattfidler
Copy link
Author

I linked the wrong code. This is the correct code location;

http://www.netlib.org/lapack/lapack-3.1.1/html/dtrti2.f.html

A code of -1 states that the tests for an upper triangular matrix fails. It could be an epsilon issue, I suppose. A work-around would be setting the lower-triangular to exactly zero

@eddelbuettel
Copy link
Member

eddelbuettel commented Apr 22, 2019

Ok. I can test. Did you upload to this to CRAN or github?

Sorry, it is here: https://github.com/RcppCore/RcppArmadillo/tree/feature/arma-9.400.rc1
I should have been more explicit. I can make a package in a drat repo out of it if you prefer.

dtrti2 rings a bell but didn't the discussion on r-devel list just tie this into the gfortran upstream bug reports (that I am an innocent bystander in because I connected R Core to Debian's gcc/gfortran maintainer who then passed on to gfortran upstream). I could possibly confuse mail threads too...

@mattfidler
Copy link
Author

Checking the code more carefully, a "-1" means the call to DTRI2 arguments were not checked successfully. It could be a gfortran upstream bug too.

@bruc
Copy link

bruc commented Apr 22, 2019

I have a suspicion that this coming from the compilers. I'm using devtoolset-8 from Redhat, and this provides gcc version 8-2. I haven't upgraded any of my R packages between a successful execution of the tests in RxODE and the current failure, but devtoolset-8 was upgraded in that time interval. I'm now trying a test where I've downgraded devtoolset-8 back and I rebuilding R and RxODE. I will report the results when I'm finished.

@mattfidler
Copy link
Author

No. I don't need a DRAT package. Let me check on my system to see if it fails with the development rcpp.

@mattfidler
Copy link
Author

This runs correctly on my system with the new release RcppArmadillo. I'm leaning toward this being a compiler issue, not a RcppArmadillo issue. Thanks @eddelbuettel for your help.

library(RxODE);rxTest("omega-chol");sessionInfo();library(RcppArmadillo);packageVersion("RcppArmadillo")
#> ✔ | OK F W S | Context
#> |  0       | Omega Cholesky 1x1, sqrt|  1       | Omega Cholesky 1x1, sqrt|  2       | Omega Cholesky 1x1, sqrt|  3       | Omega Cholesky 1x1, sqrt|  4       | Omega Cholesky 1x1, sqrt|  5       | Omega Cholesky 1x1, sqrt|  6       | Omega Cholesky 1x1, sqrt|  7       | Omega Cholesky 1x1, sqrt|  8       | Omega Cholesky 1x1, sqrt|  9       | Omega Cholesky 1x1, sqrt|  9       | Omega Cholesky 1x1, sqrt [0.5 s]
#> |  0       | Omega Cholesky 2x2, sqrt|  1       | Omega Cholesky 2x2, sqrt|  2       | Omega Cholesky 2x2, sqrt|  3       | Omega Cholesky 2x2, sqrt|  4       | Omega Cholesky 2x2, sqrt|  5       | Omega Cholesky 2x2, sqrt|  6       | Omega Cholesky 2x2, sqrt|  7       | Omega Cholesky 2x2, sqrt|  8       | Omega Cholesky 2x2, sqrt|  9       | Omega Cholesky 2x2, sqrt|  9       | Omega Cholesky 2x2, sqrt
#> |  0       | Omega Cholesky 3x3, sqrt|  1       | Omega Cholesky 3x3, sqrt|  2       | Omega Cholesky 3x3, sqrt|  3       | Omega Cholesky 3x3, sqrt|  4       | Omega Cholesky 3x3, sqrt|  5       | Omega Cholesky 3x3, sqrt|  6       | Omega Cholesky 3x3, sqrt|  7       | Omega Cholesky 3x3, sqrt|  8       | Omega Cholesky 3x3, sqrt|  9       | Omega Cholesky 3x3, sqrt|  9       | Omega Cholesky 3x3, sqrt
#> |  0       | Omega Cholesky 4x4, sqrt|  1       | Omega Cholesky 4x4, sqrt|  2       | Omega Cholesky 4x4, sqrt|  3       | Omega Cholesky 4x4, sqrt|  4       | Omega Cholesky 4x4, sqrt|  5       | Omega Cholesky 4x4, sqrt|  6       | Omega Cholesky 4x4, sqrt|  7       | Omega Cholesky 4x4, sqrt|  8       | Omega Cholesky 4x4, sqrt|  9       | Omega Cholesky 4x4, sqrt|  9       | Omega Cholesky 4x4, sqrt
#> |  0       | Omega Cholesky 5x5, sqrt|  1       | Omega Cholesky 5x5, sqrt|  2       | Omega Cholesky 5x5, sqrt|  3       | Omega Cholesky 5x5, sqrt|  4       | Omega Cholesky 5x5, sqrt|  5       | Omega Cholesky 5x5, sqrt|  6       | Omega Cholesky 5x5, sqrt|  7       | Omega Cholesky 5x5, sqrt|  8       | Omega Cholesky 5x5, sqrt|  9       | Omega Cholesky 5x5, sqrt|  9       | Omega Cholesky 5x5, sqrt
#> |  0       | Omega Cholesky 6x6, sqrt|  1       | Omega Cholesky 6x6, sqrt|  2       | Omega Cholesky 6x6, sqrt|  3       | Omega Cholesky 6x6, sqrt|  4       | Omega Cholesky 6x6, sqrt|  5       | Omega Cholesky 6x6, sqrt|  6       | Omega Cholesky 6x6, sqrt|  7       | Omega Cholesky 6x6, sqrt|  8       | Omega Cholesky 6x6, sqrt|  9       | Omega Cholesky 6x6, sqrt|  9       | Omega Cholesky 6x6, sqrt
#> |  0       | Omega Cholesky 7x7, sqrt|  1       | Omega Cholesky 7x7, sqrt|  2       | Omega Cholesky 7x7, sqrt|  3       | Omega Cholesky 7x7, sqrt|  4       | Omega Cholesky 7x7, sqrt|  5       | Omega Cholesky 7x7, sqrt|  6       | Omega Cholesky 7x7, sqrt|  7       | Omega Cholesky 7x7, sqrt|  8       | Omega Cholesky 7x7, sqrt|  9       | Omega Cholesky 7x7, sqrt|  9       | Omega Cholesky 7x7, sqrt
#> |  0       | Omega Cholesky 8x8, sqrt|  1       | Omega Cholesky 8x8, sqrt|  2       | Omega Cholesky 8x8, sqrt|  3       | Omega Cholesky 8x8, sqrt|  4       | Omega Cholesky 8x8, sqrt|  5       | Omega Cholesky 8x8, sqrt|  6       | Omega Cholesky 8x8, sqrt|  7       | Omega Cholesky 8x8, sqrt|  8       | Omega Cholesky 8x8, sqrt|  9       | Omega Cholesky 8x8, sqrt|  9       | Omega Cholesky 8x8, sqrt
#> |  0       | Omega Cholesky 9x9, sqrt|  1       | Omega Cholesky 9x9, sqrt|  2       | Omega Cholesky 9x9, sqrt|  3       | Omega Cholesky 9x9, sqrt|  4       | Omega Cholesky 9x9, sqrt|  5       | Omega Cholesky 9x9, sqrt|  6       | Omega Cholesky 9x9, sqrt|  7       | Omega Cholesky 9x9, sqrt|  8       | Omega Cholesky 9x9, sqrt|  9       | Omega Cholesky 9x9, sqrt|  9       | Omega Cholesky 9x9, sqrt
#> |  0       | Omega Cholesky 10x10, sqrt|  1       | Omega Cholesky 10x10, sqrt|  2       | Omega Cholesky 10x10, sqrt|  3       | Omega Cholesky 10x10, sqrt|  4       | Omega Cholesky 10x10, sqrt|  5       | Omega Cholesky 10x10, sqrt|  6       | Omega Cholesky 10x10, sqrt|  7       | Omega Cholesky 10x10, sqrt|  8       | Omega Cholesky 10x10, sqrt|  9       | Omega Cholesky 10x10, sqrt|  9       | Omega Cholesky 10x10, sqrt
#> |  0       | Omega Cholesky 11x11, sqrt|  1       | Omega Cholesky 11x11, sqrt|  2       | Omega Cholesky 11x11, sqrt|  3       | Omega Cholesky 11x11, sqrt|  4       | Omega Cholesky 11x11, sqrt|  5       | Omega Cholesky 11x11, sqrt|  6       | Omega Cholesky 11x11, sqrt|  7       | Omega Cholesky 11x11, sqrt|  8       | Omega Cholesky 11x11, sqrt|  9       | Omega Cholesky 11x11, sqrt|  9       | Omega Cholesky 11x11, sqrt
#> |  0       | Omega Cholesky 12x12, sqrt|  1       | Omega Cholesky 12x12, sqrt|  2       | Omega Cholesky 12x12, sqrt|  3       | Omega Cholesky 12x12, sqrt|  4       | Omega Cholesky 12x12, sqrt|  5       | Omega Cholesky 12x12, sqrt|  6       | Omega Cholesky 12x12, sqrt|  7       | Omega Cholesky 12x12, sqrt|  8       | Omega Cholesky 12x12, sqrt|  9       | Omega Cholesky 12x12, sqrt|  9       | Omega Cholesky 12x12, sqrt
#> 
#> ══ Results ════════════════════════════════════════════════════════════════
#> Duration: 1.0 s
#> 
#> OK:       108
#> Failed:   0
#> Warnings: 0
#> Skipped:  0
#> ✔ | OK F W S | Context
#> |  0       | Omega Cholesky 1x1, log|  1       | Omega Cholesky 1x1, log|  2       | Omega Cholesky 1x1, log|  3       | Omega Cholesky 1x1, log|  4       | Omega Cholesky 1x1, log|  5       | Omega Cholesky 1x1, log|  6       | Omega Cholesky 1x1, log|  7       | Omega Cholesky 1x1, log|  7       | Omega Cholesky 1x1, log [0.2 s]
#> |  0       | Omega Cholesky 1x1, identity|  1       | Omega Cholesky 1x1, identity|  2       | Omega Cholesky 1x1, identity|  3       | Omega Cholesky 1x1, identity|  4       | Omega Cholesky 1x1, identity|  5       | Omega Cholesky 1x1, identity|  6       | Omega Cholesky 1x1, identity|  7       | Omega Cholesky 1x1, identity|  8       | Omega Cholesky 1x1, identity|  9       | Omega Cholesky 1x1, identity|  9       | Omega Cholesky 1x1, identity [0.2 s]
#> |  0       | Omega Cholesky 2x2, log|  1       | Omega Cholesky 2x2, log|  2       | Omega Cholesky 2x2, log|  3       | Omega Cholesky 2x2, log|  4       | Omega Cholesky 2x2, log|  5       | Omega Cholesky 2x2, log|  6       | Omega Cholesky 2x2, log|  7       | Omega Cholesky 2x2, log|  7       | Omega Cholesky 2x2, log [0.1 s]
#> |  0       | Omega Cholesky 2x2, identity|  1       | Omega Cholesky 2x2, identity|  2       | Omega Cholesky 2x2, identity|  3       | Omega Cholesky 2x2, identity|  4       | Omega Cholesky 2x2, identity|  5       | Omega Cholesky 2x2, identity|  6       | Omega Cholesky 2x2, identity|  7       | Omega Cholesky 2x2, identity|  8       | Omega Cholesky 2x2, identity|  9       | Omega Cholesky 2x2, identity|  9       | Omega Cholesky 2x2, identity [0.2 s]
#> |  0       | Omega Cholesky 3x3, log|  1       | Omega Cholesky 3x3, log|  2       | Omega Cholesky 3x3, log|  3       | Omega Cholesky 3x3, log|  4       | Omega Cholesky 3x3, log|  5       | Omega Cholesky 3x3, log|  6       | Omega Cholesky 3x3, log|  7       | Omega Cholesky 3x3, log|  7       | Omega Cholesky 3x3, log [0.2 s]
#> |  0       | Omega Cholesky 3x3, identity|  1       | Omega Cholesky 3x3, identity|  2       | Omega Cholesky 3x3, identity|  3       | Omega Cholesky 3x3, identity|  4       | Omega Cholesky 3x3, identity|  5       | Omega Cholesky 3x3, identity|  6       | Omega Cholesky 3x3, identity|  7       | Omega Cholesky 3x3, identity|  8       | Omega Cholesky 3x3, identity|  9       | Omega Cholesky 3x3, identity|  9       | Omega Cholesky 3x3, identity [0.2 s]
#> |  0       | Omega Cholesky 4x4, log|  1       | Omega Cholesky 4x4, log|  2       | Omega Cholesky 4x4, log|  3       | Omega Cholesky 4x4, log|  4       | Omega Cholesky 4x4, log|  5       | Omega Cholesky 4x4, log|  6       | Omega Cholesky 4x4, log|  7       | Omega Cholesky 4x4, log|  7       | Omega Cholesky 4x4, log [0.2 s]
#> |  0       | Omega Cholesky 4x4, identity|  1       | Omega Cholesky 4x4, identity|  2       | Omega Cholesky 4x4, identity|  3       | Omega Cholesky 4x4, identity|  4       | Omega Cholesky 4x4, identity|  5       | Omega Cholesky 4x4, identity|  6       | Omega Cholesky 4x4, identity|  7       | Omega Cholesky 4x4, identity|  8       | Omega Cholesky 4x4, identity|  9       | Omega Cholesky 4x4, identity|  9       | Omega Cholesky 4x4, identity [0.2 s]
#> 
#> ══ Results ════════════════════════════════════════════════════════════════
#> Duration: 1.4 s
#> 
#> OK:       64
#> Failed:   0
#> Warnings: 0
#> Skipped:  0
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux
#> 
#> Matrix products: default
#> BLAS: /usr/lib64/blas/reference/libblas.so.3.7.0
#> LAPACK: /usr/lib64/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] testthat_2.0.1 RxODE_0.9.0-5 
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.1       compiler_3.5.1   pillar_1.3.1     plyr_1.8.4      
#>  [5] highr_0.7        PreciseSums_0.3  tools_3.5.1      mvnfast_0.2.5   
#>  [9] digest_0.6.18    lattice_0.20-38  evaluate_0.12    memoise_1.1.0   
#> [13] tibble_2.1.1     gtable_0.3.0     pkgconfig_2.0.2  rlang_0.3.4     
#> [17] Matrix_1.2-15    cli_1.1.0        yaml_2.2.0       xfun_0.4        
#> [21] withr_2.1.2      stringr_1.4.0    dplyr_0.8.0.1    knitr_1.21      
#> [25] grid_3.5.1       tidyselect_0.2.5 inline_0.3.15    glue_1.3.1      
#> [29] R6_2.4.0         rmarkdown_1.11   polyclip_1.10-0  ggplot2_3.1.1   
#> [33] purrr_0.3.2      tweenr_1.0.1     farver_1.1.0     magrittr_1.5    
#> [37] units_0.6-2      scales_1.0.0     htmltools_0.3.6  MASS_7.3-50     
#> [41] assertthat_0.2.1 ggforce_0.2.1    colorspace_1.4-1 stringi_1.4.3   
#> [45] lazyeval_0.2.2   munsell_0.5.0    crayon_1.3.4
#> [1] '0.9.399.1.0'

Created on 2019-04-22 by the reprex package (v0.2.1)

@mattfidler
Copy link
Author

I am running gcc/gfortram 7.4.1-r6

@eddelbuettel
Copy link
Member

The relevant change is much further down the road. You'd need gfortran from source and properly bissected. Mathias puts snapshots of gcc et al into Debian's experimental repo every now and then but I am unsure if the change is in it.

@bruc
Copy link

bruc commented Apr 22, 2019

It appears to be the compiler -- I just rebuilt R and RxODE using gfortran 8.2.1 20180905 and there's no error with RxODE. I still have to upgrade back to 8.3.1 and repeat the test, but since Debian devel is probably running the most recent Gnu compiler, this seems a likely explanation.

@mattfidler
Copy link
Author

@eddelbuettel Do you know if there is a docker that re-creates the CRAN machines?

@eddelbuettel
Copy link
Member

eddelbuettel commented Apr 22, 2019

No, and that is one my single biggest pet peeves. I will try to bring that up with (some of) CRAN next time I meet (some of) them at useR!.

@mattfidler
Copy link
Author

It is a big frustration for me too. Thanks for bringing it up with them. I'm sure I'm not the only one who would like this.

I would especially like a solaris docker, but I don't even know if that is doable.

@bruc
Copy link

bruc commented Apr 22, 2019

I have rebuilt R and RxODE with the GCC compiler suite (8.3.1 20190311) and it fails with the original error in DTRTI2.

Is someone in the R community going to relay this error to the GNU compiler team? If not, it may take me some time to put together a reproducible test.

@eddelbuettel
Copy link
Member

eddelbuettel commented Apr 22, 2019

Is someone in the R community going to relay this error to the GNU compiler team?

Yes. They are on it. Now the gfortran maintainer would be helped with a reproducible example. Can you possibly distill something not depending on R for him?

@bruc
Copy link

bruc commented Apr 22, 2019

No, I don't know the code well enough to do that. I'm sorry.

@mattfidler
Copy link
Author

Can it be based on armadillo? Or does it need to be completely based on Fortran/blas?

@eddelbuettel
Copy link
Member

Hm. I guess with Armadillo is better than with R, and a report is better than none.... Armadillo still requires linking but I would think it would help.

@mattfidler
Copy link
Author

Where do I report the bug; Is there a R gfortran maintainer, or simply to gcc.

@eddelbuettel
Copy link
Member

I could loop you into an email thread, or you could email Thomas Koenig directly -- he is the fellow behind gfortran within the gcc team A(AIK). Let me loop you in with him.

@mattfidler
Copy link
Author

Thanks.

@mattfidler
Copy link
Author

mattfidler commented Apr 24, 2019

Hi Dirk, I think I created a reprex, but apparently I don't have enough skills to figure out how to link and test this issue. Here is what I have:

#include <iostream>
#include <armadillo>

using namespace arma;

int main(int argc, const char **argv){
  arma::Mat<double> A = arma::randu(4,4);
  A = A*A.t();
  A = chol(A);
  A = inv(trimatu(A)); // This is the line that gives the error
}

Unfortunately a simple link to RcppArmadillo doesn't seem to work for me:

matt@matt ~/src/issues/gfortran $ gcc test.cpp -o test -I"/home/matt/R/x86_64-pc-linux-gnu-library/3.5/RcppArmadillo/include"
/usr/lib/gcc/x86_64-pc-linux-gnu/7.4.1/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/ccj9FR7A.o: warning: relocation against `_ZTISt9bad_alloc' in read-only section `.text'
/usr/lib/gcc/x86_64-pc-linux-gnu/7.4.1/../../../../x86_64-pc-linux-gnu/bin/ld: /tmp/ccj9FR7A.o: in function `arma::arma_incompat_size_string(unsigned long long, unsigned long long, unsigned long long, unsigned long long, char const*)':
test.cpp:(.text+0x55): undefined reference to `std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream(std::_Ios_Openmode)'
/usr/lib/gcc/x86_64-pc-linux-gnu/7.4.1/../../../../x86_64-pc-linux-gnu/bin/ld: test.cpp:(.text+0x6e): undefined referenced
...

I tried to find linking guides under armadillo website, but it doesn't seem to be there for me. I think I am missing some sort of configue and makefile which has always been generated by R for me.

Most of my googling came up with is how to use generated configures. The closest I came up with was by you:

https://stackoverflow.com/questions/46723854/rcpparmadillo-using-autoconf-to-disable-openmp

When looking at your autoconfigure it doesn't seem to check for armadillo at all

https://github.com/RcppCore/RcppArmadillo/blob/master/configure.ac

I'm a bit stumped;

@eddelbuettel
Copy link
Member

eddelbuettel commented Apr 24, 2019

@mattfidler Really appreciate the continued support in chasing this.

"Upstream" Armadillo has "always" needed linking. New news here. What OS flavour are you? On Debian-derived Linuxen it just works (though the distro's armadillo package may be slightly out of sync with my distro r-cran-rcpparmadillo). So I quickly mod'ed yours, and it runs fine for me:

Code

// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; compile-command: "g++ -s -Wall -O3 -o arma_issue254 arma_issue254.cpp -larmadillo; ./arma_issue254" -*-

#include <iostream>
#include <armadillo>

using namespace arma;

int main(int argc, const char **argv){
    arma::Mat<double> A = arma::randu(4,4);
    A = A*A.t();
    A = chol(A);
    A = inv(trimatu(A)); // This is the line that gives the error
    A.print("A at end");
}

Output

-*- mode: compilation; default-directory: "~/src/progs/C++/" -*-
Compilation started at Wed Apr 24 10:59:42

g++ -s -Wall -O3 -o arma_issue254 arma_issue254.cpp -larmadillo; ./arma_issue254
A at end
   1.0482  -0.7919  -1.9054  -0.8589
        0   1.5650  -2.3196  -0.2896
        0        0   3.5934  -2.8233
        0        0        0   2.6929

Compilation finished at Wed Apr 24 10:59:45

I just carried the Emacs header line over from another nine year old (!!) example I had in the same directory and adjusted the filename. All we do ... is link with -larmadillo which for me lives in a sustem location.

@mattfidler
Copy link
Author

I forgot to install armadillo on my system. Once I installed armadillo linking to lapack/blas everything worked correctly for me too.

@eddelbuettel
Copy link
Member

Ah yes that part I often forget -- something you need to add -llapack -lblas; on mine it seems to autoresolve.

So: no minimally verifiable complete example then? Or only with a newer gfortran ?

@mattfidler
Copy link
Author

The linking was in armadillo once linked there, everything auto-resolved.

No minimally verifiable example yet. I'm going to install cent OS to see if I can get it there.

If I cannot by this weekend, @bruc said he would try with the newer gcc8 on redhat.

@mattfidler
Copy link
Author

Unfortunately the gcc8 is too old on CentOS; I had the expected results. I will wait for confirmation later.

nlmixrdevelopment/RxODE#84 (comment)

@mattfidler
Copy link
Author

Nevermind the armadillo and offending lapack is compiled by gcc 4.8.5 based on the transaction; I will need to compile these myself...

Running transaction
  Installing : libgfortran-4.8.5-36.el7_6.1.x86_64                         1/30 
...
  Verifying  : gcc-gfortran-4.8.5-36.el7_6.1.x86_64                        2/30 
...
  Verifying  : blas-3.4.2-8.el7.x86_64                                    12/30 
...
  Verifying  : lapack-3.4.2-8.el7.x86_64                                  14/30 
  Verifying  : libgfortran-4.8.5-36.el7_6.1.x86_64                        15/30 
 ...

@mattfidler
Copy link
Author

I cannot even get the lapack-release to compile with CentOS gcc8, so I'm stumped here. Hopefully @bruc can reproduce.

@eddelbuettel
Copy link
Member

Yes :-/ Working at the core of the toolchain is not trivial. I wish we had more regularly produced Docker containers or alike.

@mattfidler
Copy link
Author

I agree. Apparently I was working with the wrong gcc anyhow.

@mattfidler
Copy link
Author

Hi @eddelbuettel ,

Thanks for the follow up. @bruc tried the basic code and it worked just fine with the latest gcc/gfortran. Whats odd is the only place where the reported error should come up is inverting upper/lower triangular matrices, and that is the only place where I used it.

Since I cannot reproduce it, I suppose I cannot file it, though I know under certain (unknown) circumstances it occurs.

Anyhow thanks.

@eddelbuettel
Copy link
Member

Thanks for all the follow-up. You'd think that all the tires of linear algebra routines have been kicked from all directions but you never know. We'll keep an eye on it in case it rears its ugly head again.

@mattfidler
Copy link
Author

Thanks for your awesome packages. I will keep my eye open.

@bruc
Copy link

bruc commented May 13, 2019

Redhat (Paulo Andrade) has done a lot of work to track down the bug, and they filed this bugzilla report:

https://bugzilla.redhat.com/show_bug.cgi?id=1709538

@mattfidler: I think you'd be interested also -- this derives from the original issue I raised with RxODE.

@eddelbuettel
Copy link
Member

Apologies -- I forgot about this issue. But there has been a new upstream and as yesterday it is on CRAN so pretty please try 0.9.400.3.0.

@eddelbuettel
Copy link
Member

Of course this could still be a different issue. Did anybody at Redhat talk to @conradsnicta ?

@bruc
Copy link

bruc commented May 13, 2019

I don't know.

Would you please forward him the Bugzilla link?

@mattfidler
Copy link
Author

Thank you @bruc; This is helpful for anyone who uses these combination of software.

@mattfidler
Copy link
Author

mattfidler commented May 13, 2019

@eddelbuettel I believe he is using Armadillo directly, not through the Rcpp interface.

Is there an armadillo bugzilla equivalent that we should file to?

@eddelbuettel
Copy link
Member

eddelbuettel commented May 13, 2019

using Armadillo directly

Does not matter. But RcppArmadillo 0.9.400.3.0 and Armadillo 9.400.3 have the same Armadillo -- hence the symmetric version numbers. 9.400.3 has fixes over 9.400.2

armadillo bugzilla equivalent

Sure at this GitLab https://gitlab.com/conradsnicta/armadillo-code/issues repo.

@mattfidler
Copy link
Author

@eddelbuettel
Copy link
Member

Thanks. And just to double check: your issue is not addressed by 9.400.3 aka 0.9,400.3.0 ?

@mattfidler
Copy link
Author

I am not sure; I cannot reproduce, it took me too long to setup the toolchain myself and I don't subscribe to redhat.

My guess it is a compiler bug. The CRAN systems no longer show the error, though. Not clear to me if they changed the compiler or it was fixed by the revision of armadillo.

@eddelbuettel
Copy link
Member

eddelbuettel commented May 14, 2019

It seems Fortran (or gfortran?) somehow uses hidden variables to indicate the size of char arguments, which is news to me.

YES! I have been tangentially involved in some discussions started by R Core whom I could help with contacts to Debian gcc/gftortran maintainer and then upstream gfortran so I remained CCed on a few mails. A lot of ink spilled, not sure it is all in one place but there is a corresponding gfortran bugzilla issue.

On the weekend we just updated Debian's R package to add this call for gfortran >= 8 (as it has a something backported from gfortran) (and with apologies for making you read configure.ac ;-) )

## If the Fortran compiler is gfortran >= 7, add
## -fno-optimize-sibling-calls to avoid recent gfortran optimizations
## that break with LAPACK/BLAS-style passing of length-1 strings
## (without hidden arguments giving their lengths).
AC_CACHE_VAL([r_cv_have_gfortran7],
[r_cv_have_gfortran7=no
if test "x${GFC}" = xyes; then
  gfortran_version_maj=`echo "__GNUC__" | ${FC} -E -P - | ${SED} 's/ //g'`
  if test ${gfortran_version_maj} -ge 7; then
    r_cv_have_gfortran7=yes
  fi
fi])

if test "x${r_cv_have_gfortran7}" = xyes; then
  R_XTRA_FFLAGS="-fno-optimize-sibling-calls"
fi

In essence it comes up from C calling Fortran (which can be adjusted) and Fortran then doing, as I recall, some optimisation calling other Fortran (LAPACK say) routines in which the char setting drops. The flag -fno-optimize-sibling-calls prohibits the issue from affecting the secondary and later calls.

If you want I can probably dig up up the bugzilla ticket.

@martin-frbg
Copy link

martin-frbg commented May 14, 2019

OpenBLAS is equally affected - and Reference-LAPACK/lapack#339 tells that the netlib LAPACKE suffers from the same problem. Everybody seems to have been relying on essentially undefined behaviour until some change in recent gfortran finally made pigs fly. (Now one might be able to argue that the behaviour has actually been implementation-defined by majority if not unanimous vote, and it is gcc that is breaking the truce here) BTW the bugzilla link is in the ticket I linked here.
(EDIT: for the most part, OpenBLAS just bundles unmodified LAPACK from netlib - a handful of functions is replaced with own C code, and several more if one compiles with Peise's ReLAPACK.
The need for hidden size arguments has been known, but nobody expected it to extend to single-character constants which are the typical case in LAPACK.)

@eddelbuettel
Copy link
Member

There is simply too much software out there

I was fortunate to be on a few email exchanges and that point had been made too. We can't possibly fix all LAPACK/BLAS entry points and hope to retain plug-in compatibility.

I wish more of that emailed discussion was in a public place. I am not aware of one -- maybe the equivalent content got added to the RH or gcc bug trackers. Have not had time to chase it.

The truly important aspects are that a) this is local to new-ish gfortran 9 version and partial backports and some distro gfortran 8 builds, Debian included and b) is avoided by the -fno-optimize-sibling-calls flag.

@eddelbuettel
Copy link
Member

eddelbuettel commented May 16, 2019

Opinions on the shape of the world differ. This is the approach chosen by gfortran upstream and R Core. Good enough for me, at least for now. We may need to recompile LAPACK/BLAS with the same switch though to emulate the old behaviour more completely.

@jabl
Copy link

jabl commented May 16, 2019

Earlier today a patch was committed to GCC trunk (https://gcc.gnu.org/viewcvs?rev=271285&root=gcc&view=rev ), which disables sibling and tail call optimizations for Fortran procedures which take character arguments where the length has been declared constant in the procedure. It should be committed to the other branches soon as well.

So this is essentially the -fno-optimize-sibling-calls, except only for the particular case that has caused problems with LAPACK and not for all functions.

The long-term fix would still be to fix CBLAS and LAPACKE, get people to use those, and then in some years(?) remove the workaround.

(I don't think compiling BLAS/LAPACK with BIND(C) is a solution. Yes, it would fix these incorrect C prototypes, but then it would break Fortran code calling BLAS/LAPACK, just in the opposite way, and the ABI would be incompatible with "normal" BLAS/LAPACK)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants