Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argument mismatch for call to gemm from ssids #4

Closed
nimgould opened this issue Feb 28, 2017 · 7 comments · Fixed by #69
Closed

argument mismatch for call to gemm from ssids #4

nimgould opened this issue Feb 28, 2017 · 7 comments · Fixed by #69
Assignees
Milestone

Comments

@nimgould
Copy link
Contributor

When running an application compiled with gfortran 4.9, I obtained the following segfault -

At line 630 of file blas.f90
Fortran runtime error: Actual string length is shorter than the declared one for dummy argument 'transa' (0/1)

Error termination. Backtrace:
#0 0x2ac124d25f07 in ???
#1 0x2ac124d26a45 in ???
#2 0x2ac124d26dfa in ???
#3 0x81f05c in dgemm_
at blas.f90:630
#4 0x6fc646 in _ZN5spral5ssids3cpu9host_gemmIdEEvNS1_9operationES3_iiiT_PKS4_iS6_iS4_PS4_i
at wrappers.cxx:27
#5 0x6fbef4 in _ZN5spral5ssids3cpu15ldlt_tpp_factorEiiPiPdiS3_S3_ibddiS3_i
at ldlt_tpp.cxx:219
#6 0x6f5db9 in ZN5spral5ssids3cpu17ldlt_app_internal5BlockIdLi32ENS1_14BuddyAllocatorIiSaIdEEEE6factorINS4_IdS5_EEEEiiPiPdRKNS1_18cpu_factor_optionsERSt6vectorINS1_9WorkspaceESaISG_EERKT
at ldlt_app.cxx:990
#7 0x6f6334 in _ZN5spral5ssids3cpu17ldlt_app_internal4LDLTIdLi32ENS2_10CopyBackupIdNS1_14BuddyAllocatorIdSaIdEEEEELb0ELb0ES7_E24run_elim_pivoted_notasksEiiPiPdiSB_RNS2_10ColumnDataIdNS5_IiS6_EEEERS8_RKNS1_18cpu_factor_optionsEidSB_iRSt6vectorINS1_9WorkspaceESaISL_EERKS7_i
at ldlt_app.cxx:1570
#8 0x6f4b55 in ZN5spral5ssids3cpu17ldlt_app_internal4LDLTIdLi32ENS2_10CopyBackupIdNS1_14BuddyAllocatorIdSaIdEEEEELb0ELb0ES7_E6factorEiiPiPdiSB_RS8_RKNS1_18cpu_factor_optionsENS1_11PivotMethodEidSB_iRSt6vectorINS1_9WorkspaceESaISI_EERKS7
at ldlt_app.cxx:2297
#9 0x6f5af2 in ZN5spral5ssids3cpu17ldlt_app_internal5BlockIdLi32ENS1_14BuddyAllocatorIiSaIdEEEE6factorINS4_IdS5_EEEEiiPiPdRKNS1_18cpu_factor_optionsERSt6vectorINS1_9WorkspaceESaISG_EERKT
at ldlt_app.cxx:974
#10 0x6f820e in _ZN5spral5ssids3cpu17ldlt_app_internal4LDLTIdLi32ENS2_10CopyBackupIdNS1_14BuddyAllocatorIdSaIdEEEEELb1ELb0ES7_E24run_elim_pivoted_notasksEiiPiPdiSB_RNS2_10ColumnDataIdNS5_IiS6_EEEERS8_RKNS1_18cpu_factor_optionsEidSB_iRSt6vectorINS1_9WorkspaceESaISL_EERKS7_i
at ldlt_app.cxx:1570
#11 0x6fa3ac in ZN5spral5ssids3cpu17ldlt_app_internal4LDLTIdLi32ENS2_10CopyBackupIdNS1_14BuddyAllocatorIdSaIdEEEEELb1ELb0ES7_E6factorEiiPiPdiSB_RS8_RKNS1_18cpu_factor_optionsENS1_11PivotMethodEidSB_iRSt6vectorINS1_9WorkspaceESaISI_EERKS7
at ldlt_app.cxx:2297
#12 0x6fb260 in ZN5spral5ssids3cpu15ldlt_app_factorIdNS1_14BuddyAllocatorIdSaIdEEEEEiiiPiPT_iS8_S7_S8_iRKNS1_18cpu_factor_optionsERSt6vectorINS1_9WorkspaceESaISD_EERKT0
at ldlt_app.cxx:2453
#13 0x6e475b in ???
#14 0x6e299a in factor_node<false, double, spral::ssids::cpu::BuddyAllocator<double, std::allocator > >
at ./ssids/cpu/factor.hxx:175
#15 0x6e299a in _ZN5spral5ssids3cpu14NumericSubtreeILb0EdLm8388608ENS1_11AppendAllocIdEEEC2ERKNS1_15SymbolicSubtreeEPKdSA_PPvRKNS1_18cpu_factor_optionsERNS1_11ThreadStatsE._omp_fn.6
at ssids/cpu/NumericSubtree.hxx:172
#16 0x2ac12534760e in ???
#17 0x2ac1253501db in ???
#18 0x6d01a8 in __spral_ssids_fkeep_MOD_inner_factor_cpu._omp_fn.1
at fkeep.f90:110

line 630-633 of the local file blas.f90 is

  SUBROUTINE DGEMM ( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB,
 $                   BETA, C, LDC )
  • .. Scalar Arguments ..
    CHARACTER*1  TRANSA, TRANSB
    

so maybe there is some confusion between a C and fortram character? The problem seems to vanish with a system blas call via -lblas.

Any ideas?

Nick

@flipflapflop
Copy link
Collaborator

Nick,

Yes it seems like a C/FORTRAN interfacing error. I guess you were using the reference BLAS in this case, which version did you use? Also, are you using the master or release version of SSIDS?

Florent

@nimgould
Copy link
Contributor Author

Hi Florent

I was using the blas from NETLIB from a year or two ago. My version of SSIDS is dated October 19th 2016.

Nick

@mjacobse
Copy link
Collaborator

mjacobse commented May 27, 2021

I noticed something probably related when compiling both SSIDS and HSL_MA97 into the same binary with link-time optimization. There are multiple -Wlto-type-mismatch warnings about the BLAS/LAPACK function declarations here:

extern "C" {
void dgemm_(char* transa, char* transb, int* m, int* n, int* k, double* alpha, const double* a, int* lda, const double* b, int* ldb, double *beta, double* c, int* ldc);
void dpotrf_(char *uplo, int *n, double *a, int *lda, int *info);
void dsytrf_(char *uplo, int *n, double *a, int *lda, int *ipiv, double *work, int *lwork, int *info);
void dtrsm_(char *side, char *uplo, char *transa, char *diag, int *m, int *n, const double *alpha, const double *a, int *lda, double *b, int *ldb);
void dsyrk_(char *uplo, char *trans, int *n, int *k, double *alpha, const double *a, int *lda, double *beta, double *c, int *ldc);
void dtrsv_(char *uplo, char *trans, char *diag, int *n, const double *a, int *lda, double *x, int *incx);
void dgemv_(char *trans, int *m, int *n, const double* alpha, const double* a, int *lda, const double* x, int* incx, const double* beta, double* y, int* incy);
}

For example for dpotrf:

[...]/spral/src/ssids/cpu/kernels/wrappers.cxx:12:9: warning: type of ‘dpotrf_’ does not match original declaration [-Wlto-type-mismatch]
   12 |    void dpotrf_(char *uplo, int *n, double *a, int *lda, int *info);
      |         ^
[...]/hsl_ma97-2.5.0/src/hsl_ma97d.f90:6846:5: note: type mismatch in parameter 6
 6846 |          call dpotrf('L', p, a, lda, lapack_info)
      |     ^
[...]/hsl_ma97-2.5.0/src/hsl_ma97d.f90:6846:5: note: type ‘long int’ should match type ‘void’
[...]/hsl_ma97-2.5.0/src/hsl_ma97d.f90:6846:5: note: ‘dpotrf’ was previously declared here

The issue is that the Fortran function requires the length of the uplo parameter. Most Fortran compilers therefore add hidden arguments to the end of the function for each character parameter that are that length. The C declarations are strictly speaking therefore not compatible with the Fortran functions.

Some digging reveals that this issue is present in many codes that call BLAS/LAPACK functions from C and causes real problems since 2019. Here are some references:

Not quite sure what the best solution would be for this project. Implementing this correctly cross-platform does not seem trivial to me. What do you think?

@jfowkes
Copy link
Contributor

jfowkes commented May 28, 2021

@mjacobse many thanks for looking into this in detail, this really is a case of CBLAS/LAPACKE incorrectly depending on undefined compiler behaviour for years (namely omitting the hidden length argument for single-character strings) as nicely explained in the LWN.net article above.

My suggestion would be to follow the R developers on this and, until CBLAS/LAPACKE gets fixed (a fix was merged 3 days ago but has yet to make it into a release), add either -ftail-call-workaround=2 or -fno-optimize-sibling-calls to gfortran to revert to the previous behaviour, although I'm not sure which of the two compiler flags is preferable?

@jfowkes jfowkes added this to the New Release milestone Jun 14, 2021
@martin-frbg
Copy link

@jfowkes my understanding is/was that the -ftail-call-workaround option was created when the problem became known, while the older -fno-optimize-sibling-calls option has the side effect of avoiding problematic assumptions in older compiler versions as well. (Which is why OpenBLAS goes with -fno-optimize-sibling-calls for the LAPACK&LAPACKE it imports from netlib)

@jfowkes
Copy link
Contributor

jfowkes commented Jun 24, 2021

@martin-frbg many thanks for clarifying this, then we should also go for -fno-optimize-sibling-calls.

@mjacobse
Copy link
Collaborator

mjacobse commented Aug 9, 2022

CBLAS/LAPACKE has since fixed this with the 3.10 release:
https://github.com/Reference-LAPACK/lapack/blob/v3.10.1/CBLAS/include/cblas_f77.h#L18
https://github.com/Reference-LAPACK/lapack/blob/v3.10.1/LAPACKE/include/lapack.h#L19
(Though they went with the arguably quite brittle fix of just assuming that those size parameters exist and come last. Perhaps that is why the issue is still open.)

Coming back to this, I am not sure anymore how this changes anything for this issue though. I do see uses of CBLAS in some example and test files for SSMFE, but no more than that. This issue for SSIDS in particular does not come from CBLAS/LAPACKE, but instead from the manual declaration of the BLAS functions in the SSIDS code itself:

extern "C" {
void dgemm_(char* transa, char* transb, int* m, int* n, int* k, double* alpha, const double* a, int* lda, const double* b, int* ldb, double *beta, double* c, int* ldc);
void dpotrf_(char *uplo, int *n, double *a, int *lda, int *info);
void dsytrf_(char *uplo, int *n, double *a, int *lda, int *ipiv, double *work, int *lwork, int *info);
void dtrsm_(char *side, char *uplo, char *transa, char *diag, int *m, int *n, const double *alpha, const double *a, int *lda, double *b, int *ldb);
void dsyrk_(char *uplo, char *trans, int *n, int *k, double *alpha, const double *a, int *lda, double *beta, double *c, int *ldc);
void dtrsv_(char *uplo, char *trans, char *diag, int *n, const double *a, int *lda, double *x, int *incx);
void dgemv_(char *trans, int *m, int *n, const double* alpha, const double* a, int *lda, const double* x, int* incx, const double* beta, double* y, int* incy);
}

It's basically what CBLAS/LAPACKE did in the past, so the same issue was replicated here in SSIDS. That's why I don't see how a CBLAS/LAPACKE update would fix this here, I believe this should be fixed locally. Unless the idea was to switch to using CBLAS to replace these manual declarations? Or was the idea to copy their fix here locally?

Personally neither the compiler flag workaround, nor the fix of manually adding size parameters to the C declarations that CBLAS/LAPACKE did seem great to me. If there is interest, I can look into adding properly C interoperable wrapper functions in Fortran for the few BLAS routines that are used from wrapper.cxx. As noted in the CBLAS/LAPACKE issue, that might introduce another level of function calls (if they aren't optimized out anyways), but at least it is a robust solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants