Intermittent "Assertion `psi_norm > 0.0' failed" in Davidson method #2298
Replies: 3 comments
-
This bug occurs only in ABACUS compiled with the Intel toolchain; ABACUS compiled with GNU has no problem. Interestingly, when I run this function element by element, the bug does not occur. I don't know why this works, but the bug did not appear in 100 repeated runs of the Al example below.
-
#1221 is a temporary fix; it causes significant communication overhead. Fixing the MKL linking issue should be the best solution. My profiling result for P000_si16_pw with 32 processes:
-
The Davidson method needs to do [...]. The old version (before #1221) does the all-to-all MPI reduce in [...]. #1221 is a different solution: it does the MPI_Reduce to rank 0 and then does [...]. So with #1221 there is an extra cost in hsolver::DiagoDavid::diag_zhegvx and Parallel_Common::bcast_complex_double, but a smaller cost in cal_elem.
-
Describe the Bug
In this LiSi system, ABACUS sometimes throws the error "Assertion `psi_norm > 0.0' failed" when using the Davidson method. The bug occurs randomly when running 16 threads on a machine with 16 physical cores.
abacus: /root/abacus-develop/source/module_hsolver/diago_david.cpp:544: void hsolver::DiagoDavid::SchmitOrth(hamilt::Hamilt *, const int &, int, int, const ModuleBase::ComplexMatrix &, std::complex *, std::complex *): Assertion `psi_norm > 0.0' failed.
dav.zip
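For readers unfamiliar with where such an assertion comes from, here is a minimal sketch of a Gram-Schmidt step with a norm check. It is not the ABACUS implementation of SchmitOrth: the names `cvec`, `inner`, `schmidt_orth` and the tolerance `tol` are hypothetical, chosen only for illustration. It shows two ways a check like `psi_norm > 0.0` can fail: the new vector is (numerically) a linear combination of the existing basis, or the data contains NaN, since any comparison with NaN is false.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical alias for a complex vector; not an ABACUS type.
using cvec = std::vector<std::complex<double>>;

// Hermitian inner product <a|b> = sum_i conj(a_i) * b_i.
std::complex<double> inner(const cvec& a, const cvec& b) {
    std::complex<double> s(0.0, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        s += std::conj(a[i]) * b[i];
    }
    return s;
}

// Project the orthonormal vectors in `basis` out of `psi`, then normalize
// `psi` in place. Returns false instead of asserting when the remaining
// norm is NaN/inf or not larger than `tol` (an assumed threshold); in that
// case `psi` is left unnormalized.
bool schmidt_orth(const std::vector<cvec>& basis, cvec& psi, double tol = 1e-12) {
    for (const cvec& b : basis) {
        const std::complex<double> overlap = inner(b, psi);
        for (std::size_t i = 0; i < psi.size(); ++i) {
            psi[i] -= overlap * b[i];
        }
    }
    const double norm2 = inner(psi, psi).real();
    if (!std::isfinite(norm2) || norm2 <= tol * tol) {
        return false;  // linearly dependent vector or corrupted input
    }
    const double norm = std::sqrt(norm2);
    for (std::complex<double>& x : psi) {
        x /= norm;
    }
    return true;
}
```

Returning a status instead of asserting makes it possible to log which band and which rank produced the bad vector, which may help narrow down an intermittent failure like this one.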