Sporadic errors processing Gd-157 from JEFF-4T1 #273

Closed
andrewmholcomb opened this issue Nov 15, 2022 · 10 comments

Comments

@andrewmholcomb

64-gd-157g.txt
64-gd-157g.njoy-input.txt

Hello, my friends,

I'm having some sporadic failures processing the attached Gd-157 file with NJOY using the attached input. Depending on which computer I use, the code hangs or segfaults (or runs to completion) in the covr step. This file is taken from one of the JEFF-4 beta releases. The only thing that stands out to me is that the covariance data use LRF=7. In the JEFF-4 beta there are only 5 isotopes with LRF=7 covariance data: O-16, Cl-35, Rh-103, Gd-155, and Gd-157. Gd-155 experiences the same failure, Cl-35 and Rh-103 process without issue, and O-16 finishes but takes an inordinate amount of time.

I've compiled NJOY from source on Ubuntu 20.04 using cmake version 3.16.3 and GNU 9.4.0.

I'm not sure what is going wrong, but adding print statements to the code changed the failure mode, so I suspect some sort of painful memory defect.

Let me know if you need any more info or can point out what I've done wrong.

Thanks!!

@andrewmholcomb
Author

Additional info about the configuration: the cmake command I used is
cmake -DCMAKE_BUILD_TYPE:STRING=RELEASE /path/to/njoy/source
I've also tried
cmake -DCMAKE_BUILD_TYPE:STRING=RELWITHDEBINFO -DCMAKE_Fortran_FLAGS:STRING=-finit-local-zero /path/to/njoy/source
but it still segfaults.

I also tried increasing the size of xval and icon in covr.f90 (and adjusted nvmax and ncmax to match) but the problem persists. The first array bound that gets violated is on this line

@andrewmholcomb
Author

Lastly, here is the valgrind output for the entire job.
dump.txt

Executed with valgrind version 3.15.0
valgrind --vgdb=full --keep-debuginfo=yes --xml=yes --xml-file=dump.xml /home/njoy2016/bin/njoy < 64-gd-157g.njoy-input
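The XML file produced by the valgrind command above can be long. As a minimal sketch (an assumption, not part of NJOY or valgrind tooling), the Python below counts Memcheck errors by `<kind>` and lists the innermost stack frame of each, assuming the usual `--xml=yes` layout of `<error>` elements containing `<kind>` and `<stack>/<frame>` children. The function name `summarize_valgrind_errors` is hypothetical. Note that `ElementTree` will refuse a truncated XML file, e.g. if the run was killed before valgrind closed the document.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_valgrind_errors(root):
    """Count errors by kind and collect the innermost frame of each error.

    Assumes valgrind's XML layout: one <error> element per reported error,
    each with a <kind> child and one or more <stack> elements whose first
    <frame> is the innermost call. Only the first <stack> is inspected.
    """
    kinds = Counter()
    first_frames = []
    for err in root.iter("error"):
        kind = err.findtext("kind", default="Unknown")
        kinds[kind] += 1
        frame = err.find("stack/frame")  # first frame of the first stack
        if frame is not None:
            fn = frame.findtext("fn", default="??")
            src = frame.findtext("file", default="??")
            line = frame.findtext("line", default="?")
            first_frames.append((kind, fn, f"{src}:{line}"))
    return kinds, first_frames


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize.py dump.xml
    kinds, frames = summarize_valgrind_errors(ET.parse(sys.argv[1]).getroot())
    for kind, count in kinds.most_common():
        print(f"{count:6d}  {kind}")
    for kind, fn, where in frames[:10]:  # first few errors, in report order
        print(f"{kind:20s} {fn} ({where})")
```

Reading the first reported error is usually the most useful, since later errors can be knock-on effects of the first invalid access.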

@jchsublet

Hello, Parisian,

I believe your issue(s) are related to the GNU compiler and its version. For some time now, when NJOY2016 is compiled with GNU versions above 8.3 and up to 12.0, even the compilation of errorr fails on some platforms. Did you run the tests for your installation, and did all of them pass? This is partly related to #211, which, by the way, will force you to move to NJOY2016.68 (I see 65 written in your input), the latest release from October 5, or to use Intel ifort instead; it is also a good free compiler, from Intel oneAPI.

Please keep in mind that the ENDF-6 format manual specification for reading an LRF=7 covariance file (written only by SAMMY, to my knowledge) has evolved over time. Did all LRF=7 MF-32 data follow accordingly, and did errorr follow as well?

@andrewmholcomb
Author

Hello Mr. Sublet,

All NJOY tests pass, and the error is the same whether using NJOY2016.68 or NJOY2016.65. Equivalent inputs work for all of the JEFF-4T1 isotopes except for Gd-155 and Gd-157.

@nathangibson14
Contributor

Hi Andrew, Jean-Christophe,

There is a compiler issue with GCC 11 that leads to an internal compiler error when compiling ERRORR. The issues Andrew is describing do not sound related to it. JC's other comment, about LRF7 evolving over time, more than likely points to the issue here. For instance, if MF32 is not present but LRF7 is used (as in ENDF/B-VIII.0's Fe-54), the current release of NJOY will crash (but we have a fix that will be released very soon).

We'll have to look into these new JEFF evaluations. Failures in COVR surprise me, though, as all the heavy lifting should be happening in ERRORR.
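For the MF32/LRF7 case mentioned above, a quick first check is whether an evaluation contains an MF=32 section at all. A minimal sketch, assuming only the fixed ENDF-6 record layout (MAT in columns 67-70, MF in 71-72, MT in 73-75): it lists the (MF, MT) sections per material and does not decode LRF or validate the covariance data. The function name `endf_sections` is hypothetical.

```python
import sys
from collections import defaultdict


def endf_sections(lines):
    """Return {MAT: sorted list of (MF, MT)} for every section found.

    Relies only on the ENDF-6 control fields: columns 67-70 hold MAT,
    71-72 hold MF and 73-75 hold MT. Tape ID, SEND, FEND, MEND and TEND
    records (MF == 0, MT == 0 or MAT <= 0) are skipped.
    """
    sections = defaultdict(set)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if len(line) < 75:
            continue  # too short to carry the MAT/MF/MT control fields
        try:
            mat = int(line[66:70])
            mf = int(line[70:72])
            mt = int(line[72:75])
        except ValueError:
            continue  # not a well-formed ENDF record
        if mat <= 0 or mf == 0 or mt == 0:
            continue
        sections[mat].add((mf, mt))
    return {mat: sorted(secs) for mat, secs in sections.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python endf_sections.py 64-gd-157g.txt
    with open(sys.argv[1]) as fh:
        for mat, secs in endf_sections(fh).items():
            mfs = sorted({mf for mf, _ in secs})
            print(f"MAT {mat}: MF {mfs}; MF=32 present: {32 in mfs}")
```

Knowing which files are present only narrows things down; whether the LRF=7 data follow the current format revision still has to be checked against the ENDF-6 manual.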

P.S. Please forgive my lack of French skills 😄

@andrewmholcomb
Author

I am also surprised! Let me know if there's anything I can do or information to report that will help!

@jchsublet

Hi Nathan, Andrew

Andrew's deck with the above Gd-157 file works with NJOY2016.68 when compiled with Intel ifort 2021 (Bob's favorite) on macOS Monterey (outGd157.txt), and also when compiled with GNU 12.1.0 (output.txt) on the same platform.

That leaves me speechless: speechless in twain.

@andrewmholcomb
Author

andrewmholcomb commented Nov 16, 2022

I was able to get our NJOY2016.65 version to run after building with ifort instead of gfortran. I haven't checked whether the results are good or bad, but at least the job finishes instead of segfaulting or hanging indefinitely.

For posterity, I installed by downloading the offline version from here and followed the installation steps from here. For ifort to be found after installation this way, you may also need to source /opt/intel/oneapi/setvars.sh.

ifort --version
ifort (IFORT) 2021.7.1 20221019
Copyright (C) 1985-2022 Intel Corporation.  All rights reserved.

configured with cmake -DCMAKE_BUILD_TYPE:STRING=RELEASE -DCMAKE_Fortran_COMPILER:FILEPATH=/opt/intel/oneapi/compiler/2022.2.1/linux/bin/intel64/ifort /path/to/njoy/source

I did this locally but will try to get the same working in our GitLab CI, as it may be the only path forward that allows us to automatically process all of the files for JANIS. Otherwise we will have to temporarily exclude them from automatic processing, because the job hangs and prevents the rest of the artifacts from being collected. Update: this did circumvent the bug in our GitLab CI. Not a permanent solution, but at least we have a bandaid!
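Another way to keep a single hanging deck from blocking a CI job is to give each NJOY run its own time limit. A minimal sketch, assuming a Python driver script is acceptable in the pipeline; the function name `run_with_timeout`, the NJOY path and the timeout value are hypothetical. It relies on `subprocess.run(..., timeout=...)`, which kills the child and raises `TimeoutExpired` when the limit is hit.

```python
import os
import subprocess
import sys


def run_with_timeout(cmd, stdin_path=None, timeout_s=3600.0, cwd=None):
    """Run cmd with an optional stdin file; give up after timeout_s seconds.

    Returns (status, returncode) where status is "ok", "failed" or "timeout".
    On POSIX a negative returncode means the child died from a signal
    (e.g. -11 for SIGSEGV). Output is discarded here to keep the sketch short.
    """
    try:
        # Feed the input deck on stdin, or an empty stream if none is given.
        with open(stdin_path or os.devnull, "rb") as fh:
            proc = subprocess.run(cmd, stdin=fh, cwd=cwd, timeout=timeout_s,
                                  stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return "timeout", None
    return ("ok" if proc.returncode == 0 else "failed"), proc.returncode


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical paths): python run_decks.py /path/to/njoy deck1 deck2 ...
    njoy, *decks = sys.argv[1:]
    for deck in decks:
        status, rc = run_with_timeout([njoy], stdin_path=deck, timeout_s=1800)
        print(f"{deck}: {status} (returncode={rc})")
```

Since NJOY writes its tapes into the working directory, passing a separate `cwd` per deck avoids one run overwriting another's files.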

Sorry I couldn't be more helpful in identifying the root cause but I think this still indicates a bug. Let me know if there's anything else you can think to try to get it working with the GNU 9.4 tools!

@andrewmholcomb
Author

Hello again NJOY folks! I write to you on the one year anniversary of this issue to see if there has been any progress in figuring out the problem and fixing it? Hope all is well!

@andrewmholcomb
Author

I don't blame you for not prioritizing this, but I never thought this bug would outlive my NEA contract... Good luck to the next victim :)

@andrewmholcomb closed this as not planned on Sep 16, 2024