multi_run_gmtb_scm.py failing silently #135

Closed
climbfuji opened this issue Nov 5, 2019 · 5 comments
@climbfuji
Collaborator

For certain errors, multi_run_gmtb_scm.py fails silently. I tried to run gmtb_scm after merging the GFSv16 changes but without the CIRES UGWP namelist bugfix (NCAR/ccpp-physics#343). The error reported by the model was that the namelist file input.nml could not be found when running the model for a specific setup using

./run_gmtb_scm.py -c twpice -s SCM_GSD_v0

However, when I used multi_run_gmtb_scm.py, it reported that all runs finished successfully. I got suspicious because the model runs finished so quickly, even though I had forgotten to copy the Thompson MP lookup tables.

@grantfirl
Collaborator

I ran into this too. I'm pretty sure that multi_run_gmtb_scm.py checks the exit status of run_gmtb_scm.py and reports non-zero statuses, but perhaps run_gmtb_scm.py is still returning a zero error code for this failure? I will try to look into this.

@grantfirl
Collaborator

If I remove the CIRES namelist bugfix and run via run_gmtb_scm.py, I get the following error message:

input.nml GW-namelist file
separate ugwp :: namelist file: input.nml does not exist
At line 157 of file /volumes/d1/grantf/code/gmtb-scm/ccpp/physics/physics/cires_ugwp_module.F90 (unit = 1, file = 'fort.1')
Fortran runtime error: End of file

Error termination. Backtrace:
#0 0x10c5c6be4
#1 0x10c5c7492
#2 0x10c5c7d9a
#3 0x10c6f73a4
#4 0x10c6f7d03
#5 0x10c6f9fb6
#6 0x10c6fa032
#7 0x10c6fa126
#8 0x10ab816cc
#9 0x10acff819
#10 0x10aecc795
#11 0x10aa84bd3
#12 0x10aa36b3e
#13 0x10af1fd3e

Somehow, the Python subprocess still returns a zero exit code when this error occurs. Other common errors, such as seg faults, do return nonzero exit codes, and multi_run_gmtb_scm.py correctly notifies the user in those cases.

One clunky solution is to search the terminal output of the subprocess for terms that might indicate an error has occurred (despite a zero exit status). For example, if

p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, shell=True, universal_newlines=True)  # universal_newlines=True returns text rather than bytes
(output, err) = p.communicate()  # err is None because stderr is merged into stdout
p_status = p.returncode  # communicate() has already waited for the process to finish

executes the subprocess and the terminal output is captured in the 'output' variable, one could simply add

if 'error' in output.lower():
    print('An error may have occurred; re-run with one of the verbosity options to check the terminal output for unsuccessful runs.')
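A slightly fuller sketch of the same idea, combining the exit-status check with a scan of the captured output. This is only an illustration, not the actual code in multi_run_gmtb_scm.py: the function name run_and_check and the ERROR_MARKERS tuple are hypothetical. The marker strings are taken from the error output quoted above. It also assumes the command is passed as an argument list rather than through shell=True.

```python
import subprocess
import sys

# Hypothetical marker list: both strings appear in the Fortran error
# output quoted above; a real script would likely need more entries.
ERROR_MARKERS = ("fortran runtime error", "error termination")


def run_and_check(command):
    """Run command (a list of arguments) and return (ok, returncode, output).

    ok is False if the exit status is nonzero OR if the combined
    stdout/stderr text contains one of ERROR_MARKERS.
    """
    p = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        universal_newlines=True,    # return text instead of bytes
    )
    output, _ = p.communicate()     # waits for the process to finish
    lowered = output.lower()
    suspicious = any(marker in lowered for marker in ERROR_MARKERS)
    ok = (p.returncode == 0) and not suspicious
    return ok, p.returncode, output
```

A caller could then report a run as failed whenever ok is False, and print the captured output (or suggest re-running with a verbosity option) when the exit status was zero but a marker was found. Matching on specific phrases rather than the bare word 'error' reduces false positives from ordinary log lines, but it will miss failures that print different messages.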

@climbfuji
Collaborator Author

If this can wait until I am back (and you remind me), then I can take a look!

climbfuji self-assigned this Nov 18, 2019
@climbfuji
Collaborator Author

Needs fix for the UFS release.

@climbfuji
Collaborator Author

Duplicate of #278

climbfuji marked this as a duplicate of #278 Mar 28, 2022