The weather model is failing with mpp_domains_define.inc: At least one pe in pelist is not used by any tile in the mosaic error message #362

Open
MichaelLueken opened this issue Oct 15, 2024 · 0 comments
Labels
question Further information is requested

Is your question related to a problem? Please describe.
While updating the Short-Range Weather Application to stop using the deprecated atmos_nthreads and use ATM_omp_num_threads instead, about half of the comprehensive workflow end-to-end (WE2E) tests began failing with the following error message:

FATAL from PE *: mpp_domains_define.inc: At least one pe in pelist is not used by any tile in the mosaic

This is confusing for three reasons: there were no issues while using atmos_nthreads; the behavior appears to differ depending on the machine the updated code is run on; and the ufs.configure, model_configure, and input.nml files look correct.

Since the behavior appears to differ by machine (on Hercules, all six fundamental tests pass, but one of them fails on Hera with the above message), does this suggest an off-by-one or similar edge case related to node size?

Any clarification on what this error message represents and the best way to begin debugging would be greatly appreciated.

Describe what you have tried
From what I could find through Google, this error message appears to occur when there is a problem with either layout or io_layout in the input.nml file. To that end, I've made sure that the layout entry correctly reflects the number of MPI tasks to use in the two horizontal directions (x and y) of the regional grid, and these values are being set properly. Additionally, io_layout is being set automatically to 1, 1, which is expected for the SRW App.
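
For reference, the layout check described above can be sketched as a small script. This is only an illustration, not SRW App or FMS code: the function names and example numbers are hypothetical. It assumes that the compute domain needs layout_x * layout_y tasks per tile, and that any write-component tasks are counted separately from the tasks handed to the domain decomposition. Neither assumption has been confirmed against the model source.

```python
def expected_compute_tasks(layout, ntiles=1):
    """Tasks the domain decomposition would use: layout_x * layout_y per tile.

    ntiles=1 is assumed here for a regional (single-tile) grid.
    """
    layout_x, layout_y = layout
    return layout_x * layout_y * ntiles


def check_pe_count(layout, atm_tasks, write_groups=0,
                   write_tasks_per_group=0, ntiles=1):
    """Compare the tasks implied by `layout` with the tasks left for compute.

    atm_tasks is the total number of MPI tasks given to the atmosphere
    component (hypothetical input; read it from your own configuration).
    Returns (needed, available, match).
    """
    needed = expected_compute_tasks(layout, ntiles)
    available = atm_tasks - write_groups * write_tasks_per_group
    return needed, available, needed == available


if __name__ == "__main__":
    # Made-up numbers: a 5 x 2 layout plus one write group of 2 tasks.
    needed, available, ok = check_pe_count((5, 2), atm_tasks=12,
                                           write_groups=1,
                                           write_tasks_per_group=2)
    print(f"needed={needed} available={available} match={ok}")
```

If the two counts differ, for example because the task count was scaled by the thread count somewhere in the configuration, that mismatch would be a reasonable first thing to look at. Whether it is the actual cause here is not known.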

Any clarification on the error message or suggestions on other things to try and correct this behavior would be greatly appreciated.

Thank you very much for your time.
