Traditional threading as option for forecast model #3122

Open
JessicaMeixner-NOAA opened this issue Nov 22, 2024 · 2 comments · May be fixed by #3149
Labels
feature New feature or request

Comments

@JessicaMeixner-NOAA
Contributor

What new functionality do you need?

Because the model hangs on Orion/Hercules at high resolutions (C768/C1152), most likely because of ESMF-managed threading (see ufs-community/ufs-weather-model#2486 for more details), we'd like to get traditional threading as an option in the g-w to support GFSv17 development. This comment from @aerorahul, ufs-community/ufs-weather-model#2486 (comment), says that a different set of ufs_configure files is needed. I think these are now available in ufs-weather-model, though; I believe they are the options without _esmf in the name.

Hercules and Orion are being explored as potential options for retrospective runs because WCOSS2 is busy with multiple large implementations coming up.

What are the requirements for the new functionality?

Allow for an option to run with traditional threading, since ESMF-managed threading currently appears to be an issue on Hercules/Orion.

Acceptance Criteria

  • [] Forecast completes with traditional threading
  • [] Forecast still completes with ESMF-managed threading (where possible)

Suggest a solution (optional)

I believe that the calculation of resources also needs to be updated.
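
As a rough illustration of what such a resource calculation might look like, here is a minimal sketch. The function name `nodes_needed`, the (ntasks, threads) groups, and the 80-cores-per-node figure are all hypothetical and not taken from the g-w; it also assumes each group of ranks starts on a fresh node and that a rank's threads never span nodes.

```python
import math


def nodes_needed(groups, cores_per_node):
    """Estimate the node count for a list of (ntasks, threads) groups.

    Assumption: every group starts on its own node, and each MPI rank
    needs `threads` cores on a single node.
    """
    total = 0
    for ntasks, threads in groups:
        # How many ranks of this group fit on one node.
        ranks_per_node = cores_per_node // threads
        if ranks_per_node < 1:
            raise ValueError("threads per rank exceeds cores per node")
        total += math.ceil(ntasks / ranks_per_node)
    return total


# Hypothetical layout: 96 ranks with 4 threads each, plus 20 single-threaded ranks.
example_nodes = nodes_needed([(96, 4), (20, 1)], cores_per_node=80)
print(example_nodes)
```

The point is only that the thread count has to enter the node estimate per group rather than once for the whole job; the real calculation would depend on how the launcher places ranks on each machine.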

@aerorahul
Contributor

@JessicaMeixner-NOAA
I will take a look at this issue and prioritize it.

Just to explain the nature of the change required to fulfill this issue: the ufs.configure templates for traditional threading are available in the ufs-weather-model and will be used. Another piece of information we need is how to run the executable, and whether there are any changes to the resource requests in the job-card #SBATCH lines.

We will need to replace the APRUN command line, srun -n <nprocs> $FCSTEXEC, with a more detailed execution sequence for traditional threading, such as this:

time mpiexec -l --line-buffer -n 1392 -ppn 32  --cpu-bind depth --depth 4 env OMP_NUM_THREADS=4 $FCSTEXEC : \
                              -n 220  -ppn 128 --cpu-bind depth --depth 1 env OMP_NUM_THREADS=1 $FCSTEXEC : \
                              -n 120  -ppn 120 --cpu-bind depth --depth 1 env OMP_NUM_THREADS=1 $FCSTEXEC : \
                              -n 80   -ppn 64  --cpu-bind depth --depth 2 env OMP_NUM_THREADS=2 $FCSTEXEC

The numbers here are just representative of the detail needed to construct the APRUN command.
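
To make the construction concrete, a command of this shape could be assembled from a per-segment description along the following lines. This is only a sketch: the `Segment` class and `build_mpmd_command` function are hypothetical names, not existing g-w code, and the flags are simply copied from the example above, with `--depth` assumed to equal the thread count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    ntasks: int   # MPI ranks in this segment
    ppn: int      # ranks placed per node
    threads: int  # OpenMP threads per rank (also used for --depth)


def build_mpmd_command(exe, segments, launcher="mpiexec -l --line-buffer"):
    """Join one launcher clause per segment with ' : ' (MPMD style)."""
    parts = [
        f"-n {s.ntasks} -ppn {s.ppn} --cpu-bind depth --depth {s.threads} "
        f"env OMP_NUM_THREADS={s.threads} {exe}"
        for s in segments
    ]
    return f"{launcher} " + " : ".join(parts)


# The representative numbers from the example above.
segments = [
    Segment(1392, 32, 4),
    Segment(220, 128, 1),
    Segment(120, 120, 1),
    Segment(80, 64, 2),
]
command = build_mpmd_command("$FCSTEXEC", segments)
print(command)
```

Keeping the per-segment values in one small structure would let the same data feed both the launch command and the resource request, so the two cannot drift apart.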

@JessicaMeixner-NOAA
Contributor Author

@aerorahul that will be awesome to have that level of detail in the executable instead of having everything have the same number of threads!!

@WalterKolczynski-NOAA WalterKolczynski-NOAA removed the triage Issues that are triage label Dec 10, 2024
@aerorahul aerorahul linked a pull request Dec 10, 2024 that will close this issue
22 tasks
4 participants