load balance wave tests #461
The two wave tests fv3_gfdlmprad and fv3_gfdlmprad_atmwav failed in a recent code commit because they took too long to run. Currently the low-resolution RT tests (1-degree fv3 and fv3+wave) have a run time limit of 20 minutes.
@JessicaMeixner-NOAA @aliabdolali The ww3 component still causes delays in the fv3-ww3 low-resolution runs; is this what you expect? Do you have any suggestions for what can be done? It seems to me that low-resolution ww3 does not scale as well as the higher-resolution ww3 in the coupled C384 runs; is there a reason for this? Also, I just increased the ww3 tasks without changing any other ww3 settings, so please let me know if additional changes are required. To resolve the low-resolution wave test issue in the UFS RT, I think we will increase the ww3 tasks from 42 to 82 at this time. Please let us know if you have any suggestions. Thanks
@junwang-noaa Do you want to make the grid coarser? I have a 2-degree global grid, so we can replace it in the setup to reduce computations.
@aliabdolali FV3 is running at 1 degree, so I think 1-degree ww3 should be fine. If you have a standalone ww3 RT set up, please let us know the computational time. So far the one-way coupling ww3 results are identical when using different numbers of ww3 MPI tasks.
We need to add more nodes to the wave model. My suggestion would be to use the ESMF profiling tools and make sure the increase to 82 from 48 is sufficient.
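For reference, a minimal sketch (not the RT harness's actual launcher) of how ESMF run-time profiling can be switched on around a coupled run. The ESMF_RUNTIME_PROFILE environment variables are described in the ESMF reference manual and should be checked against the library version in use; the mpiexec command, task count, and executable name below are placeholders.

```python
import os
import subprocess

# Enable ESMF run-time profiling for a coupled run (sketch only).
env = dict(os.environ)
env["ESMF_RUNTIME_PROFILE"] = "ON"              # turn on per-region timing
env["ESMF_RUNTIME_PROFILE_OUTPUT"] = "SUMMARY"  # aggregate timings across PETs

# Placeholder launch command: 232 tasks would correspond to 150 fv3 + 82 ww3,
# the layout discussed in this thread; the executable name is hypothetical.
subprocess.run(
    ["mpiexec", "-n", "232", "./fv3_ww3.exe"],
    env=env,
    check=True,
)
```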
The ESMF profiling shows that even with 162 tasks ww3 runs much slower than fv3 with 150 tasks: ww3 takes about 170 s, while fv3 takes about 90 s. Please see the profile summary below:

[WAV] RunPhase1  162  24  169.3507  169.2469  280  169.4389  310
[ATM] RunPhase1  150  24   84.1945   38.9942  129   92.0002  144

It seems there is some issue with the low-resolution ww3 runs; we do not want to use too many tasks on ww3 for these low-resolution runs. We chose 82 ww3 tasks to reduce the total run time from the previous 800-850 s to 350-400 s. We are using a 15-minute wall clock time for the low-resolution runs, and the fv3-ww3 tests sometimes failed to finish within the 15-minute window with 42 ww3 tasks.
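To make the imbalance concrete, here is a small Python sketch (an illustration, not part of the RT scripts) that parses the two summary lines above and compares the component means. The column interpretation (PETs, count, mean, min, min PET, max, max PET) is an assumption based on the usual ESMF profile summary layout and should be confirmed against the actual file header.

```python
# Parse the two RunPhase1 summary lines quoted above and report the
# wave/atmosphere imbalance.
profile_lines = [
    "[WAV] RunPhase1 162 24 169.3507 169.2469 280 169.4389 310",
    "[ATM] RunPhase1 150 24 84.1945 38.9942 129 92.0002 144",
]

timings = {}
for line in profile_lines:
    component, _phase, pets, _count, mean, *_rest = line.split()
    timings[component] = {"pets": int(pets), "mean_s": float(mean)}

wav, atm = timings["[WAV]"], timings["[ATM]"]
print(f"WAV mean {wav['mean_s']:.1f}s on {wav['pets']} tasks")
print(f"ATM mean {atm['mean_s']:.1f}s on {atm['pets']} tasks")
print(f"WAV/ATM run-time ratio: {wav['mean_s'] / atm['mean_s']:.2f}")
```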
If there's a maximum number of nodes that WW3 is allowed, and you would like it to run in a certain time period, we likely will need to explore an even lower resolution for WW3, as @aliabdolali suggested. If you give us a maximum number of nodes, we can design a wave grid to fit that, or we will need to use more nodes for the waves.
So far it would take too many ww3 tasks to get a comparable running time for 1-degree ww3 and 1-degree fv3. We hope to get the fv3-ww3 tests done within 5 minutes with ~240 tasks (6-10 nodes). If you have a new wave grid that can fit within that time frame, that would be good. Thanks
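A rough back-of-the-envelope check of why more tasks alone may not fit the budget: even assuming ideal strong scaling (which the thread suggests low-resolution ww3 does not achieve), bringing ww3 down to fv3's run time would already exceed the ~240-task target, which is consistent with the suggestion to coarsen the wave grid instead. The measured times and task counts come from the profile quoted earlier; the scaling assumption is an illustration only.

```python
# Ideal-strong-scaling estimate: how many ww3 tasks would be needed to bring
# its run time down to the fv3 run time?
wav_time_s, wav_tasks = 169.35, 162   # measured WAV RunPhase1 mean
atm_time_s, atm_tasks = 84.19, 150    # measured ATM RunPhase1 mean
budget_tasks = 240                    # rough total task budget from the thread

needed_wav_tasks = wav_tasks * wav_time_s / atm_time_s
print(f"ww3 tasks needed to match fv3 (ideal scaling): ~{needed_wav_tasks:.0f}")
print(f"total tasks would be ~{needed_wav_tasks + atm_tasks:.0f}, "
      f"vs a budget of ~{budget_tasks}")
```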
@junwang-noaa @JessicaMeixner-NOAA
Description
Update load balancing for wave-related tests.
Solution
It has been requested that tests with waves match the timing of the tests without waves.
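As an illustration of that requirement (not an actual RT harness check), a wave test's elapsed time could be compared against its no-wave counterpart and flagged when the slowdown exceeds a chosen tolerance. The test names, elapsed times, and tolerance below are hypothetical placeholders; in practice the times would come from the RT log files.

```python
# Hypothetical timing comparison between a wave test and its no-wave counterpart.
elapsed_s = {
    "control": 350.0,         # hypothetical no-wave test elapsed time
    "control_atmwav": 410.0,  # hypothetical wave test elapsed time
}

tolerance = 1.25  # allow the wave test to be up to 25% slower (arbitrary choice)

ratio = elapsed_s["control_atmwav"] / elapsed_s["control"]
status = "OK" if ratio <= tolerance else "NEEDS REBALANCING"
print(f"wave/no-wave elapsed-time ratio: {ratio:.2f} -> {status}")
```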