0.25 degree configuration with a different MOM6 parameterization compared to #101 #135

minghangli-uni · 2024-04-09T22:37:19Z

MOM_input

Most of the updates are drawn from discussions in namelist-discussion and from the OM2 technical report. Some major updates are highlighted below:

- Surface boundary layer parameterisation:
  - Instead of the KPP scheme used via the CVMix project, the current configuration uses an energetically constrained parameterisation of the surface boundary layer (EPBL), which provides vertical diffusivity and viscosity as well as the depth of active mixing (the boundary-layer thickness).
- Removal of mesoscale eddy mixing parameterisations (e.g., USE_MEKE=False).
- Set NK=50.
- No parameters associated with TIDES.
- NUM_DIAG_COORDS=2, i.e. z_star and rho_2 (not sure whether rho_2 is relevant for the current 0.25deg configuration; see "MOM6 1deg configuration - No output generated for GMOM_JRA.mom6.h.rho2_*.nc", ACCESS-NRI/access-om3-configs#40 (comment)).
- Timesteps:
  - baroclinic timestep DT: 1350s
  - tracer timestep DT_THERM: 1350s
  - coupling timestep: 1350s
  - ice thermodynamic timestep: 1350s
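To make the cost implications of these timesteps concrete, here is a minimal sketch that counts the steps each one implies per model day and per model year. The helper name `steps_per_day` is hypothetical, and the 365-day (no-leap) model year is an assumption for illustration, not something stated in the configuration.

```python
# Minimal sketch: steps per model day/year implied by the timesteps listed above.
# Assumption (not from the configuration): a 365-day (no-leap) model year.

SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumed no-leap calendar


def steps_per_day(dt_seconds: float) -> float:
    """Number of timesteps of length dt_seconds in one model day."""
    return SECONDS_PER_DAY / dt_seconds


timesteps = {
    "baroclinic DT": 1350,
    "tracer DT_THERM": 1350,
    "coupling": 1350,
    "ice thermodynamic": 1350,
}

for name, dt in timesteps.items():
    per_day = steps_per_day(dt)
    print(f"{name}: {dt} s -> {per_day:g} steps/day, "
          f"{per_day * DAYS_PER_YEAR:g} steps/year")
```

Each 1350s timestep corresponds to 64 steps per day. Doubling DT_THERM to 2700s halves the tracer/thermodynamic steps to 32 per day, which is why a longer tracer timestep is an obvious lever on cost.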

With THERMO_SPANS_COUPLING = True, the tracer timestep can be an integer multiple of DT. However, as mentioned in the comments within MOM_input, DT_THERM should be less than the coupling timestep. So we may want to consider increasing the coupling timestep beyond 1350s (a good question raised by @dougiesquire in ACCESS-NRI/access-om3-configs#48 (comment)).

I think we maybe need to discuss whether we really want to be changing the coupling timestep here (and if so, what we want to do with MOM's thermodynamic timestep).

Setting the tracer timestep DT_THERM to 2700s can give a speedup of 20% per model year. (The comparison was made with DIABATIC_FIRST = False.)

Ice initial condition

The ice initial condition is set to "default". In addition, a second experiment, using an ice initial condition taken from a 3-hour OM2 run (following #50), is running in parallel.

Other params

All other parameters and namelists are kept consistent with https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss101 and the OM2 technical report.

For example, in ice_in:

&domain_nml
  block_size_x = 30
  block_size_y = 27
  distribution_type = "roundrobin"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = 8
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "square-ice"
/

where block_size_x = 30 and block_size_y = 27 are consistent with the OM2 technical report. max_blocks = 8 was estimated with this snippet from CICE's ice_domain:

     if (max_blocks < 1) then
       max_blocks=( ((nx_global-1)/block_size_x + 1) *         &
                    ((ny_global-1)/block_size_y + 1) - 1) / nprocs + 1
       max_blocks=max(1,max_blocks)
       write(nu_diag,'(/,a52,i6,/)') &
         '(ice_domain): max_block < 1: max_block estimated to ',max_blocks
     endif
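The estimate can be checked by hand. Below is a minimal Python transcription of the Fortran snippet. The function name `estimate_max_blocks` is hypothetical, and because the number of CICE processes (`nprocs`) is not given in this post, the values tried are illustrative only.

```python
def estimate_max_blocks(nx_global: int, ny_global: int,
                        block_size_x: int, block_size_y: int,
                        nprocs: int) -> int:
    """Python transcription of the CICE max_blocks estimate shown above.

    Fortran integer division of positive integers truncates toward zero,
    which matches Python's // for the non-negative values used here.
    """
    # Number of blocks in each direction, rounded up (ceiling division).
    nblocks_x = (nx_global - 1) // block_size_x + 1
    nblocks_y = (ny_global - 1) // block_size_y + 1
    # Share the total number of blocks among nprocs, rounding up.
    max_blocks = (nblocks_x * nblocks_y - 1) // nprocs + 1
    return max(1, max_blocks)


# 0.25deg grid from the ice_in excerpt: 1440 x 1080 with 30 x 27 blocks,
# i.e. 48 x 40 = 1920 blocks. The nprocs values are illustrative only.
for nprocs in (240, 274, 275, 288):
    print(nprocs, estimate_max_blocks(1440, 1080, 30, 27, nprocs))
```

Under this formula, max_blocks = 8 corresponds to 240 to 274 CICE processes. At 288 processes, the same estimate would give 7.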

GADI consumption:

For one model year, OM2 with DT=1350s and OM3 with DT_THERM=DT=1350s require approximately 8.39KSU and 11.2KSU, respectively, so the current OM3 configuration is about 33% more expensive than OM2.
However, when OM3 is configured with DT_THERM=2*DT=2700s, it requires 8.2KSU, which is comparable to OM2 (8.39KSU).
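The percentages above follow directly from the reported service units. This small sketch reproduces the arithmetic. The helper `relative_cost` is hypothetical, and the KSU figures are the ones quoted in this post.

```python
def relative_cost(ksu: float, baseline_ksu: float) -> float:
    """Ratio of service units used relative to a baseline run."""
    return ksu / baseline_ksu


# KSU per model year, as reported above.
OM2_KSU = 8.39
om3_runs = {
    "DT_THERM = DT = 1350 s": 11.2,
    "DT_THERM = 2*DT = 2700 s": 8.2,
}

for label, ksu in om3_runs.items():
    ratio = relative_cost(ksu, OM2_KSU)
    print(f"OM3 ({label}): {ksu} KSU = {ratio:.2f} x OM2 ({100 * (ratio - 1):+.0f}%)")
```

This gives about 1.33 x OM2 (+33%) for the first case and about 0.98 x OM2 (-2%) for the second.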

Limitations

The current configuration runs sequentially and is restricted to a maximum of 288 CPU cores. If more cores are requested, the model hangs without producing any useful error message. I am still investigating the cause.

The branch is here: https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135


aekiss · 2024-04-10T00:00:22Z

Thanks @minghangli-uni for documenting those details. Is there a branch or draft PR you can link to with these changes?

adele-morrison · 2024-04-10T00:00:54Z

Have you tried running with DT_THERM > 2700? Seems like it could be a good way to improve performance, especially when we start running with BGC.

Regarding the comment above,

However, as is mentioned in comments within MOM6_input, DT_THERM should be less than coupling timestep.

There is a follow-on to this in MOM_input saying "unless THERMO_SPANS_COUPLING is true, in which case DT_THERM can be an integer multiple of the coupling timestep". So I think it's fine to have DT_THERM > dt_cpld.

e.g. GFDL OM4_025 uses:

dt_cpld = 3600
DT = 900.0
DT_THERM = 7200.0
THERMO_SPANS_COUPLING = True

Have we tested the performance using similar timesteps to that? Could always reduce dt_cpld to 1800 if that's a worry.
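The constraints discussed here can be written as a quick sanity check for candidate timestep combinations. The sketch below is a hypothetical helper, not part of MOM6. It encodes our reading of the DT_THERM comment in MOM_input (DT_THERM should ideally be an integer multiple of DT and no longer than the coupling timestep, unless THERMO_SPANS_COUPLING is true, in which case it can be an integer multiple of the coupling timestep). It treats that guidance as a hard rule, which is an assumption.

```python
def is_integer_multiple(a: float, b: float, tol: float = 1e-9) -> bool:
    """True if a is a positive integer multiple of b (within a tolerance)."""
    ratio = a / b
    return round(ratio) >= 1 and abs(ratio - round(ratio)) < tol


def check_timesteps(dt: float, dt_therm: float, dt_cpld: float,
                    thermo_spans_coupling: bool) -> list[str]:
    """Hypothetical helper: list problems with a DT/DT_THERM/dt_cpld choice.

    Encodes our reading of the MOM_input DT_THERM comment:
      * DT_THERM should be an integer multiple of DT;
      * DT_THERM should not exceed the coupling timestep unless
        THERMO_SPANS_COUPLING is True, in which case it should be an
        integer multiple of the coupling timestep.
    """
    problems = []
    if not is_integer_multiple(dt_therm, dt):
        problems.append("DT_THERM is not an integer multiple of DT")
    if dt_therm > dt_cpld:
        if not thermo_spans_coupling:
            problems.append("DT_THERM > dt_cpld but THERMO_SPANS_COUPLING is False")
        elif not is_integer_multiple(dt_therm, dt_cpld):
            problems.append("DT_THERM is not an integer multiple of dt_cpld")
    return problems


# GFDL OM4_025 settings quoted above: no problems expected.
print(check_timesteps(dt=900.0, dt_therm=7200.0, dt_cpld=3600.0,
                      thermo_spans_coupling=True))
# A hypothetical 0.25deg variant with DT_THERM = 2*DT but THERMO_SPANS_COUPLING off.
print(check_timesteps(dt=1350.0, dt_therm=2700.0, dt_cpld=1350.0,
                      thermo_spans_coupling=False))
```

The OM4_025 combination passes. The second call reports one problem because DT_THERM exceeds dt_cpld without THERMO_SPANS_COUPLING.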

minghangli-uni · 2024-04-10T00:20:32Z

The branch is linked here, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135

dougiesquire · 2024-04-10T00:25:06Z

@adele-morrison, were your issues with DT_THERM related to the open boundaries?

adele-morrison · 2024-04-10T00:26:51Z

Yes, we only had a problem with DT_THERM in regional cases. I think large DT_THERM in global should be fine.


minghangli-uni · 2024-04-10T00:30:35Z

@adele-morrison

Have you tried running with DT_THERM > 2700?

I haven't tried yet, but I plan to run tests with DT_THERM 4 and 8 times greater than DT.

e.g. GFDL OM4_025 uses:
dt_cpld = 3600
DT = 900.0
DT_THERM = 7200.0
THERMO_SPANS_COUPLING = True
Have we tested the performance using similar timesteps to that? Could always reduce dt_cpld to 1800 if that's a worry.

I've run multiple tests. Regardless of the other timesteps, and whether ntdt was 1 or 2, the following two errors consistently occurred within the first two model years whenever dt_cpld was 1800s or longer:

FATAL from PE 100: write energy: Ocean velocity has been truncated too many times
(abort ice) error = (diagnostic abort) ERROR: bad departure points

dougiesquire · 2024-04-10T00:40:02Z

@minghangli-uni, can you easily test the timing and compare the outputs of runs with longer tracer timesteps? E.g. DT_THERM = 5400.0, 6750.0, 8100.0, keeping the MOM baroclinic timestep, CICE thermodynamic timestep and coupling timestep all set to 1350s.

minghangli-uni · 2024-04-10T00:44:55Z

Won't be a problem, I think. As the current config is limited in the number of CPU cores (i.e. 288), it will take around 14 hours to get results for one model year.

dougiesquire · 2024-04-10T00:49:00Z

Won't be a problem I think. As the current config is limited by the number of CPU cores (e.g. 288), it will take around 14 hours to receive one model year results.

Are you saying that you think the model won't run any faster, or that you will do the runs to check (or something else)?

minghangli-uni · 2024-04-10T01:01:38Z

I am currently investigating the core cap issue. Additionally, I plan to examine whether increasing DT_THERM will impact physical fields. Based on these findings, we will determine the extent to which we can achieve a speedup.

dougiesquire · 2024-04-10T01:04:49Z

I am currently investigating the core cap issue.

What is this?

minghangli-uni · 2024-04-10T01:06:15Z

Limitations
The current configuration runs sequentially and is restricted to a maximum of 288 CPU cores. If the number of CPU cores exceeds this limit, the model will hang without providing any useful information. I am still investigating this issue to determine the cause.

See above.

dougiesquire · 2024-04-10T04:41:16Z

@minghangli-uni some of the changes in the branch you've linked to this issue overlap with changes that are being made in ACCESS-NRI/access-om3-configs#48 (i.e. the changes to NK, the timesteps, the CICE initial conditions and CICE block sizes are all done there). If you think the changes being proposed in that PR are not correct, please review/comment in that PR.

I think your other changes either require testing and/or require the same change across many repos. These are best handled one at a time. I suggest:

- Open an issue for each proposed change
- Address each change in a dedicated branch and associated PR

minghangli-uni · 2024-04-10T04:53:54Z

@dougiesquire This is a good point. Will follow your suggestion and implement changes accordingly.

aekiss · 2024-04-10T05:27:34Z

Are you planning to use @micaeljtoliveira's profiling tools for the test runs?

minghangli-uni · 2024-04-10T06:25:51Z

I will first get the concurrent run working and do test runs with more CPU cores for MOM. This will reduce turnaround time and give results in a shorter walltime. Then I will use the profiling tools (https://github.com/COSIMA/om3-utils/tree/profiling) to fine-tune the process layout for the 0.25deg configuration.

dougiesquire · 2024-04-23T23:57:06Z

@minghangli-uni can this be closed now or is there still a reason to keep it open?

minghangli-uni · 2024-04-24T00:16:16Z

I am happy to close it now. Thanks @dougiesquire

aekiss mentioned this issue on Apr 10, 2024 in ACCESS-NRI/access-om3-configs#48, "Upgrade 1-Degree to 025-Degree RYF Configuration" (merged).

dougiesquire added the om3-025 label Apr 10, 2024

minghangli-uni closed this as completed Apr 24, 2024
