0.25 degree configuration with a different MOM6 parameterization compared to #101 #135

Closed
minghangli-uni opened this issue Apr 9, 2024 · 18 comments
@minghangli-uni
Contributor

minghangli-uni commented Apr 9, 2024

MOM_input

Most of the updates come from the discussions in namelist-discussion and from the OM2 technical report. Some major updates are highlighted below:

  • surface boundary layer parameterisation:
    • Instead of the KPP scheme from the CVMix project, the current configuration uses an energetically constrained parameterisation of the surface boundary layer (EPBL), which provides the vertical diffusivity and viscosity and the depth of active mixing (boundary layer thickness).
  • Removal of mesoscale eddy mixing parameterisations (e.g., USE_MEKE=False)
  • Set NK=50.
  • No parameters associated with TIDES
  • NUM_DIAG_COORDS=2 includes z_star and rho_2 (not sure whether rho_2 is relevant for the current 0.25deg configuration; see "MOM6 1deg configuration - No output generated for GMOM_JRA.mom6.h.rho2_*.nc", ACCESS-NRI/access-om3-configs#40 (comment))
  • Timesteps
    • baroclinic timestep DT: 1350s
    • tracer timestep DT_THERM: 1350s
    • coupling timestep: 1350s
    • ice thermodynamic timestep: 1350s

With THERMO_SPANS_COUPLING = True, the tracer timestep can be an integer multiple of DT. However, as is mentioned in comments within MOM6_input, DT_THERM should be less than coupling timestep. So we may want to consider increasing the coupling timestep beyond 1350s (a good question raised by @dougiesquire in ACCESS-NRI/access-om3-configs#48 (comment)):

I think we maybe need to discuss whether we really want to be changing the coupling timestep here (and if so, what we want to do with MOM's thermodynamic timestep).

  • Setting the tracer timestep DT_THERM to 2700s can lead to a speedup of 20% for each model year (comparison conducted with DIABATIC_FIRST = False).

Ice initial condition

The ice initial condition is set to "default". Additionally, another experiment with the ice initial condition (following #50) from a 3-hour run of OM2 is running at the same time.


Other params

All other parameters and namelists remain consistent with, and up to date with, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss101 and the OM2 technical report.

For example in ice_in,

&domain_nml
  block_size_x = 30
  block_size_y = 27
  distribution_type = "roundrobin"
  distribution_wght = "latitude"
  maskhalo_bound = .true.
  maskhalo_dyn = .true.
  maskhalo_remap = .true.
  max_blocks = 8
  ns_boundary_type = "tripole"
  nx_global = 1440
  ny_global = 1080
  processor_shape = "square-ice"

where block_size_x = 30 and block_size_y = 27 are consistent with the OM2 technical report. max_blocks = 8 is the value estimated by this snippet:

     if (max_blocks < 1) then
       max_blocks=( ((nx_global-1)/block_size_x + 1) *         &
                    ((ny_global-1)/block_size_y + 1) - 1) / nprocs + 1
       max_blocks=max(1,max_blocks)
       write(nu_diag,'(/,a52,i6,/)') &
         '(ice_domain): max_block < 1: max_block estimated to ',max_blocks
     endif
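
As a cross-check of that estimate, here is a minimal Python sketch of the same integer arithmetic. The function name `estimate_max_blocks` is hypothetical, and the `nprocs` value in the usage line is an assumed CICE process count that is not stated in this issue; it is only there to illustrate the formula.

```python
def estimate_max_blocks(nx_global, ny_global, block_size_x, block_size_y, nprocs):
    """Mirror the integer arithmetic of the Fortran snippet above."""
    # Number of blocks along each axis (ceiling division via integer arithmetic).
    nblocks_x = (nx_global - 1) // block_size_x + 1
    nblocks_y = (ny_global - 1) // block_size_y + 1
    # Blocks per process, rounded up, and never less than one.
    max_blocks = (nblocks_x * nblocks_y - 1) // nprocs + 1
    return max(1, max_blocks)


# Grid and block sizes come from the ice_in excerpt; nprocs=240 is an assumption.
print(estimate_max_blocks(1440, 1080, 30, 27, nprocs=240))
```

With these grid and block sizes there are 48 x 40 = 1920 blocks, so the formula gives 8 for any process count from 240 to 274.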

GADI consumption:

  • Running one model year with OM2 (DT=1350s) and with OM3 (DT_THERM=DT=1350s) requires approximately 8.39KSU and 11.2KSU, respectively. In terms of service units, the current OM3 is therefore about 33% more expensive than OM2.
  • However, when OM3 is configured with DT_THERM=2*DT=2700s, the service units required using OM3 (8.2KSU) become comparable to those of OM2 (8.39KSU).
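
The percentages can be reproduced from the quoted service-unit figures with a short sketch. The helper name `relative_change` is hypothetical; the three KSU values are the ones reported in the bullets above.

```python
def relative_change(new, reference):
    """Fractional change of `new` relative to `reference`."""
    return (new - reference) / reference


om2_ksu = 8.39        # OM2, DT = 1350 s
om3_ksu_1350 = 11.2   # OM3, DT_THERM = DT = 1350 s
om3_ksu_2700 = 8.2    # OM3, DT_THERM = 2*DT = 2700 s

print(f"OM3 (DT_THERM=1350s) vs OM2: {relative_change(om3_ksu_1350, om2_ksu):+.1%}")
print(f"OM3 (DT_THERM=2700s) vs OM2: {relative_change(om3_ksu_2700, om2_ksu):+.1%}")
print(f"OM3 2700s vs OM3 1350s:      {relative_change(om3_ksu_2700, om3_ksu_1350):+.1%}")
```

On these numbers, OM3 with DT_THERM=1350s uses roughly 33% more service units than OM2, while OM3 with DT_THERM=2700s uses about 2% fewer than OM2 and about 27% fewer than OM3 with DT_THERM=1350s.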

Limitations

The current configuration runs sequentially and is restricted to a maximum of 288 CPU cores. If the number of CPU cores exceeds this limit, the model will hang without providing any useful information. I am still investigating this issue to determine the cause.


The branch is linked here, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135

@aekiss
Contributor

aekiss commented Apr 10, 2024

Thanks @minghangli-uni for documenting those details. Is there a branch or draft PR you can link to with these changes?

@adele-morrison

Have you tried running with DT_THERM > 2700? Seems like it could be a good way to improve performance, especially when we start running with BGC.

Regarding the comment above,

However, as is mentioned in comments within MOM6_input, DT_THERM should be less than coupling timestep.

There is a follow on from this in MOM_input saying "unless THERMO_SPANS_COUPLING is true, in which case DT_THERM can be an integer multiple of the coupling timestep". So I think it's fine to have DT_THERM > dt_cpld.

e.g. GFDL OM4_025 uses:

  • dt_cpld = 3600
  • DT = 900.0
  • DT_THERM = 7200.0
  • THERMO_SPANS_COUPLING = True

Have we tested the performance using similar timesteps to that? Could always reduce dt_cpld to 1800 if that's a worry.
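
To make the timestep relations in this thread concrete, here is a rough Python sketch of a consistency check. It encodes only the rules as stated in the comments above and is not MOM6's own validation; the function name `check_timesteps` is hypothetical, and treating DT_THERM equal to dt_cpld as acceptable is an assumption (the configuration in this issue uses DT_THERM = coupling timestep = 1350s).

```python
def check_timesteps(dt, dt_therm, dt_cpld, thermo_spans_coupling):
    """Return a list of warnings for the timestep relations discussed above."""
    problems = []
    # The tracer timestep is expected to be an integer multiple of DT.
    if dt_therm % dt != 0:
        problems.append("DT_THERM is not an integer multiple of DT")
    if thermo_spans_coupling:
        # DT_THERM may exceed dt_cpld if it is an integer multiple of it.
        if dt_therm > dt_cpld and dt_therm % dt_cpld != 0:
            problems.append("DT_THERM is not an integer multiple of dt_cpld")
    elif dt_therm > dt_cpld:
        # Assumption: equality with dt_cpld is treated as acceptable.
        problems.append("DT_THERM exceeds dt_cpld but THERMO_SPANS_COUPLING is False")
    return problems


# GFDL OM4_025 values quoted above.
print(check_timesteps(dt=900.0, dt_therm=7200.0, dt_cpld=3600, thermo_spans_coupling=True))
# The DT_THERM=2700s test from this issue.
print(check_timesteps(dt=1350, dt_therm=2700, dt_cpld=1350, thermo_spans_coupling=True))
```

Both calls return an empty list, meaning no warnings under these assumed rules. Passing `thermo_spans_coupling=False` to the second call would flag DT_THERM as exceeding the coupling timestep.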

@minghangli-uni
Contributor Author

The branch is linked here, https://github.com/COSIMA/MOM6-CICE6/tree/025deg_jra55do_ryf_iss135

@dougiesquire
Collaborator

@adele-morrison, were your issues with DT_THERM related to the open boundaries?

@adele-morrison

adele-morrison commented Apr 10, 2024 via email

@minghangli-uni
Contributor Author

minghangli-uni commented Apr 10, 2024

@adele-morrison

Have you tried running with DT_THERM > 2700?

I haven't tried, but I am planning to run tests with DT_THERM set to 4 and 8 times DT.

e.g. GFDL OM4_025 uses:

  • dt_cpld = 3600
  • DT = 900.0
  • DT_THERM = 7200.0
  • THERMO_SPANS_COUPLING = True

Have we tested the performance using similar timesteps to that? Could always reduce dt_cpld to 1800 if that's a worry.

I've run multiple tests. Regardless of changes to the other timesteps, or of whether ntdt was 1 or 2, two errors consistently occurred within the first two years whenever dt_cpld was greater than or equal to 1800s:

FATAL from PE 100: write energy: Ocean velocity has been truncated too many times
(abort ice) error = (diagnostic abort) ERROR: bad departure points

@dougiesquire
Collaborator

@minghangli-uni, can you easily test the timing and compare the outputs of runs with longer tracer timesteps? E.g. DT_THERM = 5400.0, 6750.0, 8100.0, keeping the MOM baroclinic timestep, CICE thermodynamic timestep and coupling timestep all set to 1350s.

@minghangli-uni
Contributor Author

Won't be a problem I think. As the current config is limited by the number of CPU cores (e.g. 288), it will take around 14 hours to receive one model year results.

@dougiesquire
Collaborator

Won't be a problem I think. As the current config is limited by the number of CPU cores (e.g. 288), it will take around 14 hours to receive one model year results.

Are you saying that you think the model won't run any faster, or that you will do the runs to check (or something else)?

@minghangli-uni
Contributor Author

I am currently investigating the core cap issue. Additionally, I plan to examine whether increasing DT_THERM will impact physical fields. Based on these findings, we will determine the extent to which we can achieve a speedup.

@dougiesquire
Collaborator

I am currently investigating the core cap issue.

What is this?

@minghangli-uni
Contributor Author

Limitations
The current configuration runs sequentially and is restricted to a maximum of 288 CPU cores. If the number of CPU cores exceeds this limit, the model will hang without providing any useful information. I am still investigating this issue to determine the cause.

See the Limitations section quoted above.

@dougiesquire
Collaborator

dougiesquire commented Apr 10, 2024

@minghangli-uni some of the changes in the branch you've linked to this issue overlap with changes that are being made in ACCESS-NRI/access-om3-configs#48 (i.e. the changes to NK, the timesteps, the CICE initial conditions and CICE block sizes are all done there). If you think the changes being proposed in that PR are not correct, please review/comment in that PR.

I think your other changes either require testing and/or require the same change across many repos. These are best handled one at a time. I suggest:

  • Open an issue for each proposed change
  • Address each change in a dedicated branch and associated PR

@minghangli-uni
Contributor Author

@dougiesquire This is a good point. Will follow your suggestion and implement changes accordingly.

@aekiss
Contributor

aekiss commented Apr 10, 2024

Are you planning to use @micaeljtoliveira's profiling tools for the test runs?

@minghangli-uni
Contributor Author

I will first get the concurrent run working and do test runs with more CPU cores for MOM. This will reduce turnaround time and give results in a shorter walltime. Then I will use the profiling tools (https://github.com/COSIMA/om3-utils/tree/profiling) to fine-tune the process layout for the 025 deg configuration.

@dougiesquire
Collaborator

@minghangli-uni can this be closed now or is there still a reason to keep it open?

@minghangli-uni
Contributor Author

I am happy to close it now. Thanks @dougiesquire
