Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0.25deg configuration #101

Open
aekiss opened this issue Feb 13, 2024 · 33 comments
Open

0.25deg configuration #101

aekiss opened this issue Feb 13, 2024 · 33 comments

Comments

@aekiss
Copy link
Contributor

aekiss commented Feb 13, 2024

A priority for CMIP7 is to have a working 0.25° MOM6-CICE6 configuration.

I think the first step should be a working RYF configuration based on 1deg_jra55do_ryf but with 0.25° inputs and parameter modifications based on ACCESS-OM2, in order to quickly produce an initial prototype for use in ACCESS-CM3-025 development and for scaling performance tests.

At some point later we can develop IAF configs and also RYF and IAF for MOM6-CICE6-WW3.

After we switch to c-grid cice we can update topography to exploit c-grid and also take the opportunity to extend the grid further south.

@aekiss
Copy link
Contributor Author

aekiss commented Feb 13, 2024

Some tasks for the initial prototype:

  • set up as a new 025deg_jra55do_ryf branch in https://github.com/COSIMA/MOM6-CICE6, based on the 1deg_jra55do_ryf branch
  • create /g/data/ik11/inputs/access-om3/0.x.0/025deg/ containing files analogous to those in /g/data/ik11/inputs/access-om3/0.x.0/1deg/ but using the latest 0.25° inputs from ACCESS-OM2: /g/data/ik11/inputs/access-om2/input_20230515_025deg_topog (these are newer than those currently used in the 0.25° default configurations - see here)
  • update MOM6 and CICE6 parameters as needed - see Appendix E in the tech report

@access-hive-bot
Copy link

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/cosima-twg-meeting-minutes-2024/1734/3

@ezhilsabareesh8
Copy link
Contributor

The initial run of the quarter-degree configuration using the latest access-om2 grids and input files from /g/data/ik11/inputs/access-om2/input_20230515_025deg_topog encountered a failure at line 647-648 of ice_mesh_mod.F90 within the CICE codebase. This failure was due to a discrepancy between the internally generated longitudes and the mesh longitudes in CICE.

The ESMF mesh file was created using 0.25-degree access-om2 ocean hgrid and topog files via the generate_mesh.py script.

Below is the specific error message encountered during the run:

ERROR: CICE n, mesh_lon , lon, diff_lon =        1     84.1250000000000        83.8750000000000             0.25000D+00
ERROR: CICE n, mesh_lon , lon, diff_lon =        1     96.1250000000000        95.8750000000000             0.25000D+00
ERROR: CICE n, mesh_lon , lon, diff_lon =        1    108.1250000000000       107.8750000000000             0.25000D+00

 (abort_ice)ABORTED:
 (abort_ice) called from ice_mesh_mod.F90
 (abort_ice) line number          647
 (abort_ice) error =  ice_mesh_check:
 ERROR: (abort_ice) ice_mesh_check:
ERROR: CICE n, mesh_lon , lon, diff_lon =        1    128.1250000000000       127.8750000000000             0.25000D+00

 (abort_ice)ABORTED:
 (abort_ice) called from ice_mesh_mod.F90
 (abort_ice) line number          647
 (abort_ice) error =  ice_mesh_check:
 ERROR: (abort_ice) ice_mesh_check:
ERROR: CICE n, mesh_lon , lon, diff_lon =        1    144.1250000000000       143.8750000000000             0.25000D+00

@anton-seaice
Copy link
Contributor

Weird - this is the same code that Dougie had an issue with in the 1 degree config and put the patch in for:

https://github.com/COSIMA/access-om3/blob/main/CICE/patches/ice_mesh_mod.F90.patch

It looks like all the lons are offset by 0.25 degree, suggesting maybe a missing first/last column?

You maybe could set 'eps_imesh' to a big value and compare the grid from the cice output to the mesh file / hgrid file to debug this?

@ezhilsabareesh8
Copy link
Contributor

ezhilsabareesh8 commented Feb 22, 2024

It appears that the issue lies beyond the section of code in ice_mesh_mod.F90 that checks for longitude mismatches (refer here). I've tried increasing the tolerance value (eps_imesh), but the code exits after checking 9000 blocks, and the mismatch only seems to be increasing.

To proceed, I decided to suppress the check, allowing the code to run. However, I've encountered a new problem in the mediator. NaNs are being detected in the calculation of Sa_dens, where Sa_dens is calculated here based on datm pressure, temperature, and humidity. NaNs are occurring specifically in the fields exported to ice (refer here) which could be due to lon mismatch

@anton-seaice
Copy link
Contributor

Hi Ezhil

It looks like the CICE grid for 1 degree goes from -279 to 80 but for 0.25 degree goes from -280 to 79.75

cice1_grid = xr.open_dataset('/g/data/ik11/inputs/access-om3/0.x.0/1deg/cice/grid.nc')
np.rad2deg(cice1_grid.ulon.isel(ny=0).values)

Returns:
array([-278.99999784, -277.99999785, -276.99999785, ..., 77.9999994 , 78.99999939, 79.99999938])

cice_025 = xr.open_dataset('/g/data/ik11/inputs/access-om2/input_20230515_025deg_topog/cice_025deg/grid.nc')
np.rad2deg(cice_025.ulon.isel(ny=0).values)

Returns:
array([-280. , -279.75, -279.5 , ..., 79.25, 79.5 , 79.75])

Presumably the first one is correct (and would be consistent with u points / b-grid), and we need to shift the 0.25 degree grid to the right by 0.25 deg, but maybe @aekiss can confirm?

@anton-seaice
Copy link
Contributor

Ezhil pointed out that the mesh check is cheching tlon from CICE to the ESMF mesh file and my results above are from ulon. Tlon field is correct in both 1deg and 0.25deg CICE grid files, but CICE is not using tlon from the grid file and is instead interpolating it (https://github.com/CICE-Consortium/CICE/blob/9e9e5b3fabd88c429c4632baa8235187129e2dd7/cicecore/cicedyn/infrastructure/ice_grid.F90#L952).

@aekiss
Copy link
Contributor Author

aekiss commented Feb 22, 2024

It looks like there's an error in the OM2 0.25° grid - the u points are in the NW corner of cells instead of the NE corner. I'm investigating.

@aekiss
Copy link
Contributor Author

aekiss commented Feb 22, 2024

See new issue. We need to have some sort of testing to catch this sort of thing.

@anton-seaice
Copy link
Contributor

See new issue. We need to have some sort of testing to catch this sort of thing.

Are the current checks in the cmeps caps sufficient for OM3? It looks like both MOM and CICE caps have checks to compare between the component grid and the ESMF mesh file. Although the CICE allowed tolerance is 0.1, which we should probably configure to something stricter (MOM is 1e-4)?

They don't use all the variables in the model grid files however, so that could be a point of inconsistency.

@aekiss
Copy link
Contributor Author

aekiss commented Feb 22, 2024

A tolerance of 0.1° is far too coarse to be useful at high resolution - it would be good to use something finer, eg 1e-4.

It would be good to have a consistency check between all variables in the mom6 and cice6 grids somehow.

@anton-seaice
Copy link
Contributor

It would be good to have a consistency check between all variables in the mom6 and cice6 grids somehow.

And the other question then is are we checking just the input grids or also the output files?

@aekiss
Copy link
Contributor Author

aekiss commented Feb 22, 2024

Checking inputs is more important.

Since they should be identical at double precision (unlike our current 1deg grids), maybe our tolerance should be (say) 1e-7° instead of 1e-4°. However, this would break our current 1° config so we may need to wait until we update our grids before applying this at 1°.

@ezhilsabareesh8
Copy link
Contributor

ezhilsabareesh8 commented Feb 26, 2024

Thanks @anton-seaice and @aekiss. Resolved longitude mismatch error by generating the CICE grid from the MOM super grid. U points are now positioned in the NE corner.

Updated CICE grid:

ULON <xarray.DataArray 'x' (nxp: 1440)
array([-279.75, -279.5 , -279.25, ...,  79.5 ,  79.75,  80. ])
TLON <xarray.DataArray 'x' (nxp: 1440)
array([-279.875, -279.625, -279.375, ...,  79.375,  79.625,  79.875])

The script written to generate the CICE grid from the MOM super grid has been uploaded to om3-scripts. You can review the script here.

@aekiss
Copy link
Contributor Author

aekiss commented Feb 26, 2024

Thanks @ezhilsabareesh8, can you make a PR for this? I have some suggestions that would be easier to make that way.

@ezhilsabareesh8
Copy link
Contributor

ezhilsabareesh8 commented Feb 26, 2024

Thanks @aekiss. I have created a PR in om3-scripts here.

@ezhilsabareesh8
Copy link
Contributor

ezhilsabareesh8 commented Feb 28, 2024

To proceed, I decided to suppress the check, allowing the code to run. However, I've encountered a new problem in the mediator. NaNs are being detected in the calculation of Sa_dens, where Sa_dens is calculated here based on datm pressure, temperature, and humidity. NaNs are occurring specifically in the fields exported to ice (refer here) which could be due to lon mismatch

Thanks to @dougiesquire's suggestion, I resolved the NaN issue by updating datm_in, drof_in, and nuopc.runconfig files with a nomask ESMF mesh file for atmosphere and runoff components. Subsequently, I was able to run the quarter-degree configuration for 5 steps without encountering any crashes by using the CICE grid.nc file generated from mom super grid.

@aekiss
Copy link
Contributor Author

aekiss commented Feb 28, 2024

Awesome!

@aekiss
Copy link
Contributor Author

aekiss commented Mar 15, 2024

Has it been possible to run the 0.25° config for longer than 5 steps?

@ezhilsabareesh8
Copy link
Contributor

ezhilsabareesh8 commented Mar 26, 2024

The configuration fails after 45 days, encountering an error related to bad departure points in CICE. The error message displayed is:

(abort_ice)ABORTED:
(abort_ice) error = (diagnostic_abort)ERROR: bad departure points
ERROR: (abort_ice)(diagnostic_abort)ERROR: bad departure points

The departure points are going out of bounds, specifically more than one grid cell away. Here is the relevant code:

         dpx(i,j) = -dt*uvel(i,j)
         dpy(i,j) = -dt*vvel(i,j)

         ! Check for values out of bounds (more than one grid cell away)
         if (dpx(i,j) < -HTN(i,j) .or. dpx(i,j) > HTN(i+1,j) .or.   &
             dpy(i,j) < -HTE(i,j) .or. dpy(i,j) > HTE(i,j+1)) then
            l_stop = .true.
            istop = i
            jstop = j
         endif

I think it is due to CFL condition violation and I attempted changing the coupler time step (I believe it is same as the thermodynamic time step of CICE) from 3600 seconds to 1800 seconds. However, I continue to face the same problem, any thoughts @aekiss.

I have updated the quarter degree configuration in this branch, and I believe the restart file issue is related to Payu providing incorrect restart file names (refer to this issue). I suspect it could be related to the modules I am loading. It would be of great help if @minghangli-uni or @anton-seaice could check the quarter-degree configuration and see if they can reproduce the restart file issue.

@minghangli-uni
Copy link
Contributor

My run survived until 94 days with dt=900s and dttherm=3600s. Like @ezhilsabareesh8, the reason for failure was bad departure points.

Instead of the MOM_input parameters provided in 1deg_jra55do_ryf, I have updated another configuration using ePBL instead of KPP for the surface boundary layer parameterisation. Will upload it soon.

For the restart file issues, I dont have any for 1deg but do for 0.25deg.

WARNING from PE     0: MOM_restart: Unable to find restart file : ./GMOM_JRA.mom6.r.1900-02-01-00000_1.nc.nc


WARNING from PE     0: MOM_restart: Unable to find restart file : ./GMOM_JRA.mom6.r.1900-02-01-00000_2.nc.nc


FATAL from PE    33: MOM_restart: Unable to find mandatory variable age in restart files.

@aekiss
Copy link
Contributor Author

aekiss commented Mar 26, 2024

Yes, "bad departure points" is a CFL error, so you may need to reduce the timestep further. Specifically, you need to reduce the dynamic timestep, which is a fixed fraction of the thermo timestep. The parameter controlling the ratio is ndtd, so increasing this is another way to avoid CFL problems without having an excessively short thermo timestep.
https://cice-consortium-cice.readthedocs.io/en/main/user_guide/ug_implementation.html#choosing-an-appropriate-time-step

@aekiss
Copy link
Contributor Author

aekiss commented Mar 26, 2024

ACCESS-OM2-025 was able to run consistently with dt=1350 for both cice thermo and dynamics (see timestep column here) so I'm surprised this isn't more stable.

What's the cice initial condition? From the input files #124 (comment) it seems it's not starting from an access-om2 initial condition at 0.25deg like we did at 1deg? see #50

@minghangli-uni
Copy link
Contributor

We are currently testing the default option.

ice_ic = "default"

Will conduct the test in a manner similar to 1deg.

@minghangli-uni
Copy link
Contributor

minghangli-uni commented Mar 26, 2024

I have updated the quarter degree configuration in this branch, and I believe the restart file issue is related to Payu providing incorrect restart file names (refer to this issue). I suspect it could be related to the modules I am loading. It would be of great help if @minghangli-uni or @anton-seaice could check the quarter-degree configuration and see if they can reproduce the restart file issue.

@ezhilsabareesh8 . I believe in your configuration, ocean_vgrid.nc is not correct. nk should be 50.

The vgrid conversion process should follow a similar approach to the following step:

ncks -O -d nzv,,,2 /g/data/ik11/inputs/access-om2/input_20201102/mom_1deg/ocean_vgrid.nc ocean_vgrid.nc

@aekiss any thoughts?

A reminder of our last namelist discussion, we aimed to set nk=75 but currently use nk=50 for the timebeing.

nk set to 75 at both 1° and 0.25° to match ACCESS-OM2-01 - use KDS75 z* - test hybrid adaptive coords later; initial target is something as robust as we can, eg for CMIP7

@ezhilsabareesh8
Copy link
Contributor

ACCESS-OM2-025 was able to run consistently with dt=1350 for both cice thermo and dynamics (see timestep column here) so I'm surprised this isn't more stable.

I am running an experiment with dt=1350 as a coupler time step, which I believe same as the thermo and dynamic time step for CICE refer here so far there is no issue, completed one month.

I tried initial runs with ice_ic = "default" for a cold start, will generate a input condition file from OM2 025 deg run shortly

@ezhilsabareesh8
Copy link
Contributor

ezhilsabareesh8 commented Mar 26, 2024

I believe in your configuration, ocean_vgrid.nc is not correct. nk should be 50.

It is same as OM2 025 deg configuration's ocean_vgrid.nc

/g/data/ik11/inputs/access-om2/input_20230515_025deg_topog/mom_025deg/ocean_vgrid.nc 
netcdf ocean_vgrid {
dimensions:
	nzv = 101 ;
variables:
	double zeta(nzv) ;
		zeta:units = "meters" ;
		zeta:standard_name = "vertical_grid_vertex" ;
		zeta:long_name = "vgrid" ;
		zeta:author = "Kial Stewart" ;

// global attributes:
		:history = " | Updated on Mon Nov  2 16:01:19 AEDT 2020 using https://github.com/COSIMA/make_025deg_topo/tree/f262fbf" ;
}

@dougiesquire
Copy link
Collaborator

What does the ice output look like? See #50 (comment)

@aekiss
Copy link
Contributor Author

aekiss commented Mar 26, 2024

I believe in your configuration, ocean_vgrid.nc is not correct. nk should be 50.

It is same as OM2 025 deg configuration's ocean_vgrid.nc

@minghangli-uni note that ocean_vgrid.nc contains cell centres and cell edges interleaved into one array, which is why it has 101 elements rather than 50.

When we make the switch to 75 levels for 1° and 0.25° we should just re-use ocean_vgrid.nc from OM2-01.

@dougiesquire
Copy link
Collaborator

I believe the restart file issue is related to Payu providing incorrect restart file names (refer to this issue)

Note this should be fixed by payu-org/payu#432

@ezhilsabareesh8
Copy link
Contributor

Thank you all for your inputs and suggestions. I have fixed the bad departure points error by changing the time step to dt = 1350, similar to 025 deg OM2 config and I haven't changed the ndtd ratio since it may lead to truncation error.

I am able to run this configuration for two years without encountering any errors, please refer to this PR.

@dougiesquire
Copy link
Collaborator

@minghangli-uni note that ocean_vgrid.nc contains cell centres and cell edges interleaved into one array, which is why it has 101 elements rather than 50.

@aekiss, this is true for MOM5, but not for our MOM6 configs. @minghangli-uni is correct that NK needs to be set to 50. The confusion arises here because the ocean_vgrid.nc that is used for the OM3 1deg configuration is not just directly copied from the OM2 1deg inputs. If you look in the history attribute of /g/data/ik11/inputs/access-om3/0.x.0/1deg/mom/ocean_vgrid.nc you'll see that the OM3 ocean_vgrid.nc was actually created by downsampling the OM2 ocean_vgrid.nc from 101 vertices to 51 interfaces:

:history = "Fri Jul 28 09:29:32 2023: ncks -O -d nzv,,,2 /g/data/ik11/inputs/access-om2/input_20201102/mom_1deg/ocean_vgrid.nc ocean_vgrid.nc\n | Updated on Thu Oct 15 17:49:39 AEDT 2020 using https://github.com/COSIMA/make_1deg_topo/tree/db7b546" ;

So the same thing needs to be done for the OM2 0.25deg ocean_vgrid.nc to create the OM3 0.25deg ocean_vgrid.nc file. Then NK should be set to 50 as @minghangli-uni says.

@ezhilsabareesh8
Copy link
Contributor

Thanks @dougiesquire and @minghangli-uni. I have updated 025deg's ocean_vgrid.nc file and also MOM_input parameters to NK=50

netcdf ocean_vgrid {
dimensions:
	nzv = 51 ;
variables:
	double zeta(nzv) ;
		zeta:units = "meters" ;
		zeta:standard_name = "vertical_grid_vertex" ;
		zeta:long_name = "vgrid" ;
		zeta:author = "Kial Stewart" ;

// global attributes:
		:history = "Thu Apr  4 17:21:35 2024: ncks -O -d nzv,,,2 /g/data/ik11/inputs/access-om2/input_20230515_025deg_topog/mom_025deg/ocean_vgrid.nc ocean_vgrid.nc\n | Updated on Mon Nov  2 16:01:19 AEDT 2020 using https://github.com/COSIMA/make_025deg_topo/tree/f262fbf" ;
		:NCO = "netCDF Operators version 5.1.4 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)" ;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

No branches or pull requests

6 participants