OM_05 crashes when using some mask_tables #734

Closed
nikizadehgfdl opened this issue Mar 13, 2018 · 11 comments

@nikizadehgfdl
Contributor

The global area mean diagnostics produce NaNs and cause the model to crash:

FATAL from PE   754: NaN in input field of reproducing_sum(_2d).

Image              PC                Routine            Line        Source
fms_MOM6_SIS2_com  000000000151E5D1  mpp_mod_mp_mpp_er          50  mpp_util_mpi.inc
fms_MOM6_SIS2_com  00000000006FCFF5  mom_coms_mp_repro         157  MOM_coms.F90
fms_MOM6_SIS2_com  00000000007A51E1  mom_spatial_means          39  MOM_spatial_means.F90
fms_MOM6_SIS2_com  0000000000C95F50  mom_diagnostics_m        1200  MOM_diagnostics.F90
fms_MOM6_SIS2_com  00000000007CD441  mom_mp_step_mom_          774  MOM.F90
fms_MOM6_SIS2_com  0000000000729F84  ocean_model_mod_m         581  ocean_model_MOM.F90

where line 39 of MOM_spatial_means.F90, referred to above, is:

  global_area_mean = reproducing_sum( tmpForSumming ) * G%IareaT_global

This happens for all diagnostics that use the global_area_mean or global_area_integral subroutine:

ssh_ga, tosga, sosga, zos, zossq, volo

After I comment all of these out of the diag_table, the model crashes in a non-diagnostic call to reproducing_sum:

FATAL from PE   754: NaN in input field of reproducing_sum(_2d).

fms_MOM6_SIS2_com  000000000151E5F1  mpp_mod_mp_mpp_er          50  mpp_util_mpi.inc
fms_MOM6_SIS2_com  00000000006FCFF5  mom_coms_mp_repro         157  MOM_coms.F90
fms_MOM6_SIS2_com  0000000000858727  mom_surface_forci         578  MOM_surface_forcing.F90
fms_MOM6_SIS2_com  00000000007295D7  ocean_model_mod_m         547  ocean_model_MOM.F90
fms_MOM6_SIS2_com  000000000040B972  MAIN__                   1020  coupler_main.F90

where line 578 of MOM_surface_forcing.F90 is:

      fluxes%netFWGlobalAdj = reproducing_sum(net_FW(:,:), isr, ier, jsr, jer) / CS%area_surf

@nikizadehgfdl
Contributor Author

The model runs fine with the same layout and no mask_table.

Not all mask_tables cause a crash, only some of them.

One thing that is common to all the doomed quantities is a factor of G%areaT * G%mask2dT.
Could a bad value creep into either of these arrays when using a mask_table that shares a boundary with the actual land mask?

@nikizadehgfdl
Contributor Author

Running in debug mode does not trap any such NaNs! How does reproducing_sum do it?

The model crashes on day 5. But if I run for 4 days and restart the model, it does not crash until day 7!!!
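On the question of how reproducing_sum catches NaNs that debug mode misses: the FATAL message ("NaN in input field of reproducing_sum") suggests an explicit check on the inputs, which does not depend on floating-point exceptions being enabled. The Python sketch below is only an illustration of that idea under stated assumptions, not the code in MOM_coms.F90. The name `checked_exact_sum` is hypothetical, and the use of exact rational arithmetic is a stand-in for whatever order-independent scheme the real routine uses.

```python
import math
from fractions import Fraction

def checked_exact_sum(values):
    """Sum finite floats exactly, rejecting NaN inputs up front."""
    for v in values:
        if math.isnan(v):  # same outcome as the classic test v != v
            raise ValueError("NaN in input field")
    # Fraction(v) holds the exact value of each float, so the total
    # cannot depend on the order in which the terms are added.
    return float(sum(Fraction(v) for v in values))

a = [1e16, 1.0, -1e16]
b = [1e16, -1e16, 1.0]
print(checked_exact_sum(a) == checked_exact_sum(b))  # True
```

Because the check inspects every input value, it fires even when the NaN sits on a point that later arithmetic would have multiplied by a zero mask.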

@nikizadehgfdl
Contributor Author

I think I might have found the root cause. The mask_table generation tool takes the number of halos as an argument and assumes 1 if it is not provided. Is 1 the correct value for the 1/2 degree model?

The mask generation tool can also take a --sea_level argument and assumes 0 if it is not passed. Shouldn't this match the MOM6 parameter MINIMUM_DEPTH = 9.5?

@adcroft
Collaborator

adcroft commented Mar 14, 2018

What does "number of halos" mean? Do you mean the size of the halo?

And yes, the pre-calculation of the mask by fre tools can be inconsistent with the mask that the ocean dynamically generates. MOM6 does not read a mask. There is another parameter which controls the masking to help make these things more consistent:

  call get_param(PF, mdl, "MINIMUM_DEPTH", min_depth, &
                 "If MASKING_DEPTH is unspecified, then anything shallower than\n"//&
                 "MINIMUM_DEPTH is assumed to be land and all fluxes are masked out.\n"//&
                 "If MASKING_DEPTH is specified, then all depths shallower than\n"//&
                 "MINIMUM_DEPTH but deeper than MASKING_DEPTH are rounded to MINIMUM_DEPTH.", &
                 units="m", default=0.0)
  call get_param(PF, mdl, "MASKING_DEPTH", mask_depth, &
                 "The depth below which to mask points as land points, for which all\n"//&
                 "fluxes are zeroed out. MASKING_DEPTH is ignored if negative.", &
                 units="m", default=-9999.0)
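The rule described in those two help strings can be sketched in a few lines of Python. This is a reading of the help text only, not the MOM6 source: the function name `classify_depth` is hypothetical, and treating a depth exactly equal to a threshold as land is an assumption.

```python
def classify_depth(depth, minimum_depth=0.0, masking_depth=-9999.0):
    """Return (is_ocean, effective_depth) per the parameter help text.

    Assumption: a depth exactly equal to the threshold counts as land.
    """
    # "MASKING_DEPTH is ignored if negative": fall back to MINIMUM_DEPTH.
    threshold = masking_depth if masking_depth >= 0.0 else minimum_depth
    if depth <= threshold:
        return False, depth  # land: all fluxes would be masked out
    # Ocean points shallower than MINIMUM_DEPTH are rounded to it.
    return True, max(depth, minimum_depth)

# The same 5 m deep point under two different thresholds:
print(classify_depth(5.0, minimum_depth=0.0))  # (True, 5.0)
print(classify_depth(5.0, minimum_depth=9.5))  # (False, 5.0)
# With MASKING_DEPTH set, the point stays ocean but is deepened:
print(classify_depth(5.0, minimum_depth=9.5, masking_depth=1.0))  # (True, 9.5)
```

The first two calls show how a mask computed offline with one threshold can disagree with the mask the model derives from another.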

@Zhi-Liang
Contributor

Zhi-Liang commented Mar 14, 2018 via email

@nikizadehgfdl
Contributor Author

I did some experiments with the --halo argument of the check_mask tool and found that if I pass at least --halo 3, the resulting mask_table allows the model to run with the layout that was crashing. This pushes the masked PE boundaries away from the land boundaries, which we guess are the source of the problem.

Is there a requirement that there should be PEs spanning the ocean-land boundary?
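The effect of a larger halo described above can be illustrated with a toy one-dimensional decomposition. The sketch below is an assumption about the general idea (a tile is masked only if no ocean point lies within `halo` points of it) and is not taken from the check_mask source. The function name, the non-periodic row, and the tile layout are all hypothetical.

```python
def maskable_tiles(is_ocean, tile_size, halo):
    """Indices of tiles with no ocean point within `halo` points of the tile."""
    n = len(is_ocean)
    result = []
    for t in range(n // tile_size):
        lo = max(0, t * tile_size - halo)        # extend the tile by the halo,
        hi = min(n, (t + 1) * tile_size + halo)  # clipped at the row ends
        if not any(is_ocean[lo:hi]):
            result.append(t)
    return result

# Hypothetical row: 10 land points, then 6 ocean points; 4 tiles of 4 points.
row = [False] * 10 + [True] * 6
print(maskable_tiles(row, tile_size=4, halo=1))  # [0, 1]
print(maskable_tiles(row, tile_size=4, halo=3))  # [0]
```

With the wider halo, the all-land tile next to the coast is no longer masked, so the masked region ends further from the land-ocean boundary.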

@nikizadehgfdl
Contributor Author

Zhi, yes, when the model runs with a mask_table it reproduces the answers without a mask_table (except for some spurious values on the land).

@nikizadehgfdl
Contributor Author

I put a print statement in global_area_mean. It is the variable itself (ssh in this case) that prints as "NaN"; G%areaT is finite and G%mask2dT is 0 or 1 (so the NaN could occur on land or sea).
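This is consistent with IEEE floating-point arithmetic, where multiplying a NaN by a zero mask still yields NaN. The minimal Python sketch below is not MOM6 code; `masked_area_sum` and the sample values are illustrative assumptions that mimic a sum of field * area * mask.

```python
import math

def masked_area_sum(field, area, mask):
    """Sum field*area*mask over all points, like a masked area integral."""
    return sum(f * a * m for f, a, m in zip(field, area, mask))

# Hypothetical 4-point row; the last point is land (mask = 0).
area = [1.0, 1.0, 1.0, 1.0]
mask = [1.0, 1.0, 1.0, 0.0]

clean = [0.1, 0.2, 0.3, 0.0]
dirty = [0.1, 0.2, 0.3, float("nan")]  # NaN only on the land point

total_clean = masked_area_sum(clean, area, mask)
total_dirty = masked_area_sum(dirty, area, mask)

print(math.isnan(total_clean))  # False
print(math.isnan(total_dirty))  # True: NaN * 0.0 is NaN
```

So a finite area and a 0/1 mask are not enough to keep the global sum finite; the field itself has to be free of NaNs everywhere the sum touches.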

@nikizadehgfdl
Contributor Author

MEKE%Ku is the first field to get a NaN, which then propagates. The model runs fine when I turn MEKE off. This could be related to issue #262!

nikizadehgfdl added a commit to nikizadehgfdl/MOM6 that referenced this issue Mar 23, 2018
- Closes issue mom-ocean#734 and hopefully issue mom-ocean#262
- MEKE%Ku array is allocated on data domain and has a mpp_domain_update
  following the loop so it needs to be calculated only on compute domain
- The problem with larger extents of the loops is that the Lmixscale array is
  initialized/calculated on the compute domain and is NaN beyond the compute
  domain extents, causing NaNs to creep into MEKE%Ku and the model.
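
The failure mode described in this commit message can be reproduced in miniature. The Python sketch below is a one-dimensional illustration under stated assumptions, not the MEKE code: `lmix` and `ku` are stand-ins for Lmixscale and MEKE%Ku, the factor 0.5 is arbitrary, and NaN is used to represent uninitialized halo memory.

```python
import math

NAN = float("nan")
HALO = 2
NCOMPUTE = 4
NDATA = NCOMPUTE + 2 * HALO  # data domain = compute domain plus halos

compute = range(HALO, HALO + NCOMPUTE)  # interior indices only
data = range(NDATA)                     # interior plus halo indices

# Helper field filled only on the compute domain; halo points stay NaN.
lmix = [NAN] * NDATA
for i in compute:
    lmix[i] = 10.0

def update_ku(index_range):
    """Set ku = 0.5 * lmix on the given indices; other points stay 0."""
    ku = [0.0] * NDATA
    for i in index_range:
        ku[i] = 0.5 * lmix[i]
    return ku

ku_wide = update_ku(data)      # loop over the larger extents reads NaN halos
ku_fixed = update_ku(compute)  # loop restricted to the compute domain

print(any(math.isnan(v) for v in ku_wide))   # True
print(any(math.isnan(v) for v in ku_fixed))  # False
```

In the real model the halo points of the result would then be filled by the domain update that follows the loop, which is why restricting the loop to the compute domain is sufficient.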

@Hallberg-NOAA
Collaborator

This issue was addressed by PR NOAA-GFDL#739, and is being closed.

@nikizadehgfdl
Contributor Author

Just a note for future reference: creating mask_tables with --halo 3 (check_mask --halo 3 ...) saved another SPEAR-based coupled model from invalid answers with mask_tables. Without --halo 3, some land processors that are too close to the land-ocean boundary get masked, and the model then fails to reproduce the non-mask-table answers (or even crashes).
