Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace MOM_control_struct pointers as locals #1429

Merged
merged 3 commits into from
Jul 26, 2021

Conversation

marshallward
Copy link
Collaborator

The MOM_control_struct is declared and passed as a pointer to a CS
(usually allocated anonymously) and treated as if it were a pointer,
even though there is currently no real advantage to doing so.

After the FMS update, the deallocation of this CS was causing a
segmentation fault in the PGI compilers. While the underlying cause was
never determined, it is likely due to some automated deallocation of the
CS contents, whose addressing became scrambled.

This problem can be resolved by moving all of the CS contents to stack,
so that the contents are automatically removed upon exiting whatever
function it was instantiated. Subsequent calls can reference the local
(or parent) stack contents.

@marshallward
Copy link
Collaborator Author

This patch will fix the FMS2 PGI runtime issues for most tests. Only OM_1deg needs to be fixed after this PR. All tests pass over all three compilers with FMS1.

This patch requires a change to the config_src drivers so it would be useful to get feedback from the other groups. (@alperaltuntas @gustavo-marques @kshedstrom @jiandewang @sanAkel @abozec)

@marshallward
Copy link
Collaborator Author

The CS was tagged as target in some subroutines, but this is often only needed for the convenience aliases G => CS%G, etc. We may want to look into dropping these aliases where possible.

(Note that some may be required for the rotation testing.)

@sanAkel
Copy link
Collaborator

sanAkel commented Jun 16, 2021

@marshallward I never looked into FMS2! (You may recall our past conversations!)

Will get in touch with you on Friday to figure out how to test. It's certainly good to get a head start before we update to FMS2. Thanks for the heads up!

@marshallward
Copy link
Collaborator Author

Thanks @sanAkel

For you and the others: the only real question is whether there is any issue changing the MOM_control_struct from a pointer to just a regular variable (usually inside a larger derived type).

@kshedstrom
Copy link
Collaborator

Yes, having a problem:

 At line 3476 of file //import/c1/AKWATERS/kate/ESMG/ESMG-configs/src/MOM6/src/framework/MOM_diag_mediator.F90
Fortran runtime error: Attempt to DEALLOCATE unallocated 'diag_cs'

This is in all my Chapman_coast tests.

@sanAkel
Copy link
Collaborator

sanAkel commented Jun 16, 2021

@marshallward looking (at the changes on my phone!) it seems:

  1. I can test regardless of FMS1 or 2.
  2. Changes do not "take away" any of CS%p

So if 1 is true, I should be able to give it a 👍 or 👎 easily without bothering @mathomp4 or @tclune (except via this note!)

@kshedstrom
Copy link
Collaborator

All is well with this change:

diff --git a/src/framework/MOM_diag_mediator.F90 b/src/framework/MOM_diag_mediator.F90
index e068d26f5..fab88cce5 100644
--- a/src/framework/MOM_diag_mediator.F90
+++ b/src/framework/MOM_diag_mediator.F90
@@ -3473,16 +3473,16 @@ subroutine diag_mediator_end(time, diag_CS, end_diag_manager)
     call axes_grp_end(diag_cs%remap_axesCvi(i))
   enddo
 
-  deallocate(diag_cs%remap_axesZL)
-  deallocate(diag_cs%remap_axesZi)
-  deallocate(diag_cs%remap_axesTL)
-  deallocate(diag_cs%remap_axesTi)
-  deallocate(diag_cs%remap_axesBL)
-  deallocate(diag_cs%remap_axesBi)
-  deallocate(diag_cs%remap_axesCuL)
-  deallocate(diag_cs%remap_axesCui)
-  deallocate(diag_cs%remap_axesCvL)
-  deallocate(diag_cs%remap_axesCvi)
+  if (allocated(diag_cs%remap_axesZL)) deallocate(diag_cs%remap_axesZL)
+  if (allocated(diag_cs%remap_axesZi)) deallocate(diag_cs%remap_axesZi)
+  if (allocated(diag_cs%remap_axesTL)) deallocate(diag_cs%remap_axesTL)
+  if (allocated(diag_cs%remap_axesTi)) deallocate(diag_cs%remap_axesTi)
+  if (allocated(diag_cs%remap_axesBL)) deallocate(diag_cs%remap_axesBL)
+  if (allocated(diag_cs%remap_axesBi)) deallocate(diag_cs%remap_axesBi)
+  if (allocated(diag_cs%remap_axesCuL)) deallocate(diag_cs%remap_axesCuL)
+  if (allocated(diag_cs%remap_axesCui)) deallocate(diag_cs%remap_axesCui)
+  if (allocated(diag_cs%remap_axesCvL)) deallocate(diag_cs%remap_axesCvL)
+  if (allocated(diag_cs%remap_axesCvi)) deallocate(diag_cs%remap_axesCvi)
 
   do dl=2,MAX_DSAMP_LEV
     if (allocated(diag_cs%dsamp(dl)%remap_axesTL)) &

@marshallward
Copy link
Collaborator Author

I may have tracked down the underlying problem while working through another issue, which would make this PR unnecessary.

There is probably still value in phasing out these old pointers, but we may be able to delay it if I can work out a solution.

I'll update when I've reached a conclusion.

@kshedstrom
Copy link
Collaborator

FYI, the tests of mine that failed were all barotropic. Do you need a barotropic test?

@marshallward
Copy link
Collaborator Author

Yes, that would help. BTW @kshedstrom I think you might be seeing problems related to the memory cleanup. Can you re-try with dev/gfdl rather than this PR?

@kshedstrom
Copy link
Collaborator

dev/gfdl is no problem either. I'm only testing gfortran.

@jiandewang
Copy link
Collaborator

in UFS we follow GFDL main branch. This time Marshall's PR is aimed for dev/gfdl, so my first work is testing dev/gfdl in UFS. What I found is in dev/gfdl some changes in MOM_wave_interface.F90 is causing compiling failure in UFS. I think the reason is those public properties were being removed at
https://github.com/NOAA-GFDL/MOM6/blob/dev/gfdl/src/user/MOM_wave_interface.F90#L107-L140

they were defined as public in main (see https://github.com/NOAA-GFDL/MOM6/blob/main/src/user/MOM_wave_interface.F90#L75-L114)

this caused the compiling failure in nuopc code at
https://github.com/NOAA-GFDL/MOM6/blob/dev/gfdl/config_src/drivers/nuopc_cap/mom_ocean_model_nuopc.F90#L396

/MOM6/config_src/drivers/nuopc_cap/mom_ocean_model_nuopc.F90(396): error #6292: The parent type of this field is use associated with the PRIVATE fields attribute. [WAVENUM_CEN]
call get_param(param_file,mdl,"SURFBAND_WAVENUMBERS", OS%Waves%WaveNum_Cen, &
-------------------------------------------------------------------^

@marshallward
Copy link
Collaborator Author

@jiandewang Let's deal with that in a separate issue or perhaps in the next dev/gfdl merge. Hopefully it is unrelated to the pointer status of MOM_control_struct.

@marshallward
Copy link
Collaborator Author

@kshedstrom Is there a simple way for me to test your configuration?

@marshallward
Copy link
Collaborator Author

@sanAkel Testing with FMS1 should be fine.

@marshallward
Copy link
Collaborator Author

Unfortunately I don't think I can find a way around the problem without this PR, so I will move forward with it.

@kshedstrom I will add your suggested changes to the PR.

The MOM_control_struct is declared and passed as a pointer to a CS
(usually allocated anonymously) and treated as if it were a pointer,
even though there is currently no real advantage to doing so.

After the FMS update, the deallocation of this CS was causing a
segmentation fault in the PGI compilers.  While the underlying cause was
never determined, it is likely due to some automated deallocation of the
CS contents, whose addressing became scrambled.

This problem can be resolved by moving all of the CS contents to stack,
so that the contents are automatically removed upon exiting whatever
function it was instantiated.  Subsequent calls can reference the local
(or parent) stack contents.
@kshedstrom
Copy link
Collaborator

One example barotropic case is: https://github.com/ESMG/ESMG-configs/tree/dev/esmg/ocean_only/Tidal_bay
It should run without other files.

@jiandewang
Copy link
Collaborator

@jiandewang Let's deal with that in a separate issue or perhaps in the next dev/gfdl merge. Hopefully it is unrelated to the pointer status of MOM_control_struct.

@marshallward agree, I will testing this PR by manually adding those "public" back in the wave code.

@marshallward
Copy link
Collaborator Author

I modified Kate's suggestion to check num_diags > 0 rather than use if(allocated()), but otherwise it should behave the same.

Assuming that this PR and NOAA-GFDL/FMS#764 are accepted, we should be able to run our regression suite with both FMS1 and FMS2.

(I suppose I now "approve" my own PR...?)

@jiandewang
Copy link
Collaborator

@marshallward by adding those "public" back in wave-interface code, your branch works fine in UFS

@adcroft
Copy link
Collaborator

adcroft commented Jun 21, 2021

This patch requires a change to the config_src drivers so it would be useful to get feedback from the other groups.

This is an API change so can @alperaltuntas @gustavo-marques take a look at this PR?

@alperaltuntas
Copy link
Collaborator

Approved. Apart from the issue of private wave members that's already pointed out, this PR works fine with the NCAR configuration.

Copy link
Collaborator

@Hallberg-NOAA Hallberg-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is broad agreement that these changes are correct and work for all of the primary MOM6 development forks. Although the automated pipeline testing is broken at the moment due to some unfortunate updates to the Gaea scheduler, I have manually carried out the pipeline tests, and I can confirm that this PR does not change any answers, and that all output is the same.

@Hallberg-NOAA Hallberg-NOAA merged commit 71b4103 into mom-ocean:dev/gfdl Jul 26, 2021
@marshallward marshallward deleted the mom_cs_to_stack branch October 20, 2021 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants