Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove duplicate config variables in ocean #4907

Conversation

philipwjones
Copy link
Contributor

@philipwjones philipwjones commented Apr 22, 2022

Removes the auto-generation of duplicate config scalars (varinp_name) that were introduced in a previous commit for safety when adding the get_config_scalar routine.

This is bit for bit except with Intel which is performing a peculiar optimization in the time_integration_split routine with the config_btr_ weights. When these are pointers you get bit-for-bit with current master, but when they're scalars you get roundoff level changes even though the values of the weights are equivalent. I've been able to reproduce this effect on the current master using a local temp scalar instead of the pointer to demonstrate it's not a bug being introduced.

Note: the non-BFB behavior above was for mpas-o standalone, which uses a different optimization level for intel. Testing in E3SM shows BFB results with current master.

It was bit for bit on other compilers like nvhpc on Summit.

[BFB]

@philipwjones
Copy link
Contributor Author

This should not impact other MPAS components and I did build sea-ice successfully, but haven't run any tests with the other components to verify.

@jonbob
Copy link
Contributor

jonbob commented May 2, 2022

@philipwjones - who can review this PR? I requested @mark-petersen but please let me know if you think anyone else should look at it

Copy link
Contributor

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested nightly suite on badger with gnu and intel debug. BFB with master using gnu. Thanks. The normal config_* variables are more clear, when possible to not use the varinp_* version.

Copy link
Contributor

@matthewhoffman matthewhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested MALI's full_integration suite on Badger and everything passed.

Copy link
Contributor

@akturner akturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in standalone MPAS-Seaice

@jonbob jonbob added BFB PR leaves answers BFB and removed BFB PR leaves answers BFB labels May 10, 2022
@jonbob
Copy link
Contributor

jonbob commented May 12, 2022

@philipwjones - testing of a merge of this failed in E3SM with the following error messages:

/lcrc/group/e3sm/ac.jwolfe/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.C.20220512_153721_cqe5zr/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90(659): error #7496: A non-pointer actual argument shall have a TARGET attribute when associated with a pointer dummy argument.   [CONFIG_MPAS_TO_GRID_WEIGHTS_FILE]
        call mpas_pool_get_config(ocnConfigs, 'config_mpas_to_grid_weights_file', config_mpas_to_grid_weights_file)
----------------------------------------------------------------------------------^
/lcrc/group/e3sm/ac.jwolfe/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.C.20220512_153721_cqe5zr/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90(660): error #7496: A non-pointer actual argument shall have a TARGET attribute when associated with a pointer dummy argument.   [CONFIG_GRID_TO_MPAS_WEIGHTS_FILE]
        call mpas_pool_get_config(ocnConfigs, 'config_grid_to_mpas_weights_file', config_grid_to_mpas_weights_file)
----------------------------------------------------------------------------------^
/lcrc/group/e3sm/ac.jwolfe/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.C.20220512_153721_cqe5zr/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90(661): error #7496: A non-pointer actual argument shall have a TARGET attribute when associated with a pointer dummy argument.   [CONFIG_SELF_ATTRACTION_LOADING_COMPUTE_INTE]
        call mpas_pool_get_config(ocnConfigs, 'config_self_attraction_loading_compute_interval', config_self_attraction_loading_compute_interval)
-------------------------------------------------------------------------------------------------^
/lcrc/group/e3sm/ac.jwolfe/scratch/chrys/ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel.C.20220512_153721_cqe5zr/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90(664): error #7496: A non-pointer actual argument shall have a TARGET attribute when associated with a pointer dummy argument.   [CONFIG_OCEAN_RUN_MODE]
        call mpas_pool_get_config(ocnConfigs, 'config_ocean_run_mode', config_ocean_run_mode)
-----------------------------------------------------------------------^

  removes the duplicate config scalars (varinp_name) that were
  introduced in a previous commit for safety when adding
  the get_config_scalar routine
@philipwjones
Copy link
Contributor Author

@jonbob - yeah, I guess that was added after I submitted this PR. I'm rebasing and fixing some deprecated stuff in the SAL routines. Should have some updates pretty quick.

…oading

  removed/replaced meshPool and configPool retrievals since info
    available through module use statements
  removed a block loop since multiple blocks no longer supported
@jonbob
Copy link
Contributor

jonbob commented May 13, 2022

Thanks @philipwjones - I figured it had to be something like that

@philipwjones philipwjones force-pushed the philipwjones/mpas-framework/fix-config-dup branch from fc87c1e to e07709c Compare May 13, 2022 15:45
@philipwjones
Copy link
Contributor Author

Ok @jonbob this should work now

@jonbob
Copy link
Contributor

jonbob commented May 18, 2022

@philipwjones - it's not clear to me if this PR should be BFB for our typical E3SM tests. The documentation says it shouldn't be on Intel but it's not clear under what conditions

@philipwjones
Copy link
Contributor Author

@jonbob - the non-bfb results were from any standalone case I ran with Intel (at least on Compy, Chrysalis), though I notice the standalone MPAS build uses -O3 while the E3SM build uses -O2. So it's possible this PR will be bfb in E3SM builds with the lower optimization level.

@jonbob
Copy link
Contributor

jonbob commented May 18, 2022

Thanks @philipwjones - I was curious because the PR didn't have a BFB or non-BFB label. So far my tests on chrysalis have been BFB for:

  • ERS.ne11_oQU240.WCYCL1850NS
  • SMS_D_Ld3.T62_oQU120.CMPASO-IAF
  • SMS.T62_oQU120_ais20.MPAS_LISIO_TEST

and waiting for:

  • SMS_D_Ld1.ne30pg2_r05_EC30to60E2r2.WCYCL1850

@jonbob
Copy link
Contributor

jonbob commented May 18, 2022

Also BFB for SMS_D_Ld1.ne30pg2_r05_EC30to60E2r2.WCYCL1850 using chrysalis_intel

@jonbob
Copy link
Contributor

jonbob commented May 18, 2022

@philipwjones - I thought if you saw non-BFB behavior on compy I'd test there as well. With the default compiler as pgi, an SMS_P12x2.ne4_oQU240.WCYCL1850NS.compy_pgi.allactive-mach_mods test case fails to build with the following error message:

PGF90-S-0155-ncellsowned is use associated and cannot be redeclared (/compyfs/wolf966/e3sm_scratch/SMS_P12x2.ne4_oQU240.WCYCL1850NS.compy_pgi.allactive-mach_mods.C.20220518_140549_51cxhh/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90: 109)
PGF90-S-0155-ncellsall is use associated and cannot be redeclared (/compyfs/wolf966/e3sm_scratch/SMS_P12x2.ne4_oQU240.WCYCL1850NS.compy_pgi.allactive-mach_mods.C.20220518_140549_51cxhh/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90: 110)
PGF90-S-0155-latcell is use associated and cannot be redeclared (/compyfs/wolf966/e3sm_scratch/SMS_P12x2.ne4_oQU240.WCYCL1850NS.compy_pgi.allactive-mach_mods.C.20220518_140549_51cxhh/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90: 111)
PGF90-S-0155-loncell is use associated and cannot be redeclared (/compyfs/wolf966/e3sm_scratch/SMS_P12x2.ne4_oQU240.WCYCL1850NS.compy_pgi.allactive-mach_mods.C.20220518_140549_51cxhh/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90: 112)
PGF90-S-0155-areacell is use associated and cannot be redeclared (/compyfs/wolf966/e3sm_scratch/SMS_P12x2.ne4_oQU240.WCYCL1850NS.compy_pgi.allactive-mach_mods.C.20220518_140549_51cxhh/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90: 113)
PGF90-S-0155-distancetocoast is use associated and cannot be redeclared (/compyfs/wolf966/e3sm_scratch/SMS_P12x2.ne4_oQU240.WCYCL1850NS.compy_pgi.allactive-mach_mods.C.20220518_140549_51cxhh/bld/cmake-bld/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90: 114)
  0 inform,   0 warnings,   6 severes, 0 fatal for ocn_vel_self_attraction_loading
Target CMakeFiles/ocn.dir/__/__/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90.o built in 2.806800 seconds
gmake[2]: *** [mpas-framework/src/CMakeFiles/ocn.dir/__/__/core_ocean/shared/mpas_ocn_vel_self_attraction_loading.f90.o] Error 2

@philipwjones
Copy link
Contributor Author

@jonbob - argh, sorry about that. Apparently got sloppy with that last commit and forgot to remove some pointer declarations. Just pushed a fix that builds with Nvidia.

@jonbob
Copy link
Contributor

jonbob commented May 19, 2022

Thanks @philipwjones. It now builds and is BFB on compy, at least for:

  • ERS_Ld5.T62_oQU120.CMPASO-NYF.compy_pgi

I'm waiting for at least one test to run on chrysalis with the newest set of changes and then feel confident to merge as a BFB PR

@jonbob jonbob added the BFB PR leaves answers BFB label May 19, 2022
jonbob added a commit that referenced this pull request May 19, 2022
#4907)

Remove duplicate config variables in ocean

Removes the auto-generation of duplicate config scalars (varinp_name)
that were introduced in a previous commit for safety when adding the
get_config_scalar routine.

[BFB]
@jonbob
Copy link
Contributor

jonbob commented May 19, 2022

merged to next

@jonbob jonbob merged commit 31b474f into E3SM-Project:master May 23, 2022
@jonbob
Copy link
Contributor

jonbob commented May 23, 2022

merged to master

@philipwjones philipwjones deleted the philipwjones/mpas-framework/fix-config-dup branch August 17, 2022 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants