Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snow burial fix #1112

Merged
merged 11 commits into from
Nov 7, 2020
Merged

Snow burial fix #1112

merged 11 commits into from
Nov 7, 2020

Conversation

danicalombardozzi
Copy link
Contributor

@danicalombardozzi danicalombardozzi commented Aug 19, 2020

Description of changes

Grass/crop snow burial fraction fix

Specific notes

Updates grass and crop snow burial ('fb' in SatellitePhenologyMod.F90) from being 0.2m to change with PFT height and accounts for 20% lodging (falling over)

Contributors other than yourself, if any: David Lawrence, Bill Sacks

CTSM Issues Fixed (include github issue #):
Resolves #516
Resolves #756
Resolves #1116

Are answers expected to change (and if so in what way)?
Yes. Snow burial of all grass and crop PFTs will change, altering albedo and other related water and energy fluxes

Any User Interface Changes (namelist or namelist defaults changes)? No

Testing performed, if any:
(List what testing you did to show your changes worked as expected)
(This can be manual testing or running of the different test suites)
(Documentation on system testing is here: https://github.com/ESCOMP/ctsm/wiki/System-Testing-Guide)
(aux_clm on cheyenne for intel/gnu and izumi for intel/gnu/nag/pgi is the standard for tags on master)
This modification was used in: Lombardozzi, D. L., G. B. Bonan, W. Wieder, A. S. Grandy, C. Morris, and D. L. Lawrence (2018), Cover Crops May Cause Winter Warming in Snow-Covered Regions, Geophysical Research Letters, 45(18), 9889–9897

NOTE: Be sure to check your coding style against the standard
(https://github.com/ESCOMP/ctsm/wiki/CTSM-coding-guidelines) and review
the list of common problems to watch out for
(https://github.com/ESCOMP/CTSM/wiki/List-of-common-problems).

@billsacks billsacks added PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete next this should get some attention in the next week or two. Normally each Thursday SE meeting. labels Aug 19, 2020
@billsacks
Copy link
Member

billsacks commented Aug 20, 2020

  • @danicalombardozzi - @dlawrenncar thinks a similar change should also be in CNVegStructUpdateMod (near the end of CNVegStructUpdate). Do you agree? If so, can you please make the similar change there?

@billsacks billsacks self-assigned this Aug 20, 2020
@billsacks
Copy link
Member

billsacks commented Aug 20, 2020

  • Also, @ekluzek suggests putting a comment in both places pointing out that this block of code is duplicated in these two places and should be kept in sync.

  • And @dlawrenncar asks for a comment describing where the 0.8 factor comes from: this is a 20% "bending factor" (that grasses can be bent a bit by the snow, or whatever the rationale is).

@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Aug 20, 2020
@billsacks
Copy link
Member

billsacks commented Aug 20, 2020

Also, maybe slightly unrelated to this PR, but I'm wondering if you (@danicalombardozzi ) or others (@dlawrenncar @swensosc , @wwieder , etc.) know whether this difference is intentional or unintentional:

For Satellite Phenology, I see:

elai(p) = max(tlai(p)*(1.0_r8 - frac_sno(c)) + tlai(p)*fb*frac_sno(c), 0.0_r8)
esai(p) = max(tsai(p)*(1.0_r8 - frac_sno(c)) + tsai(p)*fb*frac_sno(c), 0.0_r8)

whereas for BGC, there is no accounting for frac_sno in the similar lines:

elai(p) = max(tlai(p)*fb, 0.0_r8)
esai(p) = max(tsai(p)*fb, 0.0_r8)

  • Is this a bug? Should frac_sno appear in CNVegStructUpdate?

@dlawrenncar
Copy link
Contributor

That looks like a bug to me. frac_sno should be used in both places.

@billsacks
Copy link
Member

I'm comfortable folding in this other bug fix (if others agree that is indeed a bug) with this branch, if others are comfortable with that.

Any feelings on whether there should be a simulation looking at the scientific impacts of (a) the snow burial fix for BGC and/or (b) the apparent bug with frac_sno in elai and esai – either together or separately? Or do people feel that a dedicated simulation for this is unnecessary and we'll just look at the impact of these changes together with some other answer changes?

@wwieder
Copy link
Contributor

wwieder commented Aug 20, 2020

@danicalombardozzi how big was the effect on of this bugfix on your SP runs? If it's large, it may be a good idea to test the effects in a historical simulation before just folding it into the model? If the SP effect is small, maybe the BGC tests aren't necessary?

This would matter for the arctic, however, making me think a BGC test is needed?

@dlawrenncar
Copy link
Contributor

dlawrenncar commented Aug 20, 2020 via email

@danicalombardozzi
Copy link
Contributor Author

I will plan to add the bug in the BGC code into this tag. Folding the fixes into the CLM5PPE tag seems reasonable. I can also run simulations with and without the fix. I will add a comment about the 20% lodging rate, although it was an assumption I made for my 2018 GRL cover crop paper and is not based on data (I couldn't find available data).

Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It looks like you have found the problem from before - great! I noticed a couple of things that I point out in line comments. Let me know when you feel this is ready to go, and I can do final testing.

Comment on lines 417 to 418
!grass and crop snow burial changes with PFT height
!accounts for a 20% bending factor, as used in Lombardozzi et al. (2018) GRL 45(18), 9889-9897
Copy link
Member

@billsacks billsacks Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think this comment should be a few lines up, where you have changed fb (line 413)

Comment on lines 296 to 297
!grass and crop snow burial changes with PFT height
!accounts for a 20% bending factor, as used in Lombardozzi et al. (2018) GRL 45(18), 9889-9897
Copy link
Member

@billsacks billsacks Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • As in SatellitePhenologyMod, I think this comment belongs a few lines up, with the fb setting

  • More importantly, it looks like you haven't fixed fb here as you did for the SatellitePhenology code. (Maybe you just haven't done that step yet.)

Copy link
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@danicalombardozzi
Copy link
Contributor Author

I ran two 5-year simulations using an I2000 BGC compset with no spin up. The simulations tested 1) the quadratic bug fix #756 and 2) the snow burial bug fix for BGC #516 (which also includes the quadratic bug fix). Both simulations change answers. The quadratic bug fix answer changes are small in magnitude, whereas the snow burial bug fixes cause larger changes.

Analysis with plots illustrating the changes are here:
https://github.com/danicalombardozzi/ctsm_py/blob/7543b0f3b413bae9974c11b467fdbc0413c3b7fa/notebooks/SnowBurial.ipynb

@billsacks
Copy link
Member

billsacks commented Nov 5, 2020

@danicalombardozzi @olyson Unfortunately, in my testing, I'm getting a lot of runtime failures, especially when compiling with DEBUG=TRUE. My best guess is that they're coming from the fix for #756 that was folded into this branch.

I spot-checked two failures; these fail in different places but it's possible that the underlying cause is the same:

  1. /glade/scratch/sacks/tests_1104-171057ch/ERP_P36x2_D_Ld5.f10_f10_musgs.I2000Clm50Cn.cheyenne_gnu.clm-default.GC.1104-171057ch_gnu/run
1:Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
1:
1:Backtrace for this error:
1:#0  0x2b9c1ac04aff in ???
1:#1  0x2b9c1965a41a in ???
1:#2  0x2b9c19620fb2 in ???
1:#3  0x931775 in __photosynthesismod_MOD_photosynthesishydraulicstress
1:      at /glade/work/sacks/ctsm_code/current_branch1/src/biogeophys/PhotosynthesisMod.F90:3117
1:#4  0x74f50a in __canopyfluxesmod_MOD_canopyfluxes
1:      at /glade/work/sacks/ctsm_code/current_branch1/src/biogeophys/CanopyFluxesMod.F90:895
1:#5  0x519fe8 in __clm_driver_MOD_clm_drv._omp_fn.5
1:      at /glade/work/sacks/ctsm_code/current_branch1/src/main/clm_driver.F90:683
1:#6  0x2b9c1b65c3e5 in gomp_thread_start
1:      at ../../../libgomp/team.c:123
1:#7  0x2b9c18f6d1ac in ???
1:#8  0x2b9c1abfc733 in ???
1:#9  0x2b9c1bdbfd3c in ???
1:#10  0xffffffffffffffff in ???
-1:MPT ERROR: MPI_COMM_WORLD rank 1 has terminated without calling MPI_Finalize()
-1:     aborting job
MPT: Received signal 8
  1. /glade/scratch/sacks/tests_1104-171057ch/SMS_D_Ld3.f10_f10_musgs.I2000Clm50BgcCru.cheyenne_intel.clm-default.GC.1104-171057ch_int/run
53:MPT ERROR: Rank 53(g:53) received signal SIGFPE(8).
53:     Process ID: 64260, Host: r12i2n17, Program: /glade/scratch/sacks/tests_1104-171057ch/SMS_D_Ld3.f10_f10_musgs.I2000Clm50BgcCru.cheyenne_intel.clm-default.GC.1104-171057ch_int/bld/cesm.exe
53:     MPT Version: HPE MPT 2.21  11/28/19 04:21:40
53:
53:MPT: --------stack traceback-------
53:MPT: Attaching to program: /proc/64260/exe, process 64260
53:MPT: BFD: warning: /glade/u/apps/opt/intel/2019u5/mkl/lib/intel64/libmkl_core.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000002
53:MPT: done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=4f3d05f200db29c6835a48e466e0378a8541fd36"
53:MPT: (no debugging symbols found)...done.
53:MPT: [Thread debugging using libthread_db enabled]
53:MPT: Using host libthread_db library "/glade/u/apps/ch/os/lib64/libthread_db.so.1".
53:MPT: Try: zypper install -C "debuginfo(build-id)=4e96cf37d52b9c2f3648e691878b682da5abfa42"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=5eb2f40ad3b0125943aba8f08dd08609351a2967"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=b115bb26e97505a5bd3b56d70d20459aa1206ac9"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=93c4deac1088eb84fbd01cf2a2c54399f516e9a7"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=5f9ec139af58fa59c33f72d1b3e56f083f1613ae"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=bc347d1c2dd56b51057fbac71e84906135d02da5"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=ad6d98e9e3042e77c051f1f27e971460bf1e428b"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=4c08f43bb18e99a7df4bad5c4a52bac67ddf9b8d"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=3ae04b58bd81ea7745dba789d89937e719309568"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=76d8c012144838e4e5466798858289fba2057537"
53:MPT: (no debugging symbols found)...done.
53:MPT: Try: zypper install -C "debuginfo(build-id)=ba8002518966160a27c335e04ce8932989f69056"
53:MPT: (no debugging symbols found)...done.
53:MPT: 0x00002b4672b136da in waitpid () from /glade/u/apps/ch/os/lib64/libpthread.so.0
53:MPT: Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-49.16.x86_64
53:MPT: (gdb) #0  0x00002b4672b136da in waitpid ()
53:MPT:    from /glade/u/apps/ch/os/lib64/libpthread.so.0
53:MPT: #1  0x00002b4673967866 in mpi_sgi_system (
53:MPT: #2  MPI_SGI_stacktraceback (
53:MPT:     header=header@entry=0x7ffcd27ab500 "MPT ERROR: Rank 53(g:53) received signal SIGFPE(8).\n\tProcess ID: 64260, Host: r12i2n17, Program: /glade/scratch/sacks/tests_1104-171057ch/SMS_D_Ld3.f10_f10_musgs.I2000Clm50BgcCru.cheyenne_intel.clm-de"...) at sig.c:340
53:MPT: #3  0x00002b4673967a62 in first_arriver_handler (signo=signo@entry=8,
53:MPT:     stack_trace_sem=stack_trace_sem@entry=0x2b467df80080) at sig.c:489
53:MPT: #4  0x00002b4673967dfb in slave_sig_handler (signo=8, siginfo=<optimized out>,
53:MPT:     extra=<optimized out>) at sig.c:565
53:MPT: #5  <signal handler called>
53:MPT: #6  0x0000000004145e49 in __libm_pow_e7 ()
53:MPT: #7  0x0000000002c07287 in vocemissionmod::get_gamma_c (
53:MPT:     cisun_in=19.987635923957399, cisha_in=5334.8093739977039,
53:MPT:     forc_pbot_in=90254.973958333343, fsun_in=1.0149571630422467, co2_ppmv=367)
53:MPT:     at /glade/work/sacks/ctsm_code/current_branch1/src/biogeochem/VOCEmissionMod.F90:1085
53:MPT: #8  0x0000000002bfd715 in vocemissionmod::vocemission (bounds=...,
53:MPT:     num_soilp=31, filter_soilp=..., atm2lnd_inst=..., canopystate_inst=...,
53:MPT:     photosyns_inst=..., temperature_inst=..., vocemis_inst=...)
53:MPT:     at /glade/work/sacks/ctsm_code/current_branch1/src/biogeochem/VOCEmissionMod.F90:605
53:MPT: #9  0x00000000008d6f06 in clm_driver::clm_drv (doalb=4294967295,
53:MPT:     nextsw_cday=1.0625, declinp1=-0.40293095207692015,
53:MPT:     declin=-0.40293095207692015, rstwr=.FALSE., nlend=.FALSE., rdate=...,
53:MPT:     rof_prognostic=4294967295, .tmp.RDATE.len_V$2244e=32)
53:MPT:     at /glade/work/sacks/ctsm_code/current_branch1/src/main/clm_driver.F90:773
53:MPT: #10 0x000000000088921b in lnd_comp_mct::lnd_run_mct (eclock=..., cdata_l=...,
53:MPT:     x2l_l=..., l2x_l=...)
53:MPT:     at /glade/work/sacks/ctsm_code/current_branch1/src/cpl/mct/lnd_comp_mct.F90:457
53:MPT: #11 0x0000000000467c55 in component_mod::component_run (eclock=..., comp=...,
53:MPT:     infodata=..., seq_flds_x2c_fluxes=..., seq_flds_c2x_fluxes=...,
53:MPT:     comp_prognostic=4294967295, comp_num=2, timer_barrier=...,
53:MPT:     timer_comp_run=..., run_barriers=.FALSE., ymd=20000101, tod=5400,
53:MPT:     comp_layout=..., .tmp.SEQ_FLDS_X2C_FLUXES.len_V$31ac=4096,
53:MPT:     .tmp.SEQ_FLDS_C2X_FLUXES.len_V$31af=4096,
53:MPT:     .tmp.TIMER_BARRIER.len_V$31b4=19, .tmp.TIMER_COMP_RUN.len_V$31b7=11,
53:MPT:     .tmp.COMP_LAYOUT.len_V$31bd=32)
53:MPT:     at /glade/work/sacks/ctsm_code/current_branch1/cime/src/drivers/mct/main/component_mod.F90:737
53:MPT: #12 0x000000000042f481 in cime_comp_mod::cime_run ()
53:MPT:     at /glade/work/sacks/ctsm_code/current_branch1/cime/src/drivers/mct/main/cime_comp_mod.F90:2626
53:MPT: #13 0x000000000044f79c in cime_driver ()
53:MPT:     at /glade/work/sacks/ctsm_code/current_branch1/cime/src/drivers/mct/main/cime_driver.F90:133
53:MPT: #14 0x0000000000407be2 in main ()
53:MPT: #15 0x00002b4673c796e5 in __libc_start_main ()
53:MPT:    from /glade/u/apps/ch/os/lib64/libc.so.6
53:MPT: #16 0x0000000000407ae9 in _start () at ../sysdeps/x86_64/start.S:118
53:MPT: (gdb) A debugging session is active.
53:MPT:
53:MPT:         Inferior 1 [process 64260] will be detached.
53:MPT:
53:MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
53:MPT: Detaching from program: /proc/64260/exe, process 64260
53:
53:MPT: -----stack traceback ends-----

If possible, it would be great if, once those issues are thought to be resolved, someone could run at least the two tests given as examples above:

./create_test ERP_P36x2_D_Ld5.f10_f10_musgs.I2000Clm50Cn.cheyenne_gnu.clm-default
./create_test SMS_D_Ld3.f10_f10_musgs.I2000Clm50BgcCru.cheyenne_intel.clm-default

to verify that they pass. Ideally, also running the full test suite on izumi would be a good addition before handing this back over for full testing: From the top level of the CTSM checkout on izumi, ./run_sys_tests -s aux_clm --skip-compare --skip-generate. See https://github.com/ESCOMP/CTSM/wiki/System-Testing-Guide for guidance on system testing.

@olyson
Copy link
Contributor

olyson commented Nov 5, 2020

I'll take a look at this.

@olyson
Copy link
Contributor

olyson commented Nov 5, 2020

I backed out the changes from the quadratic bug fix and the ERP_P36x2_D_Ld5.f10_f10_musgs.I2000Clm50Cn.cheyenne_gnu.clm-default test fails in the same place. Seems to be due to t_veg increasing over the course of the simulation from something reasonable to over 850K which results in an overflow due to the exponential.

@olyson
Copy link
Contributor

olyson commented Nov 5, 2020

The problems appear to be associated with the replacement of these statements:

     elai(p) = max(tlai(p)*fb, 0.0_r8)
     esai(p) = max(tsai(p)*fb, 0.0_r8)

with these statements:

     elai(p) = max(tlai(p)*(1.0_r8 - frac_sno(c)) + tlai(p)*fb*frac_sno(c), 0.0_r8)
     esai(p) = max(tsai(p)*(1.0_r8 - frac_sno(c)) + tsai(p)*fb*frac_sno(c), 0.0_r8)

In particular the use of frac_sno here. The equation itself is valid, however, the frac_sno hyperbolic tangent formulation can result in values just slightly less than one despite a heavy snowcover, resulting in very small elai and esai (e.g., 3.1086060491376643E-020 and 2.2204460492503131E-016) that presumably are causing problems in the radiation code (e.g., fsun > 1 used in VOCEmissions and/or very high vegetation temperatures causing a problem with the exponential in Photosynthesis).
The old expression depended solely on fb, which goes to 1 quickly.

The new equations are also present in SatellitePhenology, however, there is a restriction on elai there such that elai < 0.05 and esai < 0.05 are set to zero.

At least this is my working hypothesis so far.

@dlawrenncar
Copy link
Contributor

dlawrenncar commented Nov 5, 2020 via email

@olyson
Copy link
Contributor

olyson commented Nov 5, 2020

I can test some thresholds. I think we would want the threshold to be as small as possible to avoid any possible killing off of vegetation that is trying to grow in BGC/CN mode, while obviously avoiding problems across the varied configurations we have in the model.

@wwieder
Copy link
Contributor

wwieder commented Nov 5, 2020 via email

@billsacks
Copy link
Member

Nice work, Keith!

A thought that comes to mind: rather than trying to find an elai/esai threshold that walks the fine line between letting vegetation grow while not blowing things up, would it instead make sense to have some limit on frac_sno here? i.e., rather than using frac_sno directly in the new equations, instead set:

real(r8), parameter :: frac_sno_threshold = 0.999_r8  ! frac_sno values greater than this are treated as 1

if (frac_sno(c) <= frac_sno_threshold) then
  frac_sno_adjusted = frac_sno(c)
else
  ! avoid tiny but non-zero elai and esai that can cause code to blow up
  frac_sno_adjusted = 1._r8
end if

Then use frac_sno_adjusted in place of frac_sno in the following lines.

The advantage I see there is that this lets elai and esai be just as small as before in the absence of snow cover, and only changes things when there is a big snow pack - so I'd hope this avoids the issue you're worried about of stopping vegetation from growing.

@olyson
Copy link
Contributor

olyson commented Nov 5, 2020

Yes, I think I like your approach better. I think Will was alluding to this as well in terms of what frac_sno actually means in this context.
I think we should let Danica weigh in on this as well since it's her parameterization.

@olyson
Copy link
Contributor

olyson commented Nov 6, 2020

I've implemented Bill's approach and these two tests now pass:

./create_test ERP_P36x2_D_Ld5.f10_f10_musgs.I2000Clm50Cn.cheyenne_gnu.clm-default
./create_test SMS_D_Ld3.f10_f10_musgs.I2000Clm50BgcCru.cheyenne_intel.clm-default

I'm happy to try the izumi testing unless Danica would like to do that.

@danicalombardozzi
Copy link
Contributor Author

danicalombardozzi commented Nov 6, 2020 via email

@olyson
Copy link
Contributor

olyson commented Nov 6, 2020

Looks like the testing on izumi has passed:

./cs.status.fails
1106-135044iz_gnu: 4 tests

1106-135044iz_int: 6 tests

1106-135044iz_nag: 35 tests

1106-135044iz_pgi: 2 tests

========================================================================
Non-PASS results for select phases:
TPUTCOMP non-passes: 0
MEMCOMP non-passes: 0

@billsacks
Copy link
Member

billsacks commented Nov 7, 2020

Thanks, @olyson ! I will run cheyenne testing and let people know how it goes.

  • Before I merge, though, I'd like to make sure that there is general consensus in this solution, from at least some of @danicalombardozzi @wwieder and @dlawrenncar . In particular, @danicalombardozzi and @wwieder asked if we should try to keep things consistent between the SP and BGC code. @olyson pointed out why we can't just use the current SP thresholds in the BGC code, but there's still the question of whether we should change the SP thresholds to match what is now in the BGC code. I'll say: my personal feeling is that this may be a rabbit hole that will require some historical digging to figure out why the SP thresholds were set the way they are, and may be best deferred to a later tag. But I'd like to hear thoughts from at least one or two others before I merge this.

@wwieder
Copy link
Contributor

wwieder commented Nov 7, 2020

Unless someone has boundless time and energy to test changing the SP parameterization to match BGC, I don't think this is necessarily critical to address at this point. @billsacks I'd say go ahead and merge.

@danicalombardozzi
Copy link
Contributor Author

danicalombardozzi commented Nov 7, 2020 via email

@billsacks
Copy link
Member

Thanks @wwieder and @danicalombardozzi . All tests pass now (thanks @olyson for your help getting there!); I will merge to master shortly.

@billsacks billsacks merged commit dddafcd into ESCOMP:master Nov 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR status: ready PR: this is ready to merge in, with all tests satisfactory and reviews complete
Projects
Status: Done (non release/external)
Development

Successfully merging this pull request may close these issues.

elai and esai for non-SP runs should use frac_sno Quadratic solution error crop and grass snow burial
5 participants