Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unelegant crash writing initial condition when 'histfreq' is more frequent then any 'f_*' output variable #521

Closed
phil-blain opened this issue Oct 13, 2020 · 18 comments · Fixed by #631

Comments

@phil-blain
Copy link
Member

phil-blain commented Oct 13, 2020

I noticed that if I change histfreq to d or 1 in the namelist, and I don't change nothing else from the defaults (i.e. all f_* output variables are either set to 'm' or 'x'), then the model crashes very inelegantly in accum_hist when called in cice_init to write the initial condition.

Here is the backtrace from the Intel compiler runtime (code compiled with -traceback):

forrtl: severe (408): fort: (8): Attempt to fetch from allocatable variable A2D when it is not allocated
Image              PC                Routine            Line        Source
cice               0000000001F6AE76  Unknown               Unknown  Unknown
cice               0000000000E0DFDA  ice_history_mp_ac        1893  ice_history.F90
cice               000000000040383B  cice_initmod_mp_c         224  CICE_InitMod.F90
cice               0000000000401E83  cice_initmod_mp_c          52  CICE_InitMod.F90
cice               0000000000401671  MAIN__                     43  CICE.F90
cice               00000000004015F2  Unknown               Unknown  Unknown
cice               000000000203A08F  Unknown               Unknown  Unknown
cice               00000000004014DA  Unknown               Unknown  Unknown

The call to accum_hist is here:

if (write_ic) call accum_hist(dt) ! write initial conditions

And this is the line where it crashes:

call accum_hist_field(n_hi, iblk, vice(:,:,iblk), a2D)

It's not super important, but it's a little bit disconcerting. I think we could improve that and abort early, with a message like:

(WARNING): 'histfreq' set to 'd', but no outputs are set for this frequency

or something like that... at least we should not cause a runtime crash like this.

@dabail10
Copy link
Contributor

I'll take responsibility for this one.

@apcraig
Copy link
Contributor

apcraig commented Dec 23, 2020

#548 may address this in part.

@phil-blain phil-blain self-assigned this Jan 8, 2021
@apcraig
Copy link
Contributor

apcraig commented Aug 25, 2021

I am not able to duplicate this error with either the intel or gnu compiler even with debug flags on on the current code. I propose we close this and assume some of the recent fixes addressed the problem. @phil-blain, do you want to run a quick test again to see if you are still seeing the problem with the current master?

@phil-blain
Copy link
Member Author

Yes I'll check again. I've just submitted a quick run.

@phil-blain
Copy link
Member Author

I confirm that I can't reproduce with the steps in the description. Bug fixed itself :P Let's close then.

@eclare108213
Copy link
Contributor

Thanks @phil-blain ! Closing the issue.

@apcraig
Copy link
Contributor

apcraig commented Aug 26, 2021

Thanks @phil-blain for confirming.

@phil-blain
Copy link
Member Author

phil-blain commented Sep 8, 2021

I just had this issue again, relaunching the exact same case as two weeks ago... I guess it depends on the state of the memory on the system...

I could not find a core dump though....

forrtl: severe (408): fort: (8): Attempt to fetch from allocatable variable A2D when it is not allocated

Image              PC                Routine            Line        Source
cice               00000000021D8036  Unknown               Unknown  Unknown
cice               0000000000E5446F  ice_history_mp_ac        1903  ice_history.F90
cice               0000000000403735  cice_initmod_mp_c         239  CICE_InitMod.F90
cice               00000000004019E3  cice_initmod_mp_c          53  CICE_InitMod.F90
cice               0000000000401671  MAIN__                     43  CICE.F90
cice               00000000004015F2  Unknown               Unknown  Unknown
cice               00000000022A890F  Unknown               Unknown  Unknown
cice               00000000004014DA  Unknown               Unknown  Unknown

The call to accum_hist is the same as before, writting the initial condition here:

if (write_ic) call accum_hist(dt) ! write initial conditions

And again it crashed at the same place, the first use of a2D (first call to accum_hist_field):

call accum_hist_field(n_hi, iblk, vice(:,:,iblk), a2D)

@phil-blain phil-blain reopened this Sep 8, 2021
@dabail10
Copy link
Contributor

dabail10 commented Sep 8, 2021

Let me take a look at this again. Just sounds like there needs to be another if condition.

Dave

@phil-blain
Copy link
Member Author

phil-blain commented Sep 8, 2021

I'm guessing it's the logic in define_hist_field here that's at play:

character(len=*), parameter :: subname = '(define_hist_field)'
if (histfreq(ns) == 'x') then
call abort_ice(subname//'ERROR: define_hist_fields has histfreq x')
endif
if (ns == 1) id(:) = 0
lenf = len(trim(vhistfreq))
do ns1 = 1, lenf
if (vhistfreq(ns1:ns1) == histfreq(ns)) then
num_avail_hist_fields_tot = num_avail_hist_fields_tot + 1
if (vcoord(11:14) == 'time') then
num_avail_hist_fields_2D = num_avail_hist_fields_2D + 1

I think the code does not enter that if (vhistfreq(ns1:ns1) == histfreq(ns)) block at all, so num_avail_hist_fields_2D is not incremented.

So when we arrive here in init_hist:

!-----------------------------------------------------------------
! initialize the history arrays
!-----------------------------------------------------------------
if (allocated(a2D)) deallocate(a2D)
if (num_avail_hist_fields_2D > 0) &
allocate(a2D(nx_block,ny_block,num_avail_hist_fields_2D,max_blocks))

the array is not allocated at all...

@dabail10
Copy link
Contributor

dabail10 commented Sep 8, 2021

I think the issue is with the check:

if (f_hi(1:1) /= 'x')

I think this should be f_hi(ns:ns). When I implemented this originally, I was checking to make sure at least the first stream was set. Can you send me your namelist values from ice_in that generate this crash? These should be:

histfreq, histfreq_n, and f_hi

Dave

@phil-blain
Copy link
Member Author

Yes, sure. f_hi and histfreq_n are unchanged from their default values:

    histfreq       = 'd','x','x','x','x'
    histfreq_n     =  1 , 1 , 1 , 1 , 1
    f_hi           = 'm'

@dabail10
Copy link
Contributor

dabail10 commented Sep 8, 2021

I am beginning to see the problem now. The first stream here is daily, but the variable specification is indicating monthly. Because the check only makes sure that the first character of f_hi is not 'x', then it tries to do something. However, a2D is not defined for a monthly stream. I think I can add a more graceful warning message here.

@dabail10
Copy link
Contributor

dabail10 commented Sep 9, 2021

Ok. I have a fix in place and I will submit a PR soon. It is actually more involved than just a warning message. I changed the checks for accumulating the variables to be:

if (f_hi(ns:ns) == histfreq(ns)) then

This makes sure that the character in the string for f_hi matches up with the stream you are averaging on. I also have a check that prints a warning if there are no fields going to the history files.

@dabail10 dabail10 mentioned this issue Sep 9, 2021
16 tasks
@dabail10
Copy link
Contributor

dabail10 commented Sep 9, 2021

So, this strategy was flawed. The accum_hist_field subroutine has a loop internally that goes over all streams. My new fix is to check if a2D, a3Dc, etc. are allocated or not. Then it will skip over the accum_hist_calls. However, I did find something curious that I am looking at. Say the namelist is set as:

histfreq = 'd','x','x','x','x'

f_hi = 'mdxxx'

This still writes hi to the daily stream even though the d is in the second place for f_hi. I feel like this should not work and I am trying to figure out why.

@eclare108213
Copy link
Contributor

That example should work, in my opinion -- if you have a 'd' in both places, you'd expect the variable to be written. If it were writing monthly output, then there's a problem.

Also my opinion: the extra 'x's shouldn't be necessary. The order that the frequencies appear in these lists shouldn't matter, although I know the first one is 'special' in that the file names are formatted differently for the others.

@dabail10
Copy link
Contributor

dabail10 commented Sep 9, 2021

I guess I was concerned that you might not be doing the averaging interval correctly if it is in a different order. However it does look like it does the correct thing. It checks for avail_hist_fields(n)%vhistfreq == histfreq(ns). So, that as long as there is something on one of the streams that matches the vhistfreq for the variable, it should be good to go. I guess I will submit what I have to fix the crash when the a2D, a3Dc, etc. arrays are not allocated. Interestingly, some of the ice_history_*.F90 tracer routines did have this check, but it was not in ice_history.F90.

@apcraig
Copy link
Contributor

apcraig commented Sep 10, 2021

I agree the x should be treated the same as a blank. It also should not matter what order they are for an individual field (this is being tested now). Finally, I think an x in the first slot ('xdmxx') should not lead to skipping the d and m. The order of the streams defined by histfreq should not require any alignment with the active streams specified by the fields. I think that's almost the case now (except for maybe a leading "x").

@apcraig apcraig linked a pull request Sep 10, 2021 that will close this issue
16 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants