SatDiagn problems in gcclassic fullchemistry run (v14.1) #2017

Closed
Kexin828 opened this issue Oct 29, 2023 · 11 comments · Fixed by #2369
Labels
category: Bug Something isn't working never stale Never label this issue as stale topic: Diagnostics Related to output diagnostic data

Comments

@Kexin828

Kexin828 commented Oct 29, 2023

Name: Zhang Kexin
Institution: Tsinghua University

Hi Bob,

In my tests of a v14.1 fullchemistry simulation, I ran into the following problem:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
gcclassic 0000000000D0C47A Unknown Unknown Unknown
libpthread-2.17.s 00002B68FACD3630 Unknown Unknown Unknown
gcclassic 000000000075F576 history_netcdf_mo 858 history_netcdf_mod.F90
gcclassic 0000000000743339 history_mod_mp_hi 3064 history_mod.F90
gcclassic 0000000000408621 MAIN__ 2006 main.F90
gcclassic 0000000000407262 Unknown Unknown Unknown
libc-2.17.so 00002B68FAF02555 __libc_start_main Unknown Unknown
gcclassic 0000000000407169 Unknown Unknown Unknown

After reading your answer to 'SatDiagn problems in gcclassic CO2 run (v14.1) #1805', I modified the two files according to your instructions, but the run still fails. Below is the new error message:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
gcclassic          0000000000D0C47A  Unknown               Unknown  Unknown
libpthread-2.17.s  00002AAC13EAD630  Unknown               Unknown  Unknown
gcclassic          0000000000C0C441  hco_driver_mod_mp         359  hco_driver_mod.F90
gcclassic          00000000004EADCC  hco_interface_gc_        1236  hco_interface_gc_mod.F90
gcclassic          000000000045CF5A  emissions_mod_mp_         394  emissions_mod.F90
gcclassic          00000000004348C6  cleanup_                  123  cleanup.F90
gcclassic          00000000009CC009  error_mod_mp_erro         437  error_mod.F90
gcclassic          000000000040AA60  MAIN__                    668  main.F90
gcclassic          0000000000407262  Unknown               Unknown  Unknown
libc-2.17.so       00002AAC140DC555  __libc_start_main     Unknown  Unknown
gcclassic          0000000000407169  Unknown               Unknown  Unknown
real 17.56
user 17.23
sys 0.32
srun: error: comput7: task 0: Exited with exit code 174
srun: Terminating job step 7859.0

I am not sure whether other people have run into a similar problem. Looking forward to your reply. Many thanks!

@yantosca
Contributor

Thanks for raising this issue @Kexin828. According to the first traceback, the run died at line 858 of history_netcdf_mod.F90:

       !---------------------------------------------------------------------
       ! For instantaneous diagnostic quantities:
       ! (1) Copy the Item's data array to the 4-byte or 8-byte local array
       ! (2) Zero the Item's data array
       ! (3) Zero the Item's update counter
       !
       ! For time-averaged diagnostic quantities:
       ! (1) Divide the Item's data array by the number diagnostic updates
       ! (2) Copy the Item's data array to the 4-byte or 8-byte local array
       ! (3) Zero the Item's data array
       ! (4) Zero the Item's update counter
       !---------------------------------------------------------------------
       SELECT CASE( Item%SpaceDim )

          !------------------------------------------------------------------
          ! 3-D data
          !------------------------------------------------------------------
          CASE( 3 )

             ! Get dimensions of data
             Dim1 = SIZE( Item%Data_3d, 1 )
             Dim2 = SIZE( Item%Data_3d, 2 )
             Dim3 = SIZE( Item%Data_3d, 3 )

             ! Get average for satellite diagnostic:
             IF ( Container%name == 'SatDiagn' ) THEN
                Item%Data_3d = Item%Data_3d / State_Diag%SatDiagnCount
                Item%nUpdates = 1.0
             ENDIF

It may be that State_Diag%SatDiagnCount is zero, and that this caused a div-by-zero error. Maybe there is a grid box where there are no matching observations.

Perhaps what we should do here is to fill Item%Data_3d with a missing value so as to avoid the div-by-zero error.

@yantosca yantosca self-assigned this Oct 30, 2023
@yantosca yantosca added category: Bug Something isn't working topic: Diagnostics Related to output diagnostic data labels Oct 30, 2023
@yantosca
Contributor

@Kexin828: try this:

             ! Get average for satellite diagnostic:
             IF ( Container%name == 'SatDiagn' ) THEN
                IF ( State_Diag%SatDiagnCount > 0 ) THEN
                   Item%Data_3d = Item%Data_3d / State_Diag%SatDiagnCount
                ELSE
                   Item%Data_3d = UNDEFINED
                ENDIF   
                Item%nUpdates = 1.0
             ENDIF

where UNDEFINED is defined in History/history_util_mod.F90.

@Kexin828
Author

Kexin828 commented Nov 2, 2023

Thank you very much! I have solved this problem.

Contributor

yantosca commented Nov 2, 2023

Thanks @Kexin828. I will leave this issue open to remind us to make this fix in an upcoming version.

@JFBrewer

JFBrewer commented Jul 2, 2024

Just a note - this issue has cropped up again (in GC 14.1.0). I'm trying to use SatDiagn within the carbon cycle simulation. However, the proposed fix seems not to quite work for me, because I get the following error:

                 IF ( State_Diag%SatDiagnCount > 0 ) THEN
                     1
Error: IF clause at (1) requires a scalar LOGICAL expression

I think this implies that SatDiagnCount is somehow not a single number value? I'm unclear why this would be happening, but it might point to something wrong with my original configuration of the Satellite Diagnostic or it might be an issue with using SatDiagn in the carbon mod sim.
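For readers hitting the same compiler message: it is what a Fortran compiler reports when the condition of an IF is an array expression rather than a single LOGICAL value. The sketch below is a minimal, standalone illustration and not GEOS-Chem code; the array `Counts` is a hypothetical stand-in for a 3-D counter. It shows that comparing an array with a scalar yields a LOGICAL array, which has to be reduced with ANY or ALL before an IF can use it.

```fortran
PROGRAM scalar_vs_array_test
  IMPLICIT NONE

  ! Hypothetical stand-in for a 3-D counter array
  REAL :: Counts(2,2,1)

  Counts        = 0.0
  Counts(1,1,1) = 4.0

  ! "Counts > 0.0" is a LOGICAL array of shape (2,2,1), so writing
  ! "IF ( Counts > 0.0 ) THEN" would be rejected by the compiler.
  ! ANY and ALL reduce the array to the scalar LOGICAL that IF needs.
  IF ( ANY( Counts > 0.0 ) ) THEN
     PRINT *, 'At least one grid box has samples'
  ENDIF
  IF ( .NOT. ALL( Counts > 0.0 ) ) THEN
     PRINT *, 'At least one grid box has no samples'
  ENDIF
END PROGRAM scalar_vs_array_test
```

ANY and ALL answer a yes/no question about the whole array. When each element needs its own treatment, a masked assignment (WHERE) is the more natural tool.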

@yantosca
Contributor

yantosca commented Jul 2, 2024

Thanks @JFBrewer. I'll take a look into this. We also haven't yet implemented the workaround described above so I'll do that as well.

@yantosca
Contributor

yantosca commented Jul 2, 2024

@JFBrewer: I think I've found the issue. State_Diag%SatDiagnCount is a 3-D array, not a scalar, so we have to use an array operation. I am testing this fix:

          !------------------------------------------------------------------
          ! 3-D data
          !------------------------------------------------------------------
          CASE( 3 )

             ! Get dimensions of data
             Dim1 = SIZE( Item%Data_3d, 1 )
             Dim2 = SIZE( Item%Data_3d, 2 )
             Dim3 = SIZE( Item%Data_3d, 3 )

             ! Get average for satellite diagnostic:
             IF ( INDEX( Container%name, 'SatDiagn' ) > 0 ) THEN
                WHERE( State_Diag%SatDiagnCount > 0 )
                   Item%Data_3d = Item%Data_3d / State_Diag%SatDiagnCount
                ELSEWHERE
                   Item%Data_3d = UNDEFINED
                ENDWHERE
                Item%nUpdates = 1.0
             ENDIF
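To see the masked division in isolation, here is a minimal standalone sketch, assuming small hypothetical arrays in place of Item%Data_3d and State_Diag%SatDiagnCount. The fill value -1.0e+31 is also an assumption; the real UNDEFINED constant is the one defined in History/history_util_mod.F90.

```fortran
PROGRAM masked_average
  IMPLICIT NONE

  ! Hypothetical stand-ins for Item%Data_3d, State_Diag%SatDiagnCount,
  ! and UNDEFINED (the fill value used here is an assumption)
  REAL, PARAMETER :: UNDEFINED = -1.0e+31
  REAL            :: Data_3d(2,2,1)
  REAL            :: SatDiagnCount(2,2,1)

  ! Accumulated sums and the number of samples in each grid box
  Data_3d       = RESHAPE( (/ 10.0, 0.0, 6.0, 0.0 /), (/ 2, 2, 1 /) )
  SatDiagnCount = RESHAPE( (/  2.0, 0.0, 3.0, 0.0 /), (/ 2, 2, 1 /) )

  ! Divide only where at least one sample exists;
  ! assign the fill value everywhere else
  WHERE ( SatDiagnCount > 0.0 )
     Data_3d = Data_3d / SatDiagnCount
  ELSEWHERE
     Data_3d = UNDEFINED
  END WHERE

  PRINT *, Data_3d
END PROGRAM masked_average
```

Because WHERE evaluates the division only for elements where the mask is true, grid boxes with a zero count are never divided. Both arrays must have the same shape for the construct to be valid.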

@JFBrewer

JFBrewer commented Jul 2, 2024

In my code, this did change the nature of the error, but I'm still getting a segfault for an invalid memory reference on the line WHERE( State_Diag%SatDiagnCount > 0 )
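One possible cause of an invalid memory reference on that line, offered as an assumption that this thread does not confirm, is that the counter array is a pointer that was never allocated or associated for this simulation. The standalone sketch below uses hypothetical stand-in variables and an assumed fill value to show a defensive check with ASSOCIATED before the masked division.

```fortran
PROGRAM guarded_where
  IMPLICIT NONE

  ! Hypothetical stand-ins; the fill value -1.0e+31 is an assumption
  REAL, PARAMETER :: UNDEFINED = -1.0e+31
  REAL            :: Data_3d(2,2,1)
  REAL, POINTER   :: SatDiagnCount(:,:,:) => NULL()

  Data_3d = 8.0

  ! Touch the counter only if it points to valid memory
  IF ( ASSOCIATED( SatDiagnCount ) ) THEN
     WHERE ( SatDiagnCount > 0.0 )
        Data_3d = Data_3d / SatDiagnCount
     ELSEWHERE
        Data_3d = UNDEFINED
     END WHERE
  ELSE
     PRINT *, 'SatDiagnCount is not associated; leaving data unchanged'
  ENDIF
END PROGRAM guarded_where
```

A check like this would turn the segfault into a readable message, which at least narrows down whether the array exists when the file is written.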

@JFBrewer

JFBrewer commented Jul 3, 2024

I'm wondering if there's more to this error than meets the eye? In particular, when I ask SatDiagn to return just the following diagnostics:

  SatDiagn.fields:            'SatDiagnConc_C2H6              ',
                              'SatDiagnConc_CH4               ',
                              'SatDiagnConc_CO                ',

it gets all the way to the previously specified error and then dies, claiming that SatDiagnCount is 0 for some reason. However, the moment I add additional diagnostic outputs that don't depend on species concentrations (e.g., SatDiagnRH or SatDiagnTAir), the simulation errors out in the very first timestep, with the following errors in diagnostics_mod.F90:

Carbon_Gases: Using global OH oxidant field option: GLOBAL_OH
===============================================================================
GEOS-Chem ERROR: OH is not a defined species in this simulation!!!
 -> at Do_Archive_SatDiagn (in module GeosCore/diagnostics_mod.F90)
==============================================================================

===============================================================================
GEOS-Chem ERROR: Error converting species units for archiving diagnostics #2
 -> at Set_Diagnostics_EndofTimestep (in GeosCore/diagnostics_mod.F90)
===============================================================================

===============================================================================
GEOS-CHEM ERROR: Error encountered in "Set_Diagnostics_EndOfTimestep"!
STOP at  -> at GEOS-Chem (in GeosCore/main.F90)
===============================================================================

Clearly this is a deeper problem, given that it's come up before, but it makes me wonder if, in my specific case, we could have two separate but overlapping errors:

  1. The default SatDiagnConc_* fields may not be interacting properly with the SatDiagn collection, so the code tries to write the SatDiagn file without any diagnostics to write.
  2. SatDiagn, as currently written, may simply not be compatible with the carbon simulation, since it seems to require id_OH to exist before it proceeds with SatDiagn archiving (see GeosCore/diagnostics_mod.F90:1334).
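A common way to handle the second point is to make the species-dependent step conditional on the species index. The sketch below is only an illustration of that pattern and not the change made in GEOS-Chem; it assumes that a non-positive index marks a species that is not defined in the simulation, and the variable is a hypothetical stand-in for id_OH.

```fortran
PROGRAM guard_species_index
  IMPLICIT NONE

  ! Hypothetical species index; a value <= 0 is assumed to mean
  ! "not defined in this simulation"
  INTEGER :: id_OH

  id_OH = -1

  IF ( id_OH > 0 ) THEN
     PRINT *, 'OH is defined: archive OH-dependent diagnostics'
  ELSE
     PRINT *, 'OH is not defined: skip OH-dependent diagnostics'
  ENDIF
END PROGRAM guard_species_index
```

With a guard of this kind, diagnostics that do not need OH (such as SatDiagnRH or SatDiagnTAir) could still be archived in a simulation that does not carry OH as a species.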

Anyway, I'm still not sure what is going on but I figured I'd drop the results of my sleuthing so far in here just in case that's helpful.

@yantosca
Contributor

This should now be fixed by PR #2369, which is currently in testing.

@yantosca
Contributor

We can now close this issue, as PR #2369 has been merged into the GEOS-Chem "no-diff-to-benchmark" development stream. This fix is on track to ship with 14.4.2.

3 participants