
Error with CMIP6 file #414

Open
tessac2 opened this issue May 20, 2024 · 26 comments

Labels
category: Debug Help Request for help debugging GCHP topic: Input Data Related to input data


@tessac2

tessac2 commented May 20, 2024

Name and Institution (Required)

Name: Tessa Clarizio
Institution: UIUC

Confirm you have reviewed the following documentation

Description of your issue or question

Hi GCSP! I am running GCHP v14.2.0 using Singularity with GEOS-FP meteorology for 2019. I am running into an error: "./HcoDir/CMIP6/v2020-03/2x2.5/CMIP6_GHG_surface_VMR_2019.2x25.nc [No such file or directory]". I am confused about why I get this error, because I was able to complete a run over the same period in 2019 using MERRA-2 meteorology. I also got an error file I had not seen before, "PET000.ESMF_LogFile". I am not sure whether this error comes from the different met fields or from other changes I made, and whether there is a way to circumvent it. I made the following modifications (which I did not make when testing the MERRA-2 run):

HEMCO_Config.rc

  • Turned on QFED2
  • Turned GFED off
  • Turned GFED4 false

HEMCO_Diagn

ExtData.rc

HISTORY.rc

  • Turned on the AerosolMass and Aerosols collections and set their frequency and duration to 240000 to get daily output

[screenshot]
[screenshot]

gchp.20190701_0000z.log
HISTORY.txt
PET000.ESMF_LogFile.txt
slurm-11190673.txt
HEMCO_Diagn.txt
ExtData.txt
HEMCO_Config.txt

@yantosca
Contributor

Hi @tessac2, thanks for writing. I noticed in the slurm log there is this output:

nf90_open: returned error code (2) opening ./HcoDir/CMIP6/v2020-03/2x2.5/CMIP6_GHG_surface_VMR_2019.2x25.nc [No such file or directory]
pe=00014 FAIL at line=00280    NetCDF4_FileFormatter.F90                <status=2>
pe=00014 FAIL at line=00097    DataCollection.F90                       <status=2>
pe=00014 FAIL at line=02799    ExtDataGridCompMod.F90                   <status=2>
pe=00014 FAIL at line=02526    ExtDataGridCompMod.F90                   <status=2>
pe=00014 FAIL at line=01346    ExtDataGridCompMod.F90                   <status=2>
pe=00014 FAIL at line=01807    MAPL_Generic.F90                         <status=2>
pe=00014 FAIL at line=01337    MAPL_CapGridComp.F90                     <status=2>
pe=00014 FAIL at line=01300    MAPL_CapGridComp.F90                     <status=2>
pe=00014 FAIL at line=01260    MAPL_CapGridComp.F90                     <status=2>
pe=00014 FAIL at line=00837    MAPL_CapGridComp.F90                     <status=2>
pe=00014 FAIL at line=00977    MAPL_CapGridComp.F90                     <status=2>
pe=00014 FAIL at line=00301    MAPL_Cap.F90                             <status=2>
pe=00014 FAIL at line=00258    MAPL_Cap.F90                             <status=2>
pe=00014 FAIL at line=00192    MAPL_Cap.F90                             <status=2>
pe=00014 FAIL at line=00169    MAPL_Cap.F90                             <status=2>
pe=00014 FAIL at line=00031    GCHPctm.F90                              <status=2>

It looks like the file name specified in the out-of-the-box ExtData.rc is incorrect. A quick look in my HEMCO/CMIP6/v2020-03 folder shows that there isn't a resolution string in the file names:

$ cd /path/to/ExtData/HEMCO/CMIP6/v2020-03
$ ls *20*
CMIP6_GHG_surface_VMR_1820.nc  CMIP6_GHG_surface_VMR_2004.nc  CMIP6_GHG_surface_VMR_2010.nc
CMIP6_GHG_surface_VMR_1920.nc  CMIP6_GHG_surface_VMR_2005.nc  CMIP6_GHG_surface_VMR_2011.nc
CMIP6_GHG_surface_VMR_2000.nc  CMIP6_GHG_surface_VMR_2006.nc  CMIP6_GHG_surface_VMR_2012.nc
CMIP6_GHG_surface_VMR_2001.nc  CMIP6_GHG_surface_VMR_2007.nc  CMIP6_GHG_surface_VMR_2013.nc
CMIP6_GHG_surface_VMR_2002.nc  CMIP6_GHG_surface_VMR_2008.nc  CMIP6_GHG_surface_VMR_2014.nc
CMIP6_GHG_surface_VMR_2003.nc  CMIP6_GHG_surface_VMR_2009.nc

But in the default ExtData.rc.fullchem there are these entries with the .2x25.nc:

#
#==============================================================================
# --- Surface VMR (SfcVMR) ---
#==============================================================================
SfcVMR_CH3Cl  ppbv N Y F%y4-%m2-01T00:00:00 none none CH3Cl  ./HcoDir/CMIP6/v2020-03/2x2.5/CMIP6_GHG_surface_VMR_%y4.2x25.nc
SfcVMR_CH2Cl2 ppbv N Y F%y4-%m2-01T00:00:00 none none CH2Cl2 ./HcoDir/CMIP6/v2020-03/2x2.5/CMIP6_GHG_surface_VMR_%y4.2x25.nc
SfcVMR_CHCl3  ppbv N Y F%y4-%m2-01T00:00:00 none none CHCl3  ./HcoDir/CMIP6/v2020-03/2x2.5/CMIP6_GHG_surface_VMR_%y4.2x25.nc
SfcVMR_CH3Br  ppbv N Y F%y4-%m2-01T00:00:00 none none CH3Br  ./HcoDir/CMIP6/v2020-03/2x2.5/CMIP6_GHG_surface_VMR_%y4.2x25.nc
#

I believe that is the source of the error. Try removing the .2x25 from the above entries in ExtData.rc and see if that works. If so then that's a bug that we'll have to patch.
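To confirm which path ExtData is actually trying to open, it can help to expand the `%y4` token in an ExtData.rc file template by hand and check whether the resulting file exists. The sketch below is a minimal, hypothetical helper (not part of GCHP); it assumes `hemco_dir` is the local directory that `./HcoDir` points to and only handles the `%y4` and `%m2` tokens seen in the entries above.

```python
from pathlib import Path


def expand_template(template: str, year: int, month: int = 1) -> str:
    """Replace the %y4 (4-digit year) and %m2 (2-digit month) tokens of an ExtData.rc path."""
    return template.replace("%y4", f"{year:04d}").replace("%m2", f"{month:02d}")


def check_extdata_file(template: str, year: int, hemco_dir: Path) -> tuple[str, bool]:
    """Expand the template and report whether the file exists under hemco_dir.

    hemco_dir stands in for the directory that ./HcoDir links to (an assumption).
    """
    expanded = expand_template(template, year)
    relative = expanded.removeprefix("./HcoDir/")
    return expanded, (hemco_dir / relative).is_file()


if __name__ == "__main__":
    template = "./HcoDir/CMIP6/v2020-03/2x2.5/CMIP6_GHG_surface_VMR_%y4.2x25.nc"
    hemco_dir = Path("/path/to/ExtData/HEMCO")  # hypothetical location
    for year in (2014, 2019):
        path, found = check_extdata_file(template, year, hemco_dir)
        print(f"{path}: {'found' if found else 'MISSING'}")
```

A year with no file on disk is reported as MISSING, which corresponds to the nf90_open error in the log above.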

Tagging @msulprizio @lizziel

@yantosca yantosca self-assigned this May 20, 2024
@yantosca yantosca added category: Debug Help Request for help debugging GCHP topic: Runtime Related to runtime issues (e.g. simulation stops with error) category: Bug Something isn't working labels May 20, 2024
@tessac2
Author

tessac2 commented May 20, 2024

Thanks Bob! My file path is slightly different from the one you showed: /HEMCO/CMIP6/v2020-03/2x2.5 instead of /HEMCO/CMIP6/v2020-03. In the 2x2.5 folder I have files with the .2x25.nc extension (from http://geoschemdata.wustl.edu/ExtData/HEMCO/CMIP6/v2020-03/2x2.5/). Should the path instead be /HEMCO/CMIP6/v2020-03?

I was also wondering if the error is because the simulation date is in 2019 and the most recent CMIP6 file is for 2014? But I am unsure why I would get this error when I previously ran a simulation successfully in 2019 with MERRA-2 meteorology.

@yantosca
Contributor

Hi @tessac2. D'oh! I didn't see the 2x2.5 folder there, it was at the top of my screen and I was only looking at the bottom.

I also thought about the time cycling, i.e. that the data only go up to 2014. But I know that the ExtData time cycling isn't as sophisticated as HEMCO's. @lizziel can correct me if I'm wrong, but I think it keeps searching backwards in time until it finds a file with a valid timestamp. So the year being 2019 might not be the issue.

One thing you can try is to generate more debug output by changing the CAP.EXTDATA entries in logging.yml from WARNING to DEBUG. You'll get a list of the containers that are being read in. It'll generate a LOT more output, one line per core, so you may want to try running e.g. a c24 simulation with fewer cores so you don't get a huge allPEs.log file.

@tessac2
Author

tessac2 commented May 21, 2024

Thanks @yantosca! When I ran the c24 simulation with debugging on, I got many PET*.ESMF_LogFile files (numbered 00 to 95). They all have a similar error message: "20240520 164632.064 ERROR PET39 ESMF_Time.F90:815 ESMF_TimeGet() Object Set or SetDefault method not called - Object not Initialized". The slurm output file has the same error as before: ./HcoDir/CMIP6/v2020-03/2x2.5/CMIP6_GHG_surface_VMR_2019.2x25.nc [No such file or directory]

[screenshot]
[screenshot]
slurm-11230222.txt
allPEs.zip
PET95.ESMF_LogFile.txt

@lizziel
Contributor

lizziel commented May 21, 2024

Hi @tessac2, the issue is you have a typo at the top of ExtData.rc:

fExt_AllowExtrap: .true.

Because of the extra f, that line is not interpreted correctly, so ExtData does not extrapolate to the closest year.
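The effect of the typo can be sketched as follows. This is a simplified model, not MAPL's actual parser: it assumes header keys are matched exactly (so `fExt_AllowExtrap` never matches `Ext_AllowExtrap` and extrapolation stays off), and that with extrapolation on, a year past the end of the data falls back to the closest available year, as described above. The helper names are hypothetical.

```python
from typing import Optional


def parse_header_flags(lines: list[str]) -> dict[str, str]:
    """Collect simple 'Key: value' lines; keys containing spaces (data entries) are skipped."""
    flags = {}
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        key, value = stripped.split(":", 1)
        if " " in key.strip():  # data entries such as 'SfcVMR_CH3Cl ppbv ... T00:00:00'
            continue
        flags[key.strip()] = value.strip()
    return flags


def resolve_year(requested: int, available: set[int], allow_extrap: bool) -> Optional[int]:
    """Return the year whose file would be read, or None if the lookup fails."""
    if requested in available:
        return requested
    if not allow_extrap:
        return None  # no file for this year -> "No such file or directory"
    if requested > max(available):
        return max(available)
    if requested < min(available):
        return min(available)
    return None  # gaps inside the range are not handled in this sketch


if __name__ == "__main__":
    cmip6_years = {1820, 1920, *range(2000, 2015)}  # from the directory listing above
    for header in (["fExt_AllowExtrap: .true."], ["Ext_AllowExtrap: .true."]):
        flags = parse_header_flags(header)
        allow = flags.get("Ext_AllowExtrap", ".false.").lower() == ".true."
        print(header[0], "->", resolve_year(2019, cmip6_years, allow))
```

With the mistyped key the 2019 lookup returns None (no file), while with the correct key it falls back to 2014.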

Regarding the PET files, these are ESMF error log files. We usually see ESMF time errors related to MAPL diagnostics, and we actually turned off ESMF error message generation by default in 14.3 since they are harmless. Try undoing the HISTORY.rc changes you made and see if the PET log files then go away. Report back on how that goes. This will give us more information to pass on to NASA GMAO (I have an open issue on their GitHub about ESMF log files being created).

As an fyi, you do not need to print ExtData debug messages from all cores. You can limit it to root core only like this:

   CAP.EXTDATA:
       handlers: [mpi_shared]
       level: WARNING
       root_level: DEBUG
       propagate: false

I am updating the 14.4.0 docs to be more clear about this.

@yantosca
Contributor

Thanks @lizziel for catching that!

@lizziel
Contributor

lizziel commented May 21, 2024

@tessac2, let us know if the default ExtData.rc file had that typo. If yes, I'll create a general GitHub issue to alert users. 14.2.0 is pretty old at this point since we are about to release 14.4 so we will not go back and issue a patch.

@tessac2
Author

tessac2 commented May 21, 2024

@lizziel Thank you so much for pointing that out! There was no typo in the default file; I (or my cat, who loves to walk across the keyboard) must have accidentally hit a key when I opened ExtData.rc, and I did not notice. Thank you so much for catching it!

This solved the CMIP6 issue, but I am still having some trouble getting the code to run. I am now getting an error saying that it cannot find variables such as MEK, CH4, CO2, etc. in the QFED files. My guess is that I labeled them wrong in ExtData.rc when I filled in the QFED section. I copied the entries from HEMCO_Config.rc, rearranged them to match the new format, and deleted the lines that were just '-', but I still get errors. I am also a bit confused because HEMCO_Config.rc divides the QFED emissions into PBL and FT, and I was not sure how to do that in ExtData.rc (or whether it is possible or necessary). I read through issue #412 and the links provided there but still had trouble understanding. Could you advise, or is there an example for QFED emissions? Thanks!

[screenshot]
slurm-11272551.txt
ExtData.txt

@yantosca
Contributor

@tessac2, I have 2 cats so I understand exactly what you mean :-)

Also, in ExtData.rc I believe you don't have to read the same variable twice.

In the default HEMCO_Config.rc file, ACET (for example) is written like this:

(((QFED2
0 QFED_ACET_PBL  $ROOT/QFED/v2018-07/$YYYY/$MM/qfed2.emis_acet.006.$YYYY$MM$DD.nc4 biomass 2000-2022/1-12/1-31/0/+12hour EFY xyL=1:PBL     kg/m2/s ACET 75/311        5 2
0 QFED_ACET_FT   $ROOT/QFED/v2018-07/$YYYY/$MM/qfed2.emis_acet.006.$YYYY$MM$DD.nc4 biomass 2000-2022/1-12/1-31/0/+12hour EFY xyL=PBL:5500m kg/m2/s ACET 75/312        5 2

But I think it could be written like this, so that QFED_ACET_FT is "piggybacked" onto the previous entry:

0 QFED_ACET_PBL  $ROOT/QFED/v2018-07/$YYYY/$MM/qfed2.emis_acet.006.$YYYY$MM$DD.nc4 biomass 2000-2022/1-12/1-31/0/+12hour EFY xyL=1:PBL     kg/m2/s ACET 75/311        5 2
0 QFED_ACET_FT   -                                                                 -       -                             -   xyL=PBL:5500m kg/m2/s ACET 75/312        5 2

For "piggybacked" entries in GCHP, you only have to specify the entry where the file is read from disk (i.e. the one with the file name). HEMCO will figure out the rest.

So you would only need to add the QFED_*_PBL entries in ExtData.rc, I think.
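The rule described above (only entries that actually name a file need an ExtData.rc line) can be written as a small filter over HEMCO_Config.rc lines. This is a hypothetical helper, assuming the whitespace-separated column layout of the QFED entries shown here, where the third column is the file path and piggybacked entries use `-`.

```python
def entries_needing_extdata(hemco_lines: list[str]) -> list[str]:
    """Return container names of HEMCO_Config.rc entries that read a file from disk."""
    names = []
    for line in hemco_lines:
        tokens = line.split()
        # Skip blank lines, comments, and brackets such as '(((QFED2'
        if len(tokens) < 3 or not tokens[0].isdigit():
            continue
        name, source_file = tokens[1], tokens[2]
        if source_file != "-":  # '-' means the entry piggybacks on the previous one
            names.append(name)
    return names


if __name__ == "__main__":
    piggybacked = [
        "(((QFED2",
        "0 QFED_ACET_PBL $ROOT/QFED/v2018-07/$YYYY/$MM/qfed2.emis_acet.006.$YYYY$MM$DD.nc4 "
        "biomass 2000-2022/1-12/1-31/0/+12hour EFY xyL=1:PBL kg/m2/s ACET 75/311 5 2",
        "0 QFED_ACET_FT - - - - xyL=PBL:5500m kg/m2/s ACET 75/312 5 2",
    ]
    print(entries_needing_extdata(piggybacked))
```

For the piggybacked form only QFED_ACET_PBL is returned, so under this rule only that entry would need an ExtData.rc line.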

@lizziel
Contributor

lizziel commented May 21, 2024

@yantosca is right on that! Putting the same filename/variable name pair in ExtData.rc more than once just slows down the model. If you use the dashes method in HEMCO_Config.rc then it reduces the GCHP I/O overhead.

As for the error of not finding ACET, I think it is because the variable name in the file is actually biomass. You need to put the file variable name, rather than the GEOS-Chem species name, in ExtData.rc. MAPL will read that variable from the file and assign it to the HEMCO variable name, e.g. QFED_ACET_PBL. Then HEMCO will assign it to a GEOS-Chem species based on what is in HEMCO_Config.rc. Hopefully updating the variable names in ExtData.rc will fix the remaining issues.
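Following that advice, an ExtData.rc line can be assembled from a HEMCO_Config.rc entry by keeping the HEMCO name, taking the variable name from the file (`biomass` for QFED), and converting HEMCO's `$ROOT`/`$YYYY`/`$MM`/`$DD` path tokens to `./HcoDir` and the `%y4`/`%m2`/`%d2` tokens ExtData uses. This is only a sketch under stated assumptions: the `N Y` and `none none` columns are copied from the SfcVMR entries earlier in this thread, the column meanings in the comment are my reading of the ExtData.rc layout, and the refresh template is left to the caller because the correct value for daily QFED files is not given here (the one in the example is a hypothetical placeholder).

```python
# Mapping from HEMCO_Config.rc path tokens to ExtData.rc path tokens (assumed)
HEMCO_TO_EXTDATA = {"$ROOT": "./HcoDir", "$YYYY": "%y4", "$MM": "%m2", "$DD": "%d2"}


def hemco_to_extdata(hemco_line: str, refresh: str) -> str:
    """Build one ExtData.rc line from a HEMCO_Config.rc base-emissions entry."""
    tokens = hemco_line.split()
    name, path, file_var, units = tokens[1], tokens[2], tokens[3], tokens[7]
    for hemco_token, extdata_token in HEMCO_TO_EXTDATA.items():
        path = path.replace(hemco_token, extdata_token)
    # Columns (assumed): name, units, clim, conservative, refresh, offset, scale,
    # variable on file, file template
    return " ".join([name, units, "N", "Y", refresh, "none", "none", file_var, path])


if __name__ == "__main__":
    line = (
        "0 QFED_ACET_PBL $ROOT/QFED/v2018-07/$YYYY/$MM/qfed2.emis_acet.006.$YYYY$MM$DD.nc4 "
        "biomass 2000-2022/1-12/1-31/0/+12hour EFY xyL=1:PBL kg/m2/s ACET 75/311 5 2"
    )
    # The refresh value below is a hypothetical placeholder, not a verified setting
    print(hemco_to_extdata(line, refresh="%y4-%m2-%d2T12:00:00"))
```

The key point is the eighth column: it holds `biomass` (the name inside the file), not the species name ACET.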

@tessac2
Author

tessac2 commented May 21, 2024

Thanks Bob!

For reference this was the error I was initially getting.
[screenshot]

Then I deleted the FT entries in the GCHP ExtData.rc, changed the variable name to biomass, and got this error in the allPEs.log:
[screenshot]
[screenshot]

So I added the FT lines back in. It seems to be running now so I will update you on the results!

@lizziel
Contributor

lizziel commented May 21, 2024

Looks like we commented at exactly the same time!

@tessac2
Author

tessac2 commented May 21, 2024

Thanks @lizziel! It seems to be running now; I will follow up :)

@tessac2
Author

tessac2 commented Jun 6, 2024

Just providing an update: I was able to get it to run for a while, but the run keeps failing a few weeks in. I have been running 1 week at a time, starting in October 2020. For the run starting 2020-10-22 I got the error "KPP failed to converge". Following other GCHP issues with similar errors, I changed the restart file date and started the simulation from another month, 2020-11-01. However, this also failed a couple of weeks into the simulation with the same error.

[screenshot]
gchp.20201022_0000z.log
gchp.20201115_0000z.log

@lizziel
Contributor

lizziel commented Jun 10, 2024

Hi @tessac2, are you able to reproduce this problem when running with the default emissions?

@lizziel lizziel added topic: Input Data Related to input data and removed category: Bug Something isn't working topic: Runtime Related to runtime issues (e.g. simulation stops with error) labels Jun 10, 2024
@tessac2
Copy link
Author

tessac2 commented Jun 17, 2024

Thanks @lizziel, I did not get the same error when I switched from QFED back to GFED. I wonder whether I filled in ExtData.rc incorrectly for the QFED emissions and this caused the error (the screenshot is shown in a previous reply), or whether it has something else to do with QFED?

@lizziel
Contributor

lizziel commented Jun 18, 2024

Hi @tessac2, I am not sure if anyone in the community uses QFED with GCHP. You could try doing a run with frequent diagnostics and then look at the results to see if anything is going off the rails. You can also add the QFED emissions as diagnostics to see what they look like. These emissions use the vertical injection-height feature of HEMCO (xyL=1:PBL), so I wonder if something is off there.

@lizziel lizziel assigned lizziel and unassigned yantosca Jun 18, 2024
@tessac2
Author

tessac2 commented Jul 9, 2024

Thanks @lizziel! Following the discussion at IGC11 that GEOS-FP is not recommended after June 2020, I started another run using GFED and MERRA-2 meteorology. However, I am now running into the same issue again. I was able to run the simulation for May 2021, but my June run fails on June 4, 2021.

[screenshot]
gchp.20210601_0000z.log

@yantosca
Contributor

yantosca commented Jul 9, 2024

Hi @tessac2, thanks for the feedback. Looking at your output, there are some rxn rates that really blow up. There are also a lot of negative concentrations (probably a side effect of these). For example:

   9902850.5014838222      ClOO --> Cl + O2
   5.9307137702429200      Cl2O2 --> 2 ClO
  0.20863623086246982      IONO --> I + NO2
   4.4691598926869987E-033 CH2OO + 2 H2O --> 0.06 PH2O2 + 0.4 HMHP + 0.06 H2O2 + 0.54 HCOOH + 0.06 CH2O
  0.41594861909815112      NO3 --> NO2 + O

Most of the rxn rate constants (1/s) are pretty small (of order 1e-5 to 1e-15 or less), so these stood out.
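Scanning a rate printout for outliers like these can be automated. The sketch below assumes each line holds a rate followed by the reaction equation, as in the excerpt above; the default cutoff of 1e-5 just mirrors the upper end of the range quoted here and is an illustrative assumption, not a physical criterion, so flagged reactions are candidates to inspect rather than confirmed errors.

```python
def flag_large_rates(lines: list[str], threshold: float = 1e-5) -> list[tuple[float, str]]:
    """Return (rate, reaction) pairs whose rate exceeds threshold, largest first."""
    flagged = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        try:
            rate = float(parts[0])  # accepts E-notation such as 4.4691598926869987E-033
        except ValueError:
            continue  # not a rate line
        if rate > threshold:
            flagged.append((rate, parts[1].strip()))
    return sorted(flagged, reverse=True)


if __name__ == "__main__":
    printout = [
        "9902850.5014838222      ClOO --> Cl + O2",
        "5.9307137702429200      Cl2O2 --> 2 ClO",
        "0.20863623086246982      IONO --> I + NO2",
        "4.4691598926869987E-033 CH2OO + 2 H2O --> 0.06 PH2O2 + 0.4 HMHP + 0.06 H2O2 + 0.54 HCOOH + 0.06 CH2O",
        "0.41594861909815112      NO3 --> NO2 + O",
    ]
    for rate, reaction in flag_large_rates(printout):
        print(f"{rate:12.4e}  {reaction}")
```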

I also wonder if this is due to the dummy species being included in the KPP convergence criteria. See geoschem/geos-chem#2359 and KineticPreProcessor/KPP#66. I've pulled the fix from PR 2359 into the GEOS-Chem 14.5.0 branch, but you could try implementing the quick fix that @obin1 suggested in this comment. It's worth a shot.

@tessac2
Author

tessac2 commented Jul 10, 2024

Thanks @yantosca! I am using Singularity to run GCHP, however, so I am not sure if I can implement the quick fix, since I cannot access the CodeDir (or at least I am not sure how to; there is no link to it in my run directory). I am not sure what is causing the reaction rates to blow up; perhaps, as you said, it is the dummy species?

[screenshot]

Contributor

@tessac2: In the container, could you clone a new copy of GCHP? Then you can add the quick fix in and recompile it.

@lizziel
Contributor

lizziel commented Jul 11, 2024

@tessac2 I wonder if you are hitting integrator errors because of this bug in 14.2. Several users found instability in fullchem runs for GCHP. A fix was put into 14.3 for it. Could you try updating versions? 14.4.1 is the latest so we recommend that one.

@tessac2
Author

tessac2 commented Jul 11, 2024

@yantosca I think I am only able to pull images from Docker Hub (https://hub.docker.com/r/geoschem/gchp/tags?page=1&ordering=last_updated). I run

    singularity pull gchp.sif docker://geoschem/gchp:14.2.0

to download the image, then

    singularity exec -B $HOME:$HOME -B /projects/horowitz_group/GEOSChem_input_data/ExtData:/projects/horowitz_group/GEOSChem_input_data/ExtData -B /projects/horowitz_group/tessa/GCHP/rundirs:/workdir gchp.sif /bin/bash -c ". ~/.bashrc && /opt/geos-chem/bin/createRunDir.sh"

to set up the run directory.

Thanks @lizziel! In versions after 14.2 the openmpi version was updated and is no longer compatible with my campus cluster, so I have been unable to test more recent versions. I know @yidant was looking into this.

@yidant I see you pushed a v14.4.1 image yesterday. Was the openmpi issue fixed in this version, or will the fix come with the next version?

@yidant
Contributor

yidant commented Jul 11, 2024

Hi @tessac2, the v14.4.1 image only contains the executable built from GCHP 14.4.1. The update will come with v14.5.0. Feel free to compile a new one!

The openmpi package is already updated in these versions.

@tessac2
Author

tessac2 commented Jul 12, 2024

Thanks @yidant. My NX and NY do not seem to be updating automatically, which did not happen with previous versions. It says NX=4 and NY=24, but according to my setCommonRunSettings.sh, NX=2 and NY=48.
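The two layouts mentioned here can be checked against the basic GCHP decomposition rules. A minimal sketch, assuming (as I understand the GCHP docs) that NX*NY must equal the total core count, NY must be a multiple of 6 (one block per cube face), and an automatic choice would split each face into NX by NY/6 domains as close to square as possible. The function names are hypothetical, and this is not the logic inside setCommonRunSettings.sh.

```python
def is_valid_layout(nx: int, ny: int, total_cores: int) -> bool:
    """Check the basic constraints: NX*NY equals the core count and NY is a multiple of 6."""
    return nx * ny == total_cores and ny % 6 == 0


def squarest_layout(total_cores: int) -> tuple[int, int]:
    """Pick NX, NY so that each cube face (NX by NY/6 domains) is as close to square as possible."""
    if total_cores % 6 != 0:
        raise ValueError("total core count must be a multiple of 6")
    per_face = total_cores // 6
    best = None
    for nx in range(1, per_face + 1):
        if per_face % nx:
            continue
        ny_face = per_face // nx
        score = abs(nx - ny_face)
        if best is None or score < best[0]:
            best = (score, nx, 6 * ny_face)
    return best[1], best[2]


if __name__ == "__main__":
    cores = 96  # NX=2, NY=48 and NX=4, NY=24 both use 96 cores
    print("NX=2, NY=48 valid:", is_valid_layout(2, 48, cores))
    print("squarest layout:", squarest_layout(cores))
```

Both layouts satisfy the basic constraints for 96 cores; under the squareness assumption the automatic choice would be NX=4, NY=24.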

If I set the resolution to 48, I run into another error saying that libnetcdf.so.19 is missing for the gchp executable. I had this error with v14.4.0 as well. This is not the gchp.sif file, but rather the gchp file that is automatically placed in the run directory when creating it.

[screenshot]
[screenshot]

[screenshot]

@lizziel
Contributor

lizziel commented Jul 12, 2024

GCHP 14.4.1 has a bug where the error message you see for NX and NY is triggered if using certain combinations of grid resolution and cores. This will be fixed in 14.4.2. Commenting out the place where it errors out in setCommonRunSettings.sh fixes the issue. However, I think you would then run into the library issue you are seeing if you also hit it with 14.4.0. Using grid resolution that is a multiple of 24 would also fix the issue, but you are hitting the library problem with that too.

I think the best solution is to clone the code in your 14.1 environment and manually apply the HMS fix to see if that works.


4 participants