Instructions for GEOS-Chem Users

Liam Bindle edited this page Nov 18, 2021 · 25 revisions

This page is a work in progress.

Set up instructions

For existing GEOS-Chem input data repository

This section covers how you can start using the bashdatacatalog in an existing GEOS-Chem input data repository (ExtData/) on your local machine.

  1. Navigate to your local GEOS-Chem input data repository (ExtData/).
  2. Create a directory called CatalogFiles/ (or whatever you want to name it). This is where you will store the catalog files that specify your input data requirements (you maintain these files).
  3. Download the catalog files for the versions of GEOS-Chem that you want to use. Put them in your CatalogFiles/ directory. Make any edits you want to the catalog files (e.g., enabling the metfield collections that you need).
  4. From the root of your data repository, run bashdatacatalog CatalogFiles/*.csv fetch.
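
The steps above can be sketched as a small shell function. This is a minimal sketch, not an official script: the function name fetch_input_data is hypothetical, the directory name CatalogFiles/ is just the one used on this page, and step 3 stays manual because this page does not say where the catalog files are hosted.

```shell
# Hypothetical helper: runs steps 1, 2, and 4 for the data repository at $1.
fetch_input_data() {
    extdata_root="$1"
    (
        # Step 1: work from the root of the data repository (ExtData/).
        cd "$extdata_root" || exit 1

        # Step 2: create the directory that holds your catalog files.
        mkdir -p CatalogFiles

        # Step 3 (manual): download the catalog files for your GEOS-Chem
        # versions into CatalogFiles/ and edit them as needed.

        # Step 4: fetch the data described by every catalog file.
        bashdatacatalog CatalogFiles/*.csv fetch
    )
}
```

Calling fetch_input_data "$HOME/ExtData" (a hypothetical path) would run the fetch there. The parentheses run everything in a subshell, so your own working directory is unchanged afterwards.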

Important limitations

There are only two mechanisms for selecting and filtering data files with the bashdatacatalog.

The first mechanism is the enable/disable switch in column 3 of a catalog file. This mechanism operates at the collection level, and it is the only way to "activate" or "deactivate" data collections for the types of simulations you run. Collections that are specific to a simulation type or grid are not handled automatically! Instead, you need to "activate" the appropriate collections in your catalog file. In the default catalog files, the active collections are those for a "standard" GEOS-Chem simulation (MERRA-2, full chemistry). For example, if you plan to run nested NA simulations with MERRA-2 metfields, you will need to put a 1 in column 3 of your meteorological inputs catalog for the GEOS_0.5x0.625_NA/MERRA2 collection.
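
As a rough illustration of flipping that switch without opening an editor, the sketch below uses awk to set column 3 to 1 for one collection. It assumes the catalog is a plain comma-separated file, that the collection name is in column 1, and that no field contains a quoted comma. This page only states that column 3 is the switch, so check the other assumptions against your own catalog files. The function name enable_collection is hypothetical.

```shell
# Hypothetical helper: print catalog file $1 with collection $2 enabled.
# Assumptions: comma-separated fields, collection name in column 1.
# Column 3 is the enable/disable switch, as described above.
enable_collection() {
    awk -F',' -v OFS=',' -v name="$2" '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # ignore padding around the name
            if (key == name) { $3 = 1 }        # enable the matching collection
            print                              # all other rows pass through unchanged
        }
    ' "$1"
}
```

The function writes to standard output rather than editing in place, so you can redirect the result to a new file and compare it with the original (for example with diff) before replacing your catalog.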

The second mechanism is the optional date range in catalog queries. This mechanism operates at the file level, and it is the only way to filter out temporal files that aren't needed for your simulation period.

Caveats

Queries on GEOS-Chem input data catalogs don't give the exact minimum input file requirements like a dry-run would. The data catalogs are meant to organize input data at a high level, so that you (a human) can select the data collections you need. As a result, queries on catalogs will return more input files than are strictly necessary, but in practice this overhead is relatively small. The benefit of this intentional limitation is that the catalog system is much simpler to use and maintain.

Here are the specifics of the "extra" data that query results will include:

  • For climatological data collections, the first and last year of data are considered "always required" (i.e., results of queries with date ranges will always include the first and last year of climatology data).
  • Emissions collections do not distinguish grid-specific or meteorology-specific files. For example, HEMCO/OFFLINE_BIOVOC/v2019-10 is a single collection that includes both 0.25°x0.3125° and 0.5°x0.625° files.
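
To make the climatology rule in the first bullet concrete, the sketch below applies it to a plain list of years. It only illustrates the rule as stated on this page and is not how bashdatacatalog is implemented. The function name years_returned and its arguments are hypothetical.

```shell
# Hypothetical illustration: which years of a climatological collection a
# date-ranged query would return under the rule above.
#   $1, $2: first and last year of the query range
#   remaining arguments: years available in the collection, ascending
years_returned() {
    start="$1"; end="$2"; shift 2
    first="$1"
    for last in "$@"; do :; done    # after this loop, $last is the final year
    for year in "$@"; do
        # Keep the first and last year unconditionally, plus any year
        # that falls inside the query range.
        if [ "$year" -eq "$first" ] || [ "$year" -eq "$last" ] ||
           { [ "$year" -ge "$start" ] && [ "$year" -le "$end" ]; }; then
            echo "$year"
        fi
    done
}
```

For a collection covering 2000 through 2005, years_returned 2002 2003 2000 2001 2002 2003 2004 2005 prints 2000, 2002, 2003, and 2005. The two endpoint years are the "extra" data relative to the requested range.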