Improved documentation on native dataset support (#2635)
schlunma authored Apr 22, 2022
1 parent 33bc7a4 commit d8687d3
Showing 3 changed files with 100 additions and 74 deletions.
4 changes: 2 additions & 2 deletions doc/sphinx/source/develop/dataset.rst
@@ -25,8 +25,8 @@ data set for the use in ESMValTool.
*fixes*. As compared to the workflow described below, this has the advantage that
the user does not need to store a duplicate (CMORized) copy of the data. Instead, the
CMORization is performed 'on the fly' when running a recipe. **ERA5** is the first dataset
for which this 'CMORization on the fly' is supported. For more information, see:
:ref:`cmorization_as_fix`.
for which this 'CMORization on the fly' is supported. For more information, see
:ref:`inputdata_native_datasets`.


1. Check if your variable is CMOR standard
168 changes: 97 additions & 71 deletions doc/sphinx/source/input.rst
@@ -40,14 +40,18 @@ ESMValTool also provides support to download some observational dataset from sou

The chapter in the ESMValCore documentation on
:ref:`finding data <esmvalcore:findingdata>` explains how to
configure the ESMValTool so it can find locally available data and/or
configure ESMValTool so it can find locally available data and/or
download it from ESGF if it isn't available locally yet.


.. _inputdata_models:

Models
======

If you do not have access to a compute cluster with the data already mounted,
the ESMValTool can automatically download any required data that is available on ESGF.
ESMValTool can automatically download any required data that is available on
ESGF.
This is the recommended approach for first-time users to obtain some data for
running ESMValTool.
For example, run
@@ -73,11 +77,42 @@ to maintain your own collection of ESGF data.
Observations
============

Observational and reanalysis products in the standard CF/CMOR format used in CMIP and required by the ESMValTool are available via the obs4MIPs and ana4mips projects at the ESGF (e.g., https://esgf-data.dkrz.de/projects/esgf-dkrz/). Their use is strongly recommended, when possible.
Observational and reanalysis products in the standard CF/CMOR format used in
CMIP and required by ESMValTool are available via the obs4MIPs and ana4mips
projects at the ESGF (e.g., https://esgf-data.dkrz.de/projects/esgf-dkrz/).
Their use is strongly recommended, when possible.

Other datasets not available in these archives can be obtained by the user from
the respective sources and reformatted to the CF/CMOR standard.
ESMValTool currently supports two ways to perform this reformatting (aka
'CMORization'):

#. Using a CMORizer script: The first is to use a CMORizer script to generate a
local pool of reformatted data that can readily be used by ESMValTool. This
method is described in detail below.

#. Using fixes for on-the-fly CMORization: The second way is to implement
specific :ref:`'fixes' <esmvalcore:fixing_data>` for your dataset. In that
case, the reformatting is performed 'on the fly' during the execution of an
ESMValTool recipe (note that one of the first preprocessor tasks is 'CMOR
checks and fixes'). Details on this second method are given at the
:ref:`end of this chapter <inputdata_native_datasets>`.

Using a CMORizer script
-----------------------

ESMValTool comes with a set of CMORizers readily available.
The CMORizers are dataset-specific scripts that can be run once to generate a
local pool of CMOR-compliant data.
The necessary information to download and process the data is provided in the
header of each CMORizing script.
These scripts also serve as template to create new CMORizers for datasets not
yet included.
Note that datasets CMORized for ESMValTool v1 may not be working with v2, due
to the much stronger constraints on metadata set by the iris library.

Other datasets not available in these archives can be obtained by the user from the respective sources
and reformatted to the CF/CMOR standard.
The list of datasets supported by ESMValTool can be obtained with:
The list of datasets supported by ESMValTool through a CMORizer script can be
obtained with:

.. code-block:: bash
@@ -101,7 +136,7 @@ An entry to the ``~/.netrc`` should look like:
machine [server_name] login [user_name] password [password]
Make sure that the permissions of the ``~/.netrc`` file are set so only you and administrators
can read it, i.e.
can read it, i.e.

.. code-block:: bash
@@ -116,48 +151,27 @@ For other datasets, downloading instructions can be obtained with:
esmvaltool data info [DATASET]
ESMValTool currently support two ways to perform this reformatting (aka 'CMORization').
The first is to use a CMORizer to generate a local pool of reformatted data that can
readily be used by the ESMValTool.
The second way is to implement specific 'fixes' for your dataset.
In that case, the reformatting is performed 'on the fly' during the execution of an ESMValTool
recipe (note that one of the first preprocessor tasks is 'CMOR checks and fixes').
Below, both methods are explained in more detail.

Using a CMORizer script
-----------------------

ESMValTool comes with a set of CMORizers readily available.
The CMORizers are dataset-specific scripts that can be run once to generate
a local pool of CMOR-compliant data. The necessary information to download
and process the data is provided in the header of each CMORizing script.
These scripts also serve as template to create new CMORizers for datasets not
yet included.
Note that datasets CMORized for ESMValTool v1 may not be working with v2, due
to the much stronger constraints on metadata set by the iris library.

To CMORize one or more datasets, run:

.. code-block:: bash
esmvaltool data format --config_file [CONFIG_FILE] [DATASET_LIST]
The path to the raw data to be CMORized must be specified in the
:ref:`user configuration file<config-user>` as RAWOBS.
The path to the raw data to be CMORized must be specified in the :ref:`user
configuration file<config-user>` as RAWOBS.
Within this path, the data are expected to be organized in subdirectories
corresponding to the data tier: Tier2 for freely-available datasets (other
than obs4MIPs and ana4mips) and Tier3 for restricted datasets (i.e., dataset
which requires a registration to be retrieved or provided upon request to
the respective contact or PI).
The CMORization follows the
`CMIP5 CMOR tables <https://github.com/PCMDI/cmip5-cmor-tables>`_ or
`CMIP6 CMOR tables <https://github.com/PCMDI/cmip6-cmor-tables>`_ for the
OBS and OBS6 projects respectively.
corresponding to the data tier: Tier2 for freely-available datasets (other than
obs4MIPs and ana4mips) and Tier3 for restricted datasets (i.e., dataset which
requires a registration to be retrieved or provided upon request to the
respective contact or PI).
The CMORization follows the `CMIP5 CMOR tables
<https://github.com/PCMDI/cmip5-cmor-tables>`_ or `CMIP6 CMOR tables
<https://github.com/PCMDI/cmip6-cmor-tables>`_ for the OBS and OBS6 projects
respectively.
The resulting output is saved in the output_dir, again following the Tier
structure.
The output file names follow the definition given in
:ref:`config-developer file <esmvalcore:config-developer>` for the ``OBS``
project:
The output file names follow the definition given in :ref:`config-developer
file <esmvalcore:config-developer>` for the ``OBS`` project:

.. code-block::
@@ -170,39 +184,11 @@ may be ``sat`` (satellite data), ``reanaly`` (reanalysis data),

At the moment, ``esmvaltool data format`` supports Python and NCL scripts.
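
The Tier-based layout of the RAWOBS directory described above can be inspected with a few lines of Python. This is only a minimal sketch: the helper name ``list_raw_datasets`` is hypothetical and not part of ESMValTool, and it assumes that each dataset sits in its own subdirectory below ``Tier2`` or ``Tier3``.

```python
from pathlib import Path


def list_raw_datasets(rawobs_root, tiers=(2, 3)):
    """Map each Tier directory name to the subdirectories found inside it.

    Hypothetical helper (not an ESMValTool function). It assumes one
    subdirectory per dataset below each Tier directory.
    """
    root = Path(rawobs_root)
    found = {}
    for tier in tiers:
        tier_dir = root / f"Tier{tier}"
        if tier_dir.is_dir():
            # Keep directories only; stray files are ignored.
            found[tier_dir.name] = sorted(
                entry.name for entry in tier_dir.iterdir() if entry.is_dir()
            )
        else:
            # A missing Tier directory is reported as empty.
            found[tier_dir.name] = []
    return found
```

Calling ``list_raw_datasets`` with the path configured as RAWOBS returns a dictionary such as ``{"Tier2": [...], "Tier3": [...]}``, which gives a quick check that the raw data are where the CMORizer expects them.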

.. _cmorization_as_fix:

CMORization as a fix
--------------------
ESMValCore also provides support for some datasets in their native format.
In this case, the steps needed to reformat the data are executed as datasets
fixes during the execution of an ESMValTool recipe, as one of the first
preprocessor steps, see :ref:`fixing data <esmvalcore:fixing_data>`.
Compared to the workflow described above, this has the advantage that the user
does not need to store a duplicate (CMORized) copy of the data.
Instead, the CMORization is performed 'on the fly' when running a recipe.
The native6 project supports files named according to the format defined in
the :ref:`config-developer file <esmvalcore:config-developer>`.
Some of ERA5, ERA5-Land and MSWEP data are currently supported, see
:ref:`supported datasets <supported_datasets>`.

To use this functionality, users need to provide a path for the ``native6``
project data in the :ref:`user configuration file<config-user>`.
Then, in the recipe, they can refer to the native6 project.
For example:

.. code-block:: yaml
datasets:
- {dataset: ERA5, project: native6, type: reanaly, version: '1', tier: 3, start_year: 1990, end_year: 1990}
More examples can be found in the diagnostics ``ERA5_native6`` in the recipe
`examples/recipe_check_obs.yml <https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/recipes/examples/recipe_check_obs.yml>`_.

.. _supported_datasets:

Supported datasets
------------------
Supported datasets for which a CMORizer script is available
-----------------------------------------------------------

A list of the datasets for which a CMORizer is available is provided in the following table.

.. tabularcolumns:: |p{3cm}|p{6cm}|p{3cm}|p{3cm}|
@@ -371,3 +357,43 @@ A list of the datasets for which a CMORizers is available is provided in the fol
.. [#note1] CMORization is built into ESMValTool through the native6 project, so there is no separate CMORizer script.
.. [#note2] Derived on the fly from down & net radiation.
.. _inputdata_native_datasets:

Datasets in native format
=========================

ESMValCore also provides support for some datasets in their native format.
In this case, the steps needed to reformat the data are executed as dataset
fixes during the execution of an ESMValTool recipe, as one of the first
preprocessor steps, see :ref:`fixing data <esmvalcore:fixing_data>`.
Compared to the workflow described above, this has the advantage that the user
does not need to store a duplicate (CMORized) copy of the data.
Instead, the CMORization is performed 'on the fly' when running a recipe.
Native datasets can be hosted either under a dedicated project (usually done
for native model output) or under project ``native6`` (usually done for native
reanalysis/observational products).
These projects are configured in the :ref:`config-developer file
<esmvalcore:configure_native_models>`.

A list of all currently supported native datasets is :ref:`provided here
<esmvalcore:read_native_datasets>`.
A detailed description of how to include new native datasets is given
:ref:`here <esmvalcore:add_new_fix_native_datasets>`.

To use this functionality, users need to provide a path in the
:ref:`esmvalcore:user configuration file` for the ``native6`` project data
and/or the dedicated project used for the native dataset, e.g., ``ICON``.
Then, in the recipe, they can refer to those projects.
For example:

.. code-block:: yaml
datasets:
- {project: native6, dataset: ERA5, type: reanaly, version: '1', tier: 3, start_year: 1990, end_year: 1990}
- {project: ICON, dataset: ICON, version: 42-0, component: atm, exp: amip, grid: R2B5, ensemble: r1i1, var_type: 2d}
For project ``native6``, more examples can be found in the diagnostics
``ERA5_native6`` in the recipe `examples/recipe_check_obs.yml
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/recipes/examples/recipe_check_obs.yml>`_.
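
Because every project referenced in a recipe needs a configured data path, it can help to compare the two before running. The following is a minimal sketch, not ESMValTool code: the function ``projects_missing_paths`` and the ``rootpaths`` mapping are hypothetical stand-ins for the dataset entries of a recipe and the project paths from the user configuration file, and ``/path/to/native6`` is a placeholder.

```python
def projects_missing_paths(datasets, rootpaths):
    """Return the projects used in ``datasets`` that have no configured path.

    Hypothetical helper: ``datasets`` mirrors the ``datasets`` entries of a
    recipe, ``rootpaths`` mirrors the per-project paths a user has configured.
    """
    needed = {entry["project"] for entry in datasets}
    return sorted(needed - set(rootpaths))


# The two dataset entries from the recipe example above, reduced to the
# keys that matter for this check.
datasets = [
    {"project": "native6", "dataset": "ERA5", "tier": 3},
    {"project": "ICON", "dataset": "ICON", "exp": "amip"},
]

# Assumed configuration in which only native6 has a path (placeholder value).
rootpaths = {"native6": "/path/to/native6"}

missing = projects_missing_paths(datasets, rootpaths)
print(missing)  # ['ICON']
```

Here the check reports ``ICON``, meaning a path for the dedicated ``ICON`` project would still have to be added to the user configuration file.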
2 changes: 1 addition & 1 deletion doc/sphinx/source/recipes/recipe_cmorizers.rst
@@ -9,7 +9,7 @@ Overview
These are CMORizer recipes calling CMORizer diagnostic scripts.

ESMValCore supports ERA5 hourly and monthly datasets in their native
format, see :ref:`CMORization as a fix <esmvaltool:cmorization_as_fix>`
format, see :ref:`inputdata_native_datasets`.
and `ERA5 data documentation <https://confluence.ecmwf.int/display/CKB/ERA5%3A+data+documentation>`_.
It may be useful in some cases to create ERA5 daily CMORized data. This can be
achieved by using a CMORizer *recipe*,