Make DRS path templates configurable per rootpath (#1894)
Co-authored-by: Manuel Schlund <manuel.schlund@dlr.de>
Co-authored-by: Manuel Schlund <32543114+schlunma@users.noreply.github.com>
3 people authored May 29, 2024
1 parent 4053008 commit d8b4d4d
Showing 15 changed files with 345 additions and 185 deletions.
8 changes: 4 additions & 4 deletions doc/quickstart/configure.rst
@@ -26,7 +26,7 @@ User configuration file


The ``config-user.yml`` configuration file contains all the global level
information needed by ESMValTool. It can be reused as many times the user needs
information needed by ESMValCore. It can be reused as many times the user needs
to before changing any of the options stored in it. This file is essentially
the gateway between the user and the machine-specific instructions to
``esmvaltool``. By default, esmvaltool looks for it in the home directory,
@@ -189,7 +189,7 @@ and memory usage.
A detailed explanation of the data finding-related sections of the
``config-user.yml`` (``rootpath`` and ``drs``) is presented in the
:ref:`data-retrieval` section. This section relates directly to the data
finding capabilities of ESMValTool and are very important to be understood by
finding capabilities of ESMValCore and are very important to be understood by
the user.

.. note::
@@ -945,7 +945,7 @@ addition of more details per project, dataset, mip table, and variable name.
More precisely, one can provide this information in an extra yaml file, named
`{project}-something.yml`, where `{project}` corresponds to the project as used
by ESMValTool in :ref:`Datasets` and "something" is arbitrary.
by ESMValCore in :ref:`Datasets` and "something" is arbitrary.
Format of the extra facets files
--------------------------------
@@ -998,7 +998,7 @@ variable of any CMIP5 dataset that does not have a ``product`` key yet:
Location of the extra facets files
----------------------------------
Extra facets files can be placed in several different places. When we use them
to support a particular use-case within the ESMValTool project, they will be
to support a particular use-case within the ESMValCore project, they will be
provided in the sub-folder `extra_facets` inside the package
:mod:`esmvalcore.config`. If they are used from the user side, they can be either
placed in `~/.esmvaltool/extra_facets` or in any other directory of the users
92 changes: 55 additions & 37 deletions doc/quickstart/find_data.rst
@@ -7,7 +7,7 @@ Input data
Overview
========
Data discovery and retrieval is the first step in any evaluation process;
ESMValTool uses a `semi-automated` data finding mechanism with inputs from both
ESMValCore uses a `semi-automated` data finding mechanism with inputs from both
the user configuration file and the recipe file: this means that the user will
have to provide the tool with a set of parameters related to the data needed
and once these parameters have been provided, the tool will automatically find
@@ -31,7 +31,7 @@ standard for naming files and structured paths; the `DRS
<https://www.ecmwf.int/sites/default/files/elibrary/2014/13713-data-reference-syntax-governing-standards-within-climate-research-data-archived-esgf.pdf>`_
ensures that files and paths to them are named according to a
standardized convention. Examples of this convention, also used by
ESMValTool for file discovery and data retrieval, include:
ESMValCore for file discovery and data retrieval, include:

* CMIP6 file: ``{variable_short_name}_{mip}_{dataset_name}_{experiment}_{ensemble}_{grid}_{start-date}-{end-date}.nc``
* CMIP5 file: ``{variable_short_name}_{mip}_{dataset_name}_{experiment}_{ensemble}_{start-date}-{end-date}.nc``
@@ -44,7 +44,7 @@ ESGF data nodes, these paths differ slightly, for example:
{variable_short_name}/{grid}``;
* CMIP6 path for ETHZ: ``ROOT-ETHZ/{experiment}/{mip}/{variable_short_name}/{dataset_name}/{ensemble}/{grid}``

From the ESMValTool user perspective the number of data input parameters is
From the ESMValCore user perspective the number of data input parameters is
optimized to allow for ease of use. We detail this procedure in the next
section.

@@ -163,7 +163,7 @@ dedicated projects instead of the project ``native6``.
CESM
^^^^

ESMValTool is able to read native `CESM <https://www.cesm.ucar.edu/>`__ model
ESMValCore is able to read native `CESM <https://www.cesm.ucar.edu/>`__ model
output.

.. warning::
@@ -252,7 +252,7 @@ Key Description Default value if not
EMAC
^^^^

ESMValTool is able to read native `EMAC
ESMValCore is able to read native `EMAC
<https://www.dlr.de/pa/en/desktopdefault.aspx/tabid-8859/15306_read-37415/>`_
model output.

@@ -274,7 +274,7 @@ Thus, example dataset entries could look like this:
- {project: EMAC, dataset: EMAC, exp: historical, mip: Amon, short_name: ta, raw_name: tm1_p39_cav, start_year: 2000, end_year: 2014}
Please note the duplication of the name ``EMAC`` in ``project`` and
``dataset``, which is necessary to comply with ESMValTool's data finding and
``dataset``, which is necessary to comply with ESMValCore's data finding and
CMORizing functionalities.
A variable-specific default for the facet ``channel`` is given in the extra
facets (see next paragraph) for many variables, but this can be overwritten in
@@ -285,7 +285,7 @@ facets<extra_facets>`.
By default, the file :download:`emac-mappings.yml
</../esmvalcore/config/extra_facets/emac-mappings.yml>` is used for that
purpose.
For some variables, extra facets are necessary; otherwise ESMValTool cannot
For some variables, extra facets are necessary; otherwise ESMValCore cannot
read them properly.
Supported keys for extra facets are:

@@ -326,7 +326,7 @@ Key Description Default value if not
ICON
^^^^

ESMValTool is able to read native `ICON
ESMValCore is able to read native `ICON
<https://code.mpimet.mpg.de/projects/iconpublic>`_ model output.

The default naming conventions for input directories and files for ICON are
@@ -349,7 +349,7 @@ Thus, example dataset entries could look like this:
end_year: 2014}
Please note the duplication of the name ``ICON`` in ``project`` and
``dataset``, which is necessary to comply with ESMValTool's data finding and
``dataset``, which is necessary to comply with ESMValCore's data finding and
CMORizing functionalities.
A variable-specific default for the facet ``var_type`` is given in the extra
facets (see below) for many variables, but this can be overwritten in the
@@ -460,7 +460,7 @@ facets<extra_facets>`.
By default, the file :download:`icon-mappings.yml
</../esmvalcore/config/extra_facets/icon-mappings.yml>` is used for that
purpose.
For some variables, extra facets are necessary; otherwise ESMValTool cannot
For some variables, extra facets are necessary; otherwise ESMValCore cannot
read them properly.
Supported keys for extra facets are:

@@ -569,15 +569,15 @@ files must also undergo some data selection.

Data retrieval
==============
Data retrieval in ESMValTool has two main aspects from the user's point of
Data retrieval in ESMValCore has two main aspects from the user's point of
view:

* data can be found by the tool, subject to availability on disk or `ESGF <https://esgf.llnl.gov/>`_;
* it is the user's responsibility to set the correct data retrieval parameters;

The first point is self-explanatory: if the user runs the tool on a machine
that has access to a data repository or multiple data repositories, then
ESMValTool will look for and find the available data requested by the user.
ESMValCore will look for and find the available data requested by the user.
If the files are not found locally, the tool can search the ESGF_ and download
the missing files, provided that they are available.

@@ -598,7 +598,7 @@ the :ref:`user configuration file`.

Setting the correct root paths
------------------------------
The first step towards providing ESMValTool the correct set of parameters for
The first step towards providing ESMValCore the correct set of parameters for
data retrieval is setting the root paths to the data. This is done in the user
configuration file ``config-user.yml``. The two sections where the user will
set the paths are ``rootpath`` and ``drs``. ``rootpath`` contains pointers to
@@ -608,24 +608,11 @@ first discuss the ``drs`` parameter: as we've seen in the previous section, the
DRS as a standard is used for both file naming conventions and for directory
structures.

Synda
-----

If the `synda install <https://prodiguer.github.io/synda/sdt/user_guide.html#synda-install>`_ command is used to download data,
it maintains the directory structure as on ESGF. To find data downloaded by
synda, use the ``SYNDA`` ``drs`` parameter.

.. code-block:: yaml
drs:
CMIP6: SYNDA
CMIP5: SYNDA
.. _config-user-drs:

Explaining ``config-user/drs: CMIP5:`` or ``config-user/drs: CMIP6:``
---------------------------------------------------------------------
Whereas ESMValTool will **always** use the CMOR standard for file naming (please
Whereas ESMValCore will by default use the CMOR standard for file naming (please
refer above), by setting the ``drs`` parameter the user tells the tool what
type of root paths they need the data from, e.g.:

@@ -655,10 +642,17 @@ is another way to retrieve data from a ``ROOT`` directory that has no DRS-like
structure; ``default`` indicates that the data lies in a directory that
contains all the files without any structure.

The names of the directory trees that can be used under `drs` are defined in
:ref:`config-developer`.

.. note::
When using ``CMIP6: default`` or ``CMIP5: default`` it is important to
remember that all the needed files must be in the same top-level directory
set by ``default`` (see below how to set ``default``).
When using ``CMIP6: default`` or ``CMIP5: default``, all the needed files
must be in the same top-level directory specified under ``rootpath``.
However, it is not recommended to use this, as it makes it impossible for
the tool to read the facets from the directory tree.
Moreover, this way of organizing data makes it impossible to store multiple
versions of the same file because the files typically have the same name
for different versions.

.. _config-user-rootpath:

@@ -668,27 +662,37 @@ Explaining ``config-user/rootpath:``
``rootpath`` identifies the root directory for different data types (``ROOT`` as we used it above):

* ``CMIP`` e.g. ``CMIP5`` or ``CMIP6``: this is the `root` path(s) to where the
CMIP files are stored; it can be a single path or a list of paths; it can
CMIP files are stored; it can be a single path, a list of paths, or a mapping
with paths as keys and `drs` names as values; it can
point to an ESGF node or it can point to a user private repository. Example
for a CMIP5 root path pointing to the ESGF node on CEDA-Jasmin (formerly
for a CMIP5 root path pointing to the ESGF node mounted on CEDA-Jasmin (formerly
known as BADC):

.. code-block:: yaml
CMIP5: /badc/cmip5/data/cmip5/output1
rootpath:
CMIP5: /badc/cmip5/data/cmip5/output1
Example for a CMIP6 root path pointing to the ESGF node on CEDA-Jasmin:

.. code-block:: yaml
CMIP6: /badc/cmip6/data/CMIP6/CMIP
rootpath:
CMIP6: /badc/cmip6/data/CMIP6
Example for a mix of CMIP6 root path pointing to the ESGF node on CEDA-Jasmin
and a user-specific data repository for extra data:

.. code-block:: yaml
CMIP6: [/badc/cmip6/data/CMIP6/CMIP, /home/users/johndoe/cmip_data]
rootpath:
CMIP6:
/badc/cmip6/data/CMIP6: BADC
~/climate_data: ESGF
Note that this notation combines the ``rootpath`` and ``drs`` settings, so it
is not necessary to specify the directory structure under ``drs`` in this
case.

* ``OBS``: this is the `root` path(s) to where the observational datasets are
stored; again, this could be a single path or a list of paths, just like for
@@ -697,17 +701,31 @@ Explaining ``config-user/rootpath:``

.. code-block:: yaml
OBS: /gws/nopw/j04/esmeval/obsdata-v2
rootpath:
OBS: /gws/nopw/j04/esmeval/obsdata-v2
* ``default``: this is the `root` path(s) where the tool will look for data
from projects that do not have their own rootpath set.

* ``RAWOBS``: this is the `root` path(s) to where the raw observational data
files are stored; this is used by ``esmvaltool data format``.
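
The three accepted forms of a ``rootpath`` entry described above (a single
path, a list of paths, or a mapping with paths as keys and `drs` names as
values) can be reduced to one common shape. The following is a minimal
illustrative sketch, not ESMValCore's actual implementation: the helper name
``normalize_rootpath`` and its ``drs_name`` parameter are hypothetical, and the
sketch assumes that a missing directory structure name falls back to
``default``.

```python
from pathlib import Path


def normalize_rootpath(entry, drs_name="default"):
    """Hypothetical helper: return (path, drs name) pairs for one project.

    ``entry`` is the value found under ``rootpath: <project>``;
    ``drs_name`` stands in for the value found under ``drs: <project>``.
    """
    if isinstance(entry, dict):
        # Mapping form: each path carries its own directory structure name.
        return [(Path(p).expanduser(), name) for p, name in entry.items()]
    if isinstance(entry, (str, Path)):
        # Single-path form: treat it as a one-element list.
        entry = [entry]
    # List form: every path shares the project-wide drs name.
    return [(Path(p).expanduser(), drs_name) for p in entry]


# Mapping form, mirroring the mixed CMIP6 example above.
pairs = normalize_rootpath(
    {"/badc/cmip6/data/CMIP6": "BADC", "~/climate_data": "ESGF"}
)
```

With the mapping form, the ``drs`` section is not consulted for that project,
which is why the sketch ignores ``drs_name`` in that branch.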

Synda
-----

If the `synda install <https://prodiguer.github.io/synda/sdt/user_guide.html#synda-install>`_ command is used to download data,
it maintains the directory structure as on ESGF. To find data downloaded by
synda, use the ``SYNDA`` ``drs`` parameter.

.. code-block:: yaml
drs:
CMIP6: SYNDA
CMIP5: SYNDA
Dataset definitions in ``recipe``
---------------------------------
Once the correct paths have been established, ESMValTool collects the
Once the correct paths have been established, ESMValCore collects the
information on the specific datasets that are needed for the analysis. This
information, together with the CMOR convention for naming files (see CMOR-DRS_)
will allow the tool to search and find the right files. The specific
14 changes: 7 additions & 7 deletions esmvalcore/cmor/_fixes/icon/_base_fixes.py
@@ -19,7 +19,7 @@
from iris.experimental.ugrid import Connectivity, Mesh

from esmvalcore.cmor._fixes.native_datasets import NativeDatasetFix
from esmvalcore.local import _get_rootpath, _replace_tags, _select_drs
from esmvalcore.local import _get_data_sources

logger = logging.getLogger(__name__)

@@ -297,12 +297,12 @@ def _get_grid_from_cube_attr(self, cube: Cube) -> Cube:

def _get_grid_from_rootpath(self, grid_name: str) -> CubeList | None:
"""Try to get grid from the ICON rootpath."""
rootpaths = _get_rootpath('ICON')
dirname_template = _select_drs('input_dir', 'ICON')
dirname_globs = _replace_tags(dirname_template, self.extra_facets)
possible_grid_paths = [
r / d / grid_name for r in rootpaths for d in dirname_globs
]
glob_patterns: list[Path] = []
for data_source in _get_data_sources('ICON'):
glob_patterns.extend(
data_source.get_glob_patterns(**self.extra_facets)
)
possible_grid_paths = [d.parent / grid_name for d in glob_patterns]
for grid_path in possible_grid_paths:
if grid_path.is_file():
logger.debug("Using ICON grid file '%s'", grid_path)
56 changes: 36 additions & 20 deletions esmvalcore/config-user.yml
@@ -109,13 +109,25 @@ drs:
CORDEX: ESGF
obs4MIPs: ESGF

# Example rootpaths and directory structure that showcases the different
# projects and also the use of lists
# Example rootpaths and directory structure names for different projects.
# For each project, the entry can be a single path, a list of paths, or a
# mapping from paths to directory structure names.
# For single paths and list of paths, the directory structure names can be
# defined under 'drs'.
# If no path is defined for a project, the tool will look in the 'default'
# path.
# If no directory structure name is given, the name 'default' will be used.
# Directory structures corresponding to the names are defined in the file
# config-developer.yml.
# For site-specific entries, see below.
#rootpath:
# CMIP3: [~/cmip3_inputpath1, ~/cmip3_inputpath2]
# CMIP5: [~/cmip5_inputpath1, ~/cmip5_inputpath2]
# CMIP6: [~/cmip6_inputpath1, ~/cmip6_inputpath2]
# CMIP6:
# /path/to/data: DKRZ
# ~/path/to/more/data: ESGF
# CMIP5:
# - ~/cmip5_inputpath1
# - ~/cmip5_inputpath2
# CMIP3: ~/cmip6_inputpath
# OBS: ~/obs_inputpath
# OBS6: ~/obs6_inputpath
# obs4MIPs: ~/obs4mips_inputpath
@@ -124,11 +136,10 @@ drs:
# RAWOBS: ~/rawobs_inputpath
# default: ~/default_inputpath
#drs:
# CMIP3: default
# CMIP5: default
# CMIP6: default
# CORDEX: default
# obs4MIPs: default
# CMIP3: ESGF
# CMIP5: ESGF
# CORDEX: ESGF
# obs4MIPs: ESGF

# Directory tree created by automatically downloading from ESGF
# Uncomment the lines below to locate data that has been automatically
@@ -175,22 +186,27 @@ drs:
# Uncomment the lines below to locate data on Levante at DKRZ.
#auxiliary_data_dir: /work/bd0854/DATA/ESMValTool2/AUX
#rootpath:
# CMIP6: /work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ
# CMIP5: /work/bd0854/DATA/ESMValTool2/CMIP5_DKRZ
# CMIP3: /work/bd0854/DATA/ESMValTool2/CMIP3
# CORDEX: /work/ik1017/C3SCORDEX/data/c3s-cordex/output
# CMIP6:
# /work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ: DKRZ
# /work/bd0854/DATA/ESMValTool2/download: ESGF
# CMIP5:
# /work/bd0854/DATA/ESMValTool2/CMIP5_DKRZ: DKRZ
# /work/bd0854/DATA/ESMValTool2/download: ESGF
# CMIP3:
# /work/bd0854/DATA/ESMValTool2/CMIP3: DKRZ
# /work/bd0854/DATA/ESMValTool2/download: ESGF
# CORDEX:
# /work/ik1017/C3SCORDEX/data/c3s-cordex/output: BADC
# /work/bd0854/DATA/ESMValTool2/download: ESGF
# OBS: /work/bd0854/DATA/ESMValTool2/OBS
# OBS6: /work/bd0854/DATA/ESMValTool2/OBS
# obs4MIPs: /work/bd0854/DATA/ESMValTool2/OBS
# obs4MIPs:
# /work/bd0854/DATA/ESMValTool2/OBS: default
# /work/bd0854/DATA/ESMValTool2/download: ESGF
# ana4mips: /work/bd0854/DATA/ESMValTool2/OBS
# native6: /work/bd0854/DATA/ESMValTool2/RAWOBS
# RAWOBS: /work/bd0854/DATA/ESMValTool2/RAWOBS
#drs:
# CMIP6: DKRZ
# CMIP5: DKRZ
# CMIP3: DKRZ
# CORDEX: BADC
# obs4MIPs: default
# ana4mips: default
# OBS: default
# OBS6: default