Merge remote-tracking branch 'upstream/main' into numpydocs1
* upstream/main:
  Mergeback of `FEATURE_chunk_control` branch (SciTools#5588)
  [CI Bot] environment lockfiles auto-update (SciTools#5547)
  Mergeback of "Feature _split_attrs" branch (SciTools#5152)
  add whatsnew (SciTools#5596)
  Refactor area weighted regridding, improve performance (SciTools#5543)
  Allowing exemption to axis guessing on coords (SciTools#5551)
tkknight committed Nov 27, 2023
2 parents 7d710b5 + 507c34c commit 39f78fa
Showing 41 changed files with 8,440 additions and 1,539 deletions.
41 changes: 27 additions & 14 deletions docs/src/further_topics/metadata.rst
@@ -91,6 +91,16 @@ actual `data attribute`_ names of the metadata members on the Iris class.
metadata members are Iris specific terms, rather than recognised `CF Conventions`_
terms.

.. note::

:class:`~iris.cube.Cube` :attr:`~iris.cube.Cube.attributes` implement the
concept of dataset-level and variable-level attributes, to enable correct
NetCDF loading and saving (see :class:`~iris.cube.CubeAttrsDict` and NetCDF
:func:`~iris.fileformats.netcdf.saver.save` for more). ``attributes`` on
the other classes do not have this distinction, but the ``attributes``
members of ALL the classes still have the same interface, and can be
compared.
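
To make the dataset-level versus variable-level distinction concrete, the
following is a minimal sketch in plain Python. It does not use Iris:
``global_attrs``, ``local_attrs`` and the layered lookup through
``collections.ChainMap`` are illustrative assumptions, not how
:class:`~iris.cube.CubeAttrsDict` is implemented. In particular, letting a
variable-level value take priority over a dataset-level one is an assumption
of this sketch.

```python
from collections import ChainMap

# Hypothetical stand-ins for dataset-level ("global") and
# variable-level ("local") attributes; not the Iris implementation.
global_attrs = {"Conventions": "CF-1.5"}
local_attrs = {
    "Model scenario": "A1B",
    "source": "Data from Met Office Unified Model 6.05",
}

# A single dictionary-like view over both sets of attributes.
# ChainMap searches its mappings in order, so a key present in both
# would resolve to the local value (an assumption of this sketch).
combined = ChainMap(local_attrs, global_attrs)

print(combined["Conventions"])  # found in the global set
print(sorted(combined))         # keys from both sets, seen as one mapping
```

The two underlying dictionaries stay separate, so the information about which
level each attribute belongs to is kept while still offering one
dictionary-like interface.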


Common Metadata API
===================
@@ -128,10 +138,12 @@ For example, given the following :class:`~iris.cube.Cube`,
source 'Data from Met Office Unified Model 6.05'

We can easily get all of the associated metadata of the :class:`~iris.cube.Cube`
using the ``metadata`` property:
using the ``metadata`` property (note the specialised
:class:`~iris.cube.CubeAttrsDict` for the :attr:`~iris.cube.Cube.attributes`,
as mentioned earlier):

>>> cube.metadata
CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes=CubeAttrsDict(globals={'Conventions': 'CF-1.5'}, locals={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}), cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))

We can also inspect the ``metadata`` of the ``longitude``
:class:`~iris.coords.DimCoord` attached to the :class:`~iris.cube.Cube` in the same way:
@@ -675,8 +687,8 @@ For example, consider the following :class:`~iris.common.metadata.CubeMetadata`,

.. doctest:: metadata-combine

>>> cube.metadata # doctest: +SKIP
CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
>>> cube.metadata
CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes=CubeAttrsDict(globals={'Conventions': 'CF-1.5'}, locals={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}), cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))

We can perform the **identity function** by comparing the metadata with itself,

@@ -701,7 +713,7 @@ which is replaced with a **different value**,
>>> metadata != cube.metadata
True
>>> metadata.combine(cube.metadata) # doctest: +SKIP
CubeMetadata(standard_name=None, long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'STASH': STASH(model=1, section=3, item=236), 'source': 'Data from Met Office Unified Model 6.05', 'Model scenario': 'A1B', 'Conventions': 'CF-1.5'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
CubeMetadata(standard_name=None, long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05', 'Conventions': 'CF-1.5'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))

The ``combine`` method combines metadata by performing a **strict** comparison
between each of the associated metadata member values,
@@ -724,7 +736,7 @@ Let's reinforce this behaviour, but this time by combining metadata where the
>>> metadata != cube.metadata
True
>>> metadata.combine(cube.metadata).attributes
{'Model scenario': 'A1B'}
CubeAttrsDict(globals={}, locals={'Model scenario': 'A1B'})

The combined result for the ``attributes`` member only contains those
**common keys** with **common values**.
@@ -810,16 +822,17 @@ the ``from_metadata`` class method. For example, given the following

.. doctest:: metadata-convert

>>> cube.metadata # doctest: +SKIP
CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
>>> cube.metadata
CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes=CubeAttrsDict(globals={'Conventions': 'CF-1.5'}, locals={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}), cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))

We can easily convert it to a :class:`~iris.common.metadata.DimCoordMetadata` instance
using ``from_metadata``,

.. doctest:: metadata-convert

>>> DimCoordMetadata.from_metadata(cube.metadata) # doctest: +SKIP
DimCoordMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, coord_system=None, climatological=None, circular=None)
>>> newmeta = DimCoordMetadata.from_metadata(cube.metadata)
>>> print(newmeta)
DimCoordMetadata(standard_name=air_temperature, var_name=air_temperature, units=K, attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'})

By examining :numref:`metadata members table`, we can see that the
:class:`~iris.cube.Cube` and :class:`~iris.coords.DimCoord` container
@@ -849,9 +862,9 @@ class instance,

.. doctest:: metadata-convert

>>> longitude.metadata.from_metadata(cube.metadata)
DimCoordMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, coord_system=None, climatological=None, circular=None)

>>> newmeta = longitude.metadata.from_metadata(cube.metadata)
>>> print(newmeta)
DimCoordMetadata(standard_name=air_temperature, var_name=air_temperature, units=K, attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'})

.. _metadata assignment:

@@ -978,7 +991,7 @@ Indeed, it's also possible to assign to the ``metadata`` property with a
>>> longitude.metadata
DimCoordMetadata(standard_name='longitude', long_name=None, var_name='longitude', units=Unit('degrees'), attributes={}, coord_system=GeogCS(6371229.0), climatological=False, circular=False)
>>> longitude.metadata = cube.metadata
>>> longitude.metadata # doctest: +SKIP
>>> longitude.metadata
DimCoordMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, coord_system=GeogCS(6371229.0), climatological=False, circular=False)

Note that, only **common** metadata members will be assigned new associated
1 change: 1 addition & 0 deletions docs/src/techpapers/index.rst
@@ -11,3 +11,4 @@ Extra information on specific technical issues.

um_files_loading.rst
missing_data_handling.rst
netcdf_io.rst
140 changes: 140 additions & 0 deletions docs/src/techpapers/netcdf_io.rst
@@ -0,0 +1,140 @@
.. testsetup:: chunk_control

import iris
from iris.fileformats.netcdf.loader import CHUNK_CONTROL

from pathlib import Path
import dask
import shutil
import tempfile

tmp_dir = Path(tempfile.mkdtemp())
tmp_filepath = tmp_dir / "tmp.nc"

cube = iris.load(iris.sample_data_path("E1_north_america.nc"))[0]
iris.save(cube, tmp_filepath, chunksizes=(120, 37, 49))
old_dask = dask.config.get("array.chunk-size")
dask.config.set({'array.chunk-size': '500KiB'})


.. testcleanup:: chunk_control

dask.config.set({'array.chunk-size': old_dask})
shutil.rmtree(tmp_dir)

.. _netcdf_io:

=============================
NetCDF I/O Handling in Iris
=============================

This document provides a basic account of how Iris loads and saves NetCDF files.

.. admonition:: Under Construction

This document is still a work in progress, so it might include blank or unfinished
sections. Watch this space!


Chunk Control
--------------

Default Chunking
^^^^^^^^^^^^^^^^

By default, Iris optimises chunks on load, choosing a chunksize for your data
without any user input. The chunksize is calculated from a number of factors,
including:

- File Variable Chunking
- Full Variable Shape
- Dask Default Chunksize
- Dimension Order: Earlier (outer) dimensions are split in preference to later (inner) dimensions.

.. doctest:: chunk_control

>>> cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.shape)
(240, 37, 49)
>>> print(cube.core_data().chunksize)
(60, 37, 49)
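
The chunk shape reported above can be checked against the Dask chunk-size
limit with some simple arithmetic. The sketch below is plain Python and does
not call Iris or Dask. ``chunk_nbytes`` is a hypothetical helper, and the
4-byte element size is an assumption, since the data type of the example file
is not stated here. The limit is the ``500KiB`` value configured in the test
setup for this page.

```python
from math import prod


def chunk_nbytes(chunk_shape, itemsize):
    """Return the size in bytes of one chunk with the given shape."""
    return prod(chunk_shape) * itemsize


ITEMSIZE = 4        # assumed: 4-byte elements (e.g. float32)
LIMIT = 500 * 1024  # "500KiB" expressed in bytes

file_chunk = (120, 37, 49)    # chunking written to the file in the test setup
default_chunk = (60, 37, 49)  # chunking reported by the default load

for chunk in (file_chunk, default_chunk):
    nbytes = chunk_nbytes(chunk, ITEMSIZE)
    print(chunk, nbytes, "fits" if nbytes <= LIMIT else "exceeds limit")
```

Under that assumption, the file chunking of ``(120, 37, 49)`` exceeds the
limit while ``(60, 37, 49)`` fits within it, which is consistent with the
outermost dimension being the one that is split.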

For more user control, :pull:`5588` added the
:data:`iris.fileformats.netcdf.loader.CHUNK_CONTROL` object.

Custom Chunking: Set
^^^^^^^^^^^^^^^^^^^^

There are three context managers within :data:`~iris.fileformats.netcdf.loader.CHUNK_CONTROL`. The most basic is
:meth:`~iris.fileformats.netcdf.loader.ChunkControl.set`. This allows you to specify the chunksize for each dimension,
and, optionally, the ``var_name`` of the variable to which they apply.

Using ``-1`` in place of a chunksize will ensure the chunksize stays the same
as the shape, i.e. no optimisation occurs on that dimension.

.. doctest:: chunk_control

>>> with CHUNK_CONTROL.set("air_temperature", time=180, latitude=-1, longitude=25):
... cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(180, 37, 25)

Note that ``var_name`` is optional, and that you don't need to specify every dimension. If you
specify only one dimension, the rest will be optimised using Iris' default behaviour.

.. doctest:: chunk_control

>>> with CHUNK_CONTROL.set(longitude=25):
... cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(120, 37, 25)
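
The meaning of the keyword arguments can be sketched without Iris. In the
plain-Python example below, ``resolve_requested_chunks`` is a hypothetical
helper and not part of the Iris API: it maps ``-1`` to the full dimension
length, passes explicit sizes through unchanged, and returns ``None`` for any
dimension that is left to Iris' default optimisation.

```python
def resolve_requested_chunks(shape, dim_names, **requested):
    """Resolve per-dimension chunk requests against a variable's shape.

    Hypothetical illustration only, not the Iris implementation.
    """
    resolved = []
    for length, name in zip(shape, dim_names):
        size = requested.get(name)
        if size is None:
            resolved.append(None)    # unspecified: left to default optimisation
        elif size == -1:
            resolved.append(length)  # -1: keep the whole dimension in one chunk
        else:
            resolved.append(size)    # explicit chunksize
    return tuple(resolved)


shape = (240, 37, 49)
dims = ("time", "latitude", "longitude")

print(resolve_requested_chunks(shape, dims, time=180, latitude=-1, longitude=25))
# (180, 37, 25)
print(resolve_requested_chunks(shape, dims, longitude=25))
# (None, None, 25)
```

The ``None`` entries in the second result mark the dimensions whose final
chunksize is decided by the default optimisation rather than by the request.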

Custom Chunking: From File
^^^^^^^^^^^^^^^^^^^^^^^^^^

The second context manager is :meth:`~iris.fileformats.netcdf.loader.ChunkControl.from_file`.
This takes chunksizes as defined in the NetCDF file. Any dimensions without specified chunks
will default to Iris optimisation.

.. doctest:: chunk_control

>>> with CHUNK_CONTROL.from_file():
... cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(120, 37, 49)

Custom Chunking: As Dask
^^^^^^^^^^^^^^^^^^^^^^^^

The final context manager, :meth:`~iris.fileformats.netcdf.loader.ChunkControl.as_dask`, bypasses
Iris' optimisation altogether, and takes its chunksizes from Dask's behaviour.

.. doctest:: chunk_control

>>> with CHUNK_CONTROL.as_dask():
... cube = iris.load_cube(tmp_filepath)
>>>
>>> print(cube.core_data().chunksize)
(70, 37, 49)


Split Attributes
-----------------

TBC


Deferred Saving
----------------

TBC


Guess Axis
-----------

TBC
5 changes: 4 additions & 1 deletion docs/src/userguide/iris_cubes.rst
@@ -85,7 +85,10 @@ A cube consists of:
data dimensions as the coordinate has dimensions.

* an attributes dictionary which, other than some protected CF names, can
hold arbitrary extra metadata.
hold arbitrary extra metadata. This implements the concept of dataset-level
and variable-level attributes when loading and saving NetCDF files (see
:class:`~iris.cube.CubeAttrsDict` and NetCDF
:func:`~iris.fileformats.netcdf.saver.save` for more).
* a list of cell methods to represent operations which have already been
applied to the data (e.g. "mean over time")
* a list of coordinate "factories" used for deriving coordinates from the
29 changes: 26 additions & 3 deletions docs/src/whatsnew/latest.rst
@@ -29,7 +29,17 @@ This document explains the changes made to Iris for this release

✨ Features
===========

#. `@pp-mo`_, `@lbdreyer`_ and `@trexfeathers`_ improved
:class:`~iris.cube.Cube` :attr:`~iris.cube.Cube.attributes` handling to
better preserve the distinction between dataset-level and variable-level
attributes, allowing file-Cube-file round-tripping of NetCDF attributes. See
:class:`~iris.cube.CubeAttrsDict`, NetCDF
:func:`~iris.fileformats.netcdf.saver.save` and :data:`~iris.Future` for more.
(:pull:`5152`, `split attributes project`_)

#. `@rcomer`_ rewrote :func:`~iris.util.broadcast_to_shape` so it now handles
lazy data. (:pull:`5307`)

#. `@trexfeathers`_ and `@HGWright`_ (reviewer) sub-categorised all Iris'
:class:`UserWarning`\s for richer filtering. The full index of
sub-categories can be seen here: :mod:`iris.exceptions` . (:pull:`5498`)
@@ -44,6 +54,14 @@ This document explains the changes made to Iris for this release
Winter - December to February) will be assigned to the preceding year (e.g.
the year of December) instead of the following year (the default behaviour).
(:pull:`5573`)

#. `@HGWright`_ added :attr:`~iris.coords.Coord.ignore_axis` to allow manual
intervention preventing :func:`~iris.util.guess_coord_axis` from acting on a
coordinate. (:pull:`5551`)

#. `@pp-mo`_, `@trexfeathers`_ and `@ESadek-MO`_ added more control over
NetCDF chunking with the use of the :data:`iris.fileformats.netcdf.loader.CHUNK_CONTROL`
context manager. (:pull:`5588`)


🐛 Bugs Fixed
@@ -68,7 +86,8 @@ This document explains the changes made to Iris for this release
🚀 Performance Enhancements
===========================

#. N/A
#. `@stephenworsley`_ improved the speed of :class:`~iris.analysis.AreaWeighted`
regridding. (:pull:`5543`)


🔥 Deprecations
@@ -103,6 +122,10 @@ This document explains the changes made to Iris for this release
#. `@ESadek-MO`_ added a phrasebook for synonymous terms used in similar
packages. (:pull:`5564`)

#. `@ESadek-MO`_ and `@trexfeathers`_ created a technical paper for NetCDF
saving and loading, :ref:`netcdf_io`, with a section on chunking and placeholders
for further topics. (:pull:`5588`)


💼 Internal
===========
@@ -147,4 +170,4 @@ This document explains the changes made to Iris for this release
.. _NEP29 Drop Schedule: https://numpy.org/neps/nep-0029-deprecation_policy.html#drop-schedule
.. _codespell: https://github.com/codespell-project/codespell

.. _split attributes project: https://github.com/orgs/SciTools/projects/5?pane=info
17 changes: 14 additions & 3 deletions lib/iris/__init__.py
@@ -141,7 +141,9 @@ def callback(cube, field, filename):
class Future(threading.local):
"""Run-time configuration controller."""

def __init__(self, datum_support=False, pandas_ndim=False):
def __init__(
self, datum_support=False, pandas_ndim=False, save_split_attrs=False
):
"""
A container for run-time options controls.
@@ -163,6 +165,11 @@ def __init__(self, datum_support=False, pandas_ndim=False):
pandas_ndim : bool, default=False
See :func:`iris.pandas.as_data_frame` for details - opts in to the
newer n-dimensional behaviour.
save_split_attrs : bool, default=False
Save "global" and "local" cube attributes to netcdf in appropriately
different ways: "global" ones are saved as dataset attributes, where
possible, while "local" ones are saved as data-variable attributes.
See :func:`iris.fileformats.netcdf.saver.save`.
"""
# The flag 'example_future_flag' is provided as a reference for the
@@ -174,14 +181,18 @@ def __init__(self, datum_support=False, pandas_ndim=False):
# self.__dict__['example_future_flag'] = example_future_flag
self.__dict__["datum_support"] = datum_support
self.__dict__["pandas_ndim"] = pandas_ndim
self.__dict__["save_split_attrs"] = save_split_attrs

# TODO: next major release: set IrisDeprecation to subclass
# DeprecationWarning instead of UserWarning.

def __repr__(self):
# msg = ('Future(example_future_flag={})')
# return msg.format(self.example_future_flag)
msg = "Future(datum_support={}, pandas_ndim={})"
return msg.format(self.datum_support, self.pandas_ndim)
msg = "Future(datum_support={}, pandas_ndim={}, save_split_attrs={})"
return msg.format(
self.datum_support, self.pandas_ndim, self.save_split_attrs
)

# deprecated_options = {'example_future_flag': 'warning',}
deprecated_options = {}