Merge remote-tracking branch 'upstream/main' into numpydocs1

* upstream/main:
  Mergeback of `FEATURE_chunk_control` branch (SciTools#5588)
  [CI Bot] environment lockfiles auto-update (SciTools#5547)
  Mergeback of "Feature _split_attrs" branch (SciTools#5152)
  add whatsnew (SciTools#5596)
  Refactor area weighted regridding, improve performance (SciTools#5543)
  Allowing exemption to axis guessing on coords (SciTools#5551)
tkknight · Nov 27, 2023 · 39f78fa · 39f78fa
2 parents 7d710b5 + 507c34c
commit 39f78fa
Showing 41 changed files with 8,440 additions and 1,539 deletions.
diff --git a/docs/src/further_topics/metadata.rst b/docs/src/further_topics/metadata.rst
@@ -91,6 +91,16 @@ actual `data attribute`_ names of the metadata members on the Iris class.
    metadata members are Iris specific terms, rather than recognised `CF Conventions`_
    terms.
 
+.. note::
+
+    :class:`~iris.cube.Cube` :attr:`~iris.cube.Cube.attributes` implement the
+    concept of dataset-level and variable-level attributes, to enable correct
+    NetCDF loading and saving (see :class:`~iris.cube.CubeAttrsDict` and NetCDF
+    :func:`~iris.fileformats.netcdf.saver.save` for more). ``attributes`` on
+    the other classes do not have this distinction, but the ``attributes``
+    members of ALL the classes still have the same interface, and can be
+    compared.
+
 
 Common Metadata API
 ===================
@@ -128,10 +138,12 @@ For example, given the following :class:`~iris.cube.Cube`,
             source                      'Data from Met Office Unified Model 6.05'
 
 We can easily get all of the associated metadata of the :class:`~iris.cube.Cube`
-using the ``metadata`` property:
+using the ``metadata`` property (note the specialised
+:class:`~iris.cube.CubeAttrsDict` for the :attr:`~iris.cube.Cube.attributes`,
+as mentioned earlier):
 
     >>> cube.metadata
-    CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
+    CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes=CubeAttrsDict(globals={'Conventions': 'CF-1.5'}, locals={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}), cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
 
 We can also inspect the ``metadata`` of the ``longitude``
 :class:`~iris.coords.DimCoord` attached to the :class:`~iris.cube.Cube` in the same way:
@@ -675,8 +687,8 @@ For example, consider the following :class:`~iris.common.metadata.CubeMetadata`,
 
 .. doctest:: metadata-combine
 
-    >>> cube.metadata  # doctest: +SKIP
-    CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
+    >>> cube.metadata
+    CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes=CubeAttrsDict(globals={'Conventions': 'CF-1.5'}, locals={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}), cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
 
 We can perform the **identity function** by comparing the metadata with itself,
 
@@ -701,7 +713,7 @@ which is replaced with a **different value**,
     >>> metadata != cube.metadata
     True
     >>> metadata.combine(cube.metadata)  # doctest: +SKIP
-    CubeMetadata(standard_name=None, long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'STASH': STASH(model=1, section=3, item=236), 'source': 'Data from Met Office Unified Model 6.05', 'Model scenario': 'A1B', 'Conventions': 'CF-1.5'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
+    CubeMetadata(standard_name=None, long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05', 'Conventions': 'CF-1.5'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
 
 The ``combine`` method combines metadata by performing a **strict** comparison
 between each of the associated metadata member values,
@@ -724,7 +736,7 @@ Let's reinforce this behaviour, but this time by combining metadata where the
     >>> metadata != cube.metadata
     True
     >>> metadata.combine(cube.metadata).attributes
-    {'Model scenario': 'A1B'}
+    CubeAttrsDict(globals={}, locals={'Model scenario': 'A1B'})
 
 The combined result for the ``attributes`` member only contains those
 **common keys** with **common values**.
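
The "common keys with common values" rule described for the ``attributes`` member can be pictured with plain dictionaries. This is only a minimal sketch: ``combine_common`` is a hypothetical helper, not part of Iris, and it ignores special cases (such as array-valued attributes) that the real ``combine`` method may handle differently.

```python
def combine_common(left, right):
    """Return only the items whose key and value appear in both mappings."""
    return {
        key: value
        for key, value in left.items()
        if key in right and right[key] == value
    }


# Illustrative attribute dictionaries (made-up values).
ours = {"Model scenario": "A1B", "source": "model run 1"}
theirs = {"Model scenario": "A1B", "source": "model run 2"}

# "source" differs between the two, so only "Model scenario" survives.
print(combine_common(ours, theirs))
```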
@@ -810,16 +822,17 @@ the ``from_metadata`` class method. For example, given the following
 
 .. doctest:: metadata-convert
 
-    >>> cube.metadata  # doctest: +SKIP
-    CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
+    >>> cube.metadata
+    CubeMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes=CubeAttrsDict(globals={'Conventions': 'CF-1.5'}, locals={'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}), cell_methods=(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),))
 
 We can easily convert it to a :class:`~iris.common.metadata.DimCoordMetadata` instance
 using ``from_metadata``,
 
 .. doctest:: metadata-convert
 
-    >>> DimCoordMetadata.from_metadata(cube.metadata)  # doctest: +SKIP
-    DimCoordMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, coord_system=None, climatological=None, circular=None)
+    >>> newmeta = DimCoordMetadata.from_metadata(cube.metadata)
+    >>> print(newmeta)
+    DimCoordMetadata(standard_name=air_temperature, var_name=air_temperature, units=K, attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'})
 
 By examining :numref:`metadata members table`, we can see that the
 :class:`~iris.cube.Cube` and :class:`~iris.coords.DimCoord` container
@@ -849,9 +862,9 @@ class instance,
 
 .. doctest:: metadata-convert
 
-    >>> longitude.metadata.from_metadata(cube.metadata)
-    DimCoordMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, coord_system=None, climatological=None, circular=None)
-
+   >>> newmeta = longitude.metadata.from_metadata(cube.metadata)
+   >>> print(newmeta)
+   DimCoordMetadata(standard_name=air_temperature, var_name=air_temperature, units=K, attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'})
 
 .. _metadata assignment:
 
@@ -978,7 +991,7 @@ Indeed, it's also possible to assign to the ``metadata`` property with a
     >>> longitude.metadata
     DimCoordMetadata(standard_name='longitude', long_name=None, var_name='longitude', units=Unit('degrees'), attributes={}, coord_system=GeogCS(6371229.0), climatological=False, circular=False)
     >>> longitude.metadata = cube.metadata
-    >>> longitude.metadata  # doctest: +SKIP
+    >>> longitude.metadata
     DimCoordMetadata(standard_name='air_temperature', long_name=None, var_name='air_temperature', units=Unit('K'), attributes={'Conventions': 'CF-1.5', 'STASH': STASH(model=1, section=3, item=236), 'Model scenario': 'A1B', 'source': 'Data from Met Office Unified Model 6.05'}, coord_system=GeogCS(6371229.0), climatological=False, circular=False)
 
 Note that, only **common** metadata members will be assigned new associated
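
The ``CubeAttrsDict(globals=..., locals=...)`` outputs in the hunks above show cube attributes split into dataset-level ("global") and variable-level ("local") groups. One way to picture a combined view over two such groups is the standard library's ``collections.ChainMap``. This is a rough analogy rather than Iris's implementation, and the lookup order used here (locals first) is an assumption made for the sketch.

```python
from collections import ChainMap

# Illustrative values only; the STASH entry is a plain string stand-in.
global_attrs = {"Conventions": "CF-1.5"}
local_attrs = {
    "STASH": "m01s03i236",
    "Model scenario": "A1B",
}

# Assumed precedence for this sketch: local entries are searched first.
combined = ChainMap(local_attrs, global_attrs)

print(combined["Conventions"])     # found in the global group
print(combined["Model scenario"])  # found in the local group
print(sorted(combined))            # keys from both groups
```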

diff --git a/docs/src/techpapers/index.rst b/docs/src/techpapers/index.rst
@@ -11,3 +11,4 @@ Extra information on specific technical issues.
 
    um_files_loading.rst
    missing_data_handling.rst
+   netcdf_io.rst
diff --git a/docs/src/techpapers/netcdf_io.rst b/docs/src/techpapers/netcdf_io.rst
@@ -0,0 +1,140 @@
+.. testsetup:: chunk_control
+
+    import iris
+    from iris.fileformats.netcdf.loader import CHUNK_CONTROL
+
+    from pathlib import Path
+    import dask
+    import shutil
+    import tempfile
+
+    tmp_dir = Path(tempfile.mkdtemp())
+    tmp_filepath = tmp_dir / "tmp.nc"
+
+    cube = iris.load(iris.sample_data_path("E1_north_america.nc"))[0]
+    iris.save(cube, tmp_filepath, chunksizes=(120, 37, 49))
+    old_dask = dask.config.get("array.chunk-size")
+    dask.config.set({'array.chunk-size': '500KiB'})
+
+
+.. testcleanup:: chunk_control
+
+    dask.config.set({'array.chunk-size': old_dask})
+    shutil.rmtree(tmp_dir)
+
+.. _netcdf_io:
+
+=============================
+NetCDF I/O Handling in Iris
+=============================
+
+This document provides a basic account of how Iris loads and saves NetCDF files.
+
+.. admonition:: Under Construction
+
+    This document is still a work in progress, so it might include blank or unfinished
+    sections. Watch this space!
+
+
+Chunk Control
+--------------
+
+Default Chunking
+^^^^^^^^^^^^^^^^
+
+Chunks are, by default, optimised by Iris on load. This will automatically
+decide the best chunksize for your data without any user input. This is
+calculated based on a number of factors, including:
+
+- File Variable Chunking
+- Full Variable Shape
+- Dask Default Chunksize
+- Dimension Order: Earlier (outer) dimensions will be prioritised to be split over later (inner) dimensions.
+
+.. doctest:: chunk_control
+
+    >>> cube = iris.load_cube(tmp_filepath)
+    >>>
+    >>> print(cube.shape)
+    (240, 37, 49)
+    >>> print(cube.core_data().chunksize)
+    (60, 37, 49)
+
+For more user control, :pull:`5588` added the
+:data:`iris.fileformats.netcdf.loader.CHUNK_CONTROL` object.
+
+Custom Chunking: Set
+^^^^^^^^^^^^^^^^^^^^
+
+There are three context managers within :data:`~iris.fileformats.netcdf.loader.CHUNK_CONTROL`. The most basic is
+:meth:`~iris.fileformats.netcdf.loader.ChunkControl.set`. This allows you to specify the chunksize for each dimension,
+and optionally a ``var_name`` to restrict the change to a single variable.
+
+Using ``-1`` in place of a chunksize will ensure the chunksize stays the same
+as the shape, i.e. no optimisation occurs on that dimension.
+
+.. doctest:: chunk_control
+
+    >>> with CHUNK_CONTROL.set("air_temperature", time=180, latitude=-1, longitude=25):
+    ...     cube = iris.load_cube(tmp_filepath)
+    >>>
+    >>> print(cube.core_data().chunksize)
+    (180, 37, 25)
+
+Note that ``var_name`` is optional, and that you don't need to specify every dimension. If you
+specify only one dimension, the rest will be optimised using Iris' default behaviour.
+
+.. doctest:: chunk_control
+
+    >>> with CHUNK_CONTROL.set(longitude=25):
+    ...     cube = iris.load_cube(tmp_filepath)
+    >>>
+    >>> print(cube.core_data().chunksize)
+    (120, 37, 25)
+
+Custom Chunking: From File
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The second context manager is :meth:`~iris.fileformats.netcdf.loader.ChunkControl.from_file`.
+This takes chunksizes as defined in the NetCDF file. Any dimensions without specified chunks
+will default to Iris optimisation.
+
+.. doctest:: chunk_control
+
+    >>> with CHUNK_CONTROL.from_file():
+    ...     cube = iris.load_cube(tmp_filepath)
+    >>>
+    >>> print(cube.core_data().chunksize)
+    (120, 37, 49)
+
+Custom Chunking: As Dask
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The final context manager, :meth:`~iris.fileformats.netcdf.loader.ChunkControl.as_dask`, bypasses
+Iris' optimisation altogether, and will take its chunksizes from Dask's behaviour.
+
+.. doctest:: chunk_control
+
+    >>> with CHUNK_CONTROL.as_dask():
+    ...    cube = iris.load_cube(tmp_filepath)
+    >>>
+    >>> print(cube.core_data().chunksize)
+    (70, 37, 49)
+
+
+Split Attributes
+-----------------
+
+TBC
+
+
+Deferred Saving
+----------------
+
+TBC
+
+
+Guess Axis
+-----------
+
+TBC
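
The "Dimension Order" point in the Default Chunking section of the new ``netcdf_io.rst`` says that earlier (outer) dimensions are split in preference to later (inner) ones. The sketch below illustrates that idea only. It is a deliberately simplified stand-in, not Iris's actual algorithm: ``split_outer_first`` is a hypothetical function, and the 4-byte item size is an assumption about the data type.

```python
from math import prod


def split_outer_first(shape, itemsize, limit_bytes):
    """Halve dimensions, outermost first, until the chunk fits the limit."""
    chunks = list(shape)
    for axis in range(len(chunks)):
        # Keep halving this axis while the chunk is still too large.
        while chunks[axis] > 1 and prod(chunks) * itemsize > limit_bytes:
            chunks[axis] = (chunks[axis] + 1) // 2  # halve, rounding up
    return tuple(chunks)


# Start from the file chunking used in the test setup, with an assumed
# 4-byte data type and the 500KiB Dask chunk-size set there.
print(split_outer_first((120, 37, 49), itemsize=4, limit_bytes=500 * 1024))
```

With these assumed numbers, only the outermost dimension is halved, once, and the inner dimensions are left whole.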
diff --git a/docs/src/userguide/iris_cubes.rst b/docs/src/userguide/iris_cubes.rst
@@ -85,7 +85,10 @@ A cube consists of:
     data dimensions as the coordinate has dimensions.
 
 * an attributes dictionary which, other than some protected CF names, can
-  hold arbitrary extra metadata.
+  hold arbitrary extra metadata. This implements the concept of dataset-level
+  and variable-level attributes when loading and saving NetCDF files (see
+  :class:`~iris.cube.CubeAttrsDict` and NetCDF
+  :func:`~iris.fileformats.netcdf.saver.save` for more).
 * a list of cell methods to represent operations which have already been
   applied to the data (e.g. "mean over time")
 * a list of coordinate "factories" used for deriving coordinates from the

diff --git a/docs/src/whatsnew/latest.rst b/docs/src/whatsnew/latest.rst
@@ -29,7 +29,17 @@ This document explains the changes made to Iris for this release
 
 ✨ Features
 ===========
-
+#. `@pp-mo`_, `@lbdreyer`_ and `@trexfeathers`_ improved
+   :class:`~iris.cube.Cube` :attr:`~iris.cube.Cube.attributes` handling to
+   better preserve the distinction between dataset-level and variable-level
+   attributes, allowing file-Cube-file round-tripping of NetCDF attributes. See
+   :class:`~iris.cube.CubeAttrsDict`, NetCDF
+   :func:`~iris.fileformats.netcdf.saver.save` and :data:`~iris.Future` for more.
+   (:pull:`5152`, `split attributes project`_)
+
+#. `@rcomer`_ rewrote :func:`~iris.util.broadcast_to_shape` so it now handles
+   lazy data. (:pull:`5307`)
+
 #. `@trexfeathers`_ and `@HGWright`_ (reviewer) sub-categorised all Iris'
    :class:`UserWarning`\s for richer filtering. The full index of
    sub-categories can be seen here: :mod:`iris.exceptions` . (:pull:`5498`)
@@ -44,6 +54,14 @@ This document explains the changes made to Iris for this release
    Winter - December to February) will be assigned to the preceding year (e.g.
    the year of December) instead of the following year (the default behaviour).
    (:pull:`5573`)
+
+#. `@HGWright`_ added :attr:`~iris.coords.Coord.ignore_axis` to allow manual
+   intervention preventing :func:`~iris.util.guess_coord_axis` from acting on a
+   coordinate. (:pull:`5551`)
+
+#. `@pp-mo`_, `@trexfeathers`_ and `@ESadek-MO`_ added more control over
+   NetCDF chunking with the use of the :data:`iris.fileformats.netcdf.loader.CHUNK_CONTROL`
+   context manager. (:pull:`5588`)
 
 
 🐛 Bugs Fixed
@@ -68,7 +86,8 @@ This document explains the changes made to Iris for this release
 🚀 Performance Enhancements
 ===========================
 
-#. N/A
+#. `@stephenworsley`_ improved the speed of :class:`~iris.analysis.AreaWeighted`
+   regridding. (:pull:`5543`)
 
 
 🔥 Deprecations
@@ -103,6 +122,10 @@ This document explains the changes made to Iris for this release
 #. `@ESadek-MO`_ added a phrasebook for synonymous terms used in similar
    packages. (:pull:`5564`)
 
+#. `@ESadek-MO`_ and `@trexfeathers`_ created a technical paper for NetCDF
+   saving and loading, :ref:`netcdf_io`, with a section on chunking, and placeholders
+   for further topics. (:pull:`5588`)
+
 
 💼 Internal
 ===========
@@ -147,4 +170,4 @@ This document explains the changes made to Iris for this release
 
 .. _NEP29 Drop Schedule: https://numpy.org/neps/nep-0029-deprecation_policy.html#drop-schedule
 .. _codespell: https://github.com/codespell-project/codespell
-
+.. _split attributes project: https://github.com/orgs/SciTools/projects/5?pane=info
diff --git a/lib/iris/__init__.py b/lib/iris/__init__.py
@@ -141,7 +141,9 @@ def callback(cube, field, filename):
 class Future(threading.local):
     """Run-time configuration controller."""
 
-    def __init__(self, datum_support=False, pandas_ndim=False):
+    def __init__(
+        self, datum_support=False, pandas_ndim=False, save_split_attrs=False
+    ):
         """
         A container for run-time options controls.
 
@@ -163,6 +165,11 @@ def __init__(self, datum_support=False, pandas_ndim=False):
         pandas_ndim : bool, default=False
             See :func:`iris.pandas.as_data_frame` for details - opts in to the
             newer n-dimensional behaviour.
+        save_split_attrs : bool, default=False
+            Save "global" and "local" cube attributes to netcdf in appropriately
+            different ways: "global" ones are saved as dataset attributes, where
+            possible, while "local" ones are saved as data-variable attributes.
+            See :func:`iris.fileformats.netcdf.saver.save`.
 
         """
         # The flag 'example_future_flag' is provided as a reference for the
@@ -174,14 +181,18 @@ def __init__(self, datum_support=False, pandas_ndim=False):
         # self.__dict__['example_future_flag'] = example_future_flag
         self.__dict__["datum_support"] = datum_support
         self.__dict__["pandas_ndim"] = pandas_ndim
+        self.__dict__["save_split_attrs"] = save_split_attrs
+
         # TODO: next major release: set IrisDeprecation to subclass
         #  DeprecationWarning instead of UserWarning.
 
     def __repr__(self):
         # msg = ('Future(example_future_flag={})')
         # return msg.format(self.example_future_flag)
-        msg = "Future(datum_support={}, pandas_ndim={})"
-        return msg.format(self.datum_support, self.pandas_ndim)
+        msg = "Future(datum_support={}, pandas_ndim={}, save_split_attrs={})"
+        return msg.format(
+            self.datum_support, self.pandas_ndim, self.save_split_attrs
+        )
 
     # deprecated_options = {'example_future_flag': 'warning',}
     deprecated_options = {}
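
The pattern in the ``Future`` hunks above (a ``threading.local`` subclass that writes its options straight into ``self.__dict__`` and reports them in ``__repr__``) can be tried in isolation with the standard library. This is a minimal sketch, not the Iris class: ``MiniFuture`` is a hypothetical name, and the ``__setattr__`` guard is an assumption added here to show one reason an ``__init__`` might bypass normal attribute assignment.

```python
import threading


class MiniFuture(threading.local):
    """Toy run-time options holder; each thread gets its own copy."""

    def __init__(self, datum_support=False, pandas_ndim=False, save_split_attrs=False):
        # Write via __dict__ so the guard in __setattr__ is not triggered.
        self.__dict__["datum_support"] = datum_support
        self.__dict__["pandas_ndim"] = pandas_ndim
        self.__dict__["save_split_attrs"] = save_split_attrs

    def __repr__(self):
        msg = "MiniFuture(datum_support={}, pandas_ndim={}, save_split_attrs={})"
        return msg.format(self.datum_support, self.pandas_ndim, self.save_split_attrs)

    def __setattr__(self, name, value):
        # Assumed guard for this sketch: only known option names may be set.
        if name not in self.__dict__:
            raise AttributeError(f"unknown option {name!r}")
        self.__dict__[name] = value


future = MiniFuture()
future.save_split_attrs = True
print(future)
```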