forked from pydata/xarray

Merge branch 'main' into state-machine
* main: (26 commits)
  [pre-commit.ci] pre-commit autoupdate (pydata#8900)
  Bump the actions group with 1 update (pydata#8896)
  New empty whatsnew entry (pydata#8899)
  Update reference to 'Weighted quantile estimators' (pydata#8898)
  2024.03.0: Add whats-new (pydata#8891)
  Add typing to test_groupby.py (pydata#8890)
  Avoid in-place multiplication of a large value to an array with small integer dtype (pydata#8867)
  Check for aligned chunks when writing to existing variables (pydata#8459)
  Add dt.date to plottable types (pydata#8873)
  Optimize writes to existing Zarr stores. (pydata#8875)
  Allow multidimensional variable with same name as dim when constructing dataset via coords (pydata#8886)
  Don't allow overwriting indexes with region writes (pydata#8877)
  Migrate datatree.py module into xarray.core. (pydata#8789)
  warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend (pydata#8874)
  groupby: Dispatch quantile to flox. (pydata#8720)
  Opt out of auto creating index variables (pydata#8711)
  Update docs on view / copies (pydata#8744)
  Handle .oindex and .vindex for the PandasMultiIndexingAdapter and PandasIndexingAdapter (pydata#8869)
  numpy 2.0 copy-keyword and trapz vs trapezoid (pydata#8865)
  upstream-dev CI: Fix interp and cumtrapz (pydata#8861)
  ...
dcherian committed Apr 2, 2024
2 parents a216531 + 97d3a3a commit 4399d96
Showing 60 changed files with 1,341 additions and 881 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ci-additional.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,7 @@ jobs:
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report xarray/
- name: Upload mypy coverage to Codecov
uses: codecov/codecov-action@v4.1.0
uses: codecov/codecov-action@v4.1.1
with:
file: mypy_report/cobertura.xml
flags: mypy
Expand Down Expand Up @@ -181,7 +181,7 @@ jobs:
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report xarray/
- name: Upload mypy coverage to Codecov
uses: codecov/codecov-action@v4.1.0
uses: codecov/codecov-action@v4.1.1
with:
file: mypy_report/cobertura.xml
flags: mypy39
Expand Down Expand Up @@ -242,7 +242,7 @@ jobs:
python -m pyright xarray/
- name: Upload pyright coverage to Codecov
uses: codecov/codecov-action@v4.1.0
uses: codecov/codecov-action@v4.1.1
with:
file: pyright_report/cobertura.xml
flags: pyright
Expand Down Expand Up @@ -301,7 +301,7 @@ jobs:
python -m pyright xarray/
- name: Upload pyright coverage to Codecov
uses: codecov/codecov-action@v4.1.0
uses: codecov/codecov-action@v4.1.1
with:
file: pyright_report/cobertura.xml
flags: pyright39
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
Expand Up @@ -151,7 +151,7 @@ jobs:
path: pytest.xml

- name: Upload code coverage to Codecov
uses: codecov/codecov-action@v4.1.0
uses: codecov/codecov-action@v4.1.1
with:
file: ./coverage.xml
flags: unittests
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/upstream-dev-ci.yaml
Expand Up @@ -143,7 +143,7 @@ jobs:
run: |
python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report
- name: Upload mypy coverage to Codecov
uses: codecov/codecov-action@v4.1.0
uses: codecov/codecov-action@v4.1.1
with:
file: mypy_report/cobertura.xml
flags: mypy
Expand Down
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
Expand Up @@ -13,24 +13,24 @@ repos:
- id: mixed-line-ending
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: 'v0.2.0'
rev: 'v0.3.4'
hooks:
- id: ruff
args: ["--fix", "--show-fixes"]
# https://github.com/python/black#version-control-integration
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 24.1.1
rev: 24.3.0
hooks:
- id: black-jupyter
- repo: https://github.com/keewis/blackdoc
rev: v0.3.9
hooks:
- id: blackdoc
exclude: "generate_aggregations.py"
additional_dependencies: ["black==24.1.1"]
additional_dependencies: ["black==24.3.0"]
- id: blackdoc-autoupdate-black
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.8.0
rev: v1.9.0
hooks:
- id: mypy
# Copied from setup.cfg
Expand Down
6 changes: 4 additions & 2 deletions doc/user-guide/indexing.rst
Expand Up @@ -748,7 +748,7 @@ Whether array indexing returns a view or a copy of the underlying
data depends on the nature of the labels.

For positional (integer)
indexing, xarray follows the same rules as NumPy:
indexing, xarray follows the same `rules`_ as NumPy:

* Positional indexing with only integers and slices returns a view.
* Positional indexing with arrays or lists returns a copy.
Expand All @@ -765,8 +765,10 @@ Whether data is a copy or a view is more predictable in xarray than in pandas, s
unlike pandas, xarray does not produce `SettingWithCopy warnings`_. However, you
should still avoid assignment with chained indexing.

.. _SettingWithCopy warnings: https://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
Note that other operations (such as :py:attr:`~xarray.DataArray.values`) may also return views rather than copies.

.. _SettingWithCopy warnings: https://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
.. _rules: https://numpy.org/doc/stable/user/basics.copies.html

.. _multi-level indexing:

Expand Down
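The positional-indexing rules quoted in the `indexing.rst` hunk above can be checked directly in NumPy. The following is a minimal sketch and not part of the commit; the variable names (`base`, `view`, `copied`) are illustrative only.

```python
import numpy as np

base = np.arange(10)

# Basic indexing (integers and slices) returns a view onto the same memory.
view = base[2:5]

# Indexing with a list or array (advanced indexing) returns a copy.
copied = base[[2, 3, 4]]

view_shares = np.shares_memory(base, view)
copy_shares = np.shares_memory(base, copied)

# Writing through the view changes the original; writing to the copy does not.
view[0] = -1
copied[1] = -2
```

Because `view` aliases `base`, the assignment `view[0] = -1` is visible in `base[2]`, while the assignment to `copied` leaves `base[3]` untouched.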
70 changes: 58 additions & 12 deletions doc/whats-new.rst
Expand Up @@ -15,32 +15,65 @@ What's New
np.random.seed(123456)
.. _whats-new.2024.03.0:
.. _whats-new.2024.04.0:

v2024.03.0 (unreleased)
v2024.04.0 (unreleased)
-----------------------

New Features
~~~~~~~~~~~~


Breaking changes
~~~~~~~~~~~~~~~~


Bug fixes
~~~~~~~~~


Internal Changes
~~~~~~~~~~~~~~~~


.. _whats-new.2024.03.0:

v2024.03.0 (Mar 29, 2024)
-------------------------

This release brings performance improvements for grouped and resampled quantile calculations, CF decoding improvements,
minor optimizations to distributed Zarr writes, and compatibility fixes for NumPy 2.0 and pandas 3.0.

Thanks to the 18 contributors to this release:
Anderson Banihirwe, Christoph Hasse, Deepak Cherian, Etienne Schalk, Justus Magin, Kai Mühlbauer, Kevin Schwarzwald, Mark Harfouche, Martin, Matt Savoie, Maximilian Roos, Ray Bell, Roberto Chang, Spencer Clark, Tom Nicholas, crusaderky, owenlittlejohns, saschahofmann

New Features
~~~~~~~~~~~~
- Partial writes to existing chunks with ``region`` or ``append_dim`` will now raise an error
(unless ``safe_chunks=False``); previously an error would only be raised on
new variables. (:pull:`8459`, :issue:`8371`, :issue:`8882`)
By `Maximilian Roos <https://github.com/max-sixty>`_.
- Grouped and resampling quantile calculations now use the vectorized algorithm in ``flox>=0.9.4`` if present.
By `Deepak Cherian <https://github.com/dcherian>`_.
- Do not broadcast in arithmetic operations when global option ``arithmetic_broadcast=False``
(:issue:`6806`, :pull:`8784`).
By `Etienne Schalk <https://github.com/etienneschalk>`_ and `Deepak Cherian <https://github.com/dcherian>`_.
- Add the ``.oindex`` property to Explicitly Indexed Arrays for orthogonal indexing functionality. (:issue:`8238`, :pull:`8750`)
By `Anderson Banihirwe <https://github.com/andersy005>`_.

- Add the ``.vindex`` property to Explicitly Indexed Arrays for vectorized indexing functionality. (:issue:`8238`, :pull:`8780`)
By `Anderson Banihirwe <https://github.com/andersy005>`_.

- Expand use of ``.oindex`` and ``.vindex`` properties. (:pull:`8790`)
By `Anderson Banihirwe <https://github.com/andersy005>`_ and `Deepak Cherian <https://github.com/dcherian>`_.
- Allow creating :py:class:`xr.Coordinates` objects with no indexes (:pull:`8711`)
By `Benoit Bovy <https://github.com/benbovy>`_ and `Tom Nicholas
<https://github.com/TomNicholas>`_.
- Enable plotting of ``datetime.dates``. (:issue:`8866`, :pull:`8873`)
By `Sascha Hofmann <https://github.com/saschahofmann>`_.

Breaking changes
~~~~~~~~~~~~~~~~


Deprecations
~~~~~~~~~~~~
- Don't allow overwriting index variables with ``to_zarr`` region writes. (:issue:`8589`, :pull:`8876`).
By `Deepak Cherian <https://github.com/dcherian>`_.


Bug fixes
Expand All @@ -57,16 +90,29 @@ Bug fixes
`CFMaskCoder`/`CFScaleOffsetCoder` (:issue:`2304`, :issue:`5597`,
:issue:`7691`, :pull:`8713`, see also discussion in :pull:`7654`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.

Documentation
~~~~~~~~~~~~~

- Do not cast `_FillValue`/`missing_value` in `CFMaskCoder` if `_Unsigned` is provided
(:issue:`8844`, :pull:`8852`).
- Adapt handling of copy keyword argument for numpy >= 2.0dev
(:issue:`8844`, :pull:`8851`, :pull:`8865`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
- Import trapz/trapezoid depending on numpy version
(:issue:`8844`, :pull:`8865`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
- Warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend
(:issue:`5563`, :pull:`8874`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
- Fix bug incorrectly disallowing creation of a dataset with a multidimensional coordinate variable with the same name as one of its dims.
(:issue:`8884`, :pull:`8886`)
By `Tom Nicholas <https://github.com/TomNicholas>`_.

Internal Changes
~~~~~~~~~~~~~~~~
- Migrates ``treenode`` functionality into ``xarray/core`` (:pull:`8757`)
By `Matt Savoie <https://github.com/flamingbear>`_ and `Tom Nicholas
<https://github.com/TomNicholas>`_.
- Migrates ``datatree`` functionality into ``xarray/core``. (:pull:`8789`)
By `Owen Littlejohns <https://github.com/owenlittlejohns>`_, `Matt Savoie
<https://github.com/flamingbear>`_ and `Tom Nicholas <https://github.com/TomNicholas>`_.


.. _whats-new.2024.02.0:
Expand Down
1 change: 0 additions & 1 deletion pyproject.toml
Expand Up @@ -171,7 +171,6 @@ module = [
"xarray.tests.test_dask",
"xarray.tests.test_dataarray",
"xarray.tests.test_duck_array_ops",
"xarray.tests.test_groupby",
"xarray.tests.test_indexing",
"xarray.tests.test_merge",
"xarray.tests.test_missing",
Expand Down
20 changes: 6 additions & 14 deletions xarray/backends/api.py
Expand Up @@ -69,7 +69,7 @@
T_NetcdfTypes = Literal[
"NETCDF4", "NETCDF4_CLASSIC", "NETCDF3_64BIT", "NETCDF3_CLASSIC"
]
from xarray.datatree_.datatree import DataTree
from xarray.core.datatree import DataTree

DATAARRAY_NAME = "__xarray_dataarray_name__"
DATAARRAY_VARIABLE = "__xarray_dataarray_variable__"
Expand Down Expand Up @@ -1562,24 +1562,19 @@ def _auto_detect_regions(ds, region, open_kwargs):
return region


def _validate_and_autodetect_region(
ds, region, mode, open_kwargs
) -> tuple[dict[str, slice], bool]:
def _validate_and_autodetect_region(ds, region, mode, open_kwargs) -> dict[str, slice]:
if region == "auto":
region = {dim: "auto" for dim in ds.dims}

if not isinstance(region, dict):
raise TypeError(f"``region`` must be a dict, got {type(region)}")

if any(v == "auto" for v in region.values()):
region_was_autodetected = True
if mode != "r+":
raise ValueError(
f"``mode`` must be 'r+' when using ``region='auto'``, got {mode}"
)
region = _auto_detect_regions(ds, region, open_kwargs)
else:
region_was_autodetected = False

for k, v in region.items():
if k not in ds.dims:
Expand Down Expand Up @@ -1612,7 +1607,7 @@ def _validate_and_autodetect_region(
f".drop_vars({non_matching_vars!r})"
)

return region, region_was_autodetected
return region


def _validate_datatypes_for_zarr_append(zstore, dataset):
Expand Down Expand Up @@ -1784,12 +1779,9 @@ def to_zarr(
storage_options=storage_options,
zarr_version=zarr_version,
)
region, region_was_autodetected = _validate_and_autodetect_region(
dataset, region, mode, open_kwargs
)
# drop indices to avoid potential race condition with auto region
if region_was_autodetected:
dataset = dataset.drop_vars(dataset.indexes)
region = _validate_and_autodetect_region(dataset, region, mode, open_kwargs)
# can't modify indexes with region writes
dataset = dataset.drop_vars(dataset.indexes)
if append_dim is not None and append_dim in region:
raise ValueError(
f"cannot list the same dimension in both ``append_dim`` and "
Expand Down
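The simplified control flow of `_validate_and_autodetect_region` after this change (a single return value, no `region_was_autodetected` flag) can be sketched as below. This is a reduced illustration under stated assumptions: `validate_region` and its `dims` parameter are hypothetical stand-ins, and the store-backed auto-detection step is omitted rather than reproduced.

```python
def validate_region(region, dims, mode):
    """Normalize ``region`` to a dict and check the preconditions for 'auto'.

    Hypothetical stand-in for the first half of xarray's private
    ``_validate_and_autodetect_region``; it does not touch a Zarr store.
    """
    if region == "auto":
        region = {dim: "auto" for dim in dims}

    if not isinstance(region, dict):
        raise TypeError(f"``region`` must be a dict, got {type(region)}")

    if any(v == "auto" for v in region.values()):
        if mode != "r+":
            raise ValueError(
                f"``mode`` must be 'r+' when using ``region='auto'``, got {mode}"
            )
        # The real function resolves each 'auto' entry to a slice here by
        # inspecting the existing store; that step is left out of this sketch.

    return region


expanded = validate_region("auto", ["x", "y"], mode="r+")
explicit = validate_region({"x": slice(0, 2)}, ["x"], mode="a")
```

Returning only the region dict works because the caller now drops index variables for every region write, not only for auto-detected ones, so the flag no longer carries any information.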
4 changes: 2 additions & 2 deletions xarray/backends/common.py
Expand Up @@ -23,8 +23,8 @@
from netCDF4 import Dataset as ncDataset

from xarray.core.dataset import Dataset
from xarray.core.datatree import DataTree
from xarray.core.types import NestedSequence
from xarray.datatree_.datatree import DataTree

# Create a logger object, but don't add any handlers. Leave that to user code.
logger = logging.getLogger(__name__)
Expand Down Expand Up @@ -137,8 +137,8 @@ def _open_datatree_netcdf(
**kwargs,
) -> DataTree:
from xarray.backends.api import open_dataset
from xarray.core.datatree import DataTree
from xarray.core.treenode import NodePath
from xarray.datatree_.datatree import DataTree

ds = open_dataset(filename_or_obj, **kwargs)
tree_root = DataTree.from_dict({"/": ds})
Expand Down
21 changes: 12 additions & 9 deletions xarray/backends/h5netcdf_.py
Expand Up @@ -28,6 +28,7 @@
from xarray.core import indexing
from xarray.core.utils import (
FrozenDict,
emit_user_level_warning,
is_remote_uri,
read_magic_number_from_file,
try_read_magic_number_from_file_or_path,
Expand All @@ -39,7 +40,7 @@

from xarray.backends.common import AbstractDataStore
from xarray.core.dataset import Dataset
from xarray.datatree_.datatree import DataTree
from xarray.core.datatree import DataTree


class H5NetCDFArrayWrapper(BaseNetCDF4Array):
Expand All @@ -58,21 +59,23 @@ def _getitem(self, key):
return array[key]


def maybe_decode_bytes(txt):
if isinstance(txt, bytes):
return txt.decode("utf-8")
else:
return txt


def _read_attributes(h5netcdf_var):
# GH451
# to ensure conventions decoding works properly on Python 3, decode all
# bytes attributes to strings
attrs = {}
for k, v in h5netcdf_var.attrs.items():
if k not in ["_FillValue", "missing_value"]:
v = maybe_decode_bytes(v)
if isinstance(v, bytes):
try:
v = v.decode("utf-8")
except UnicodeDecodeError:
emit_user_level_warning(
f"'utf-8' codec can't decode bytes for attribute "
f"{k!r} of h5netcdf object {h5netcdf_var.name!r}, "
f"returning bytes undecoded.",
UnicodeWarning,
)
attrs[k] = v
return attrs

Expand Down
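The decode-with-fallback behaviour introduced in `_read_attributes` can be tried in isolation with the standard library. This is a sketch only: `decode_attribute` is a hypothetical helper, and `warnings.warn` stands in for xarray's internal `emit_user_level_warning`. The original's special-casing of `_FillValue` and `missing_value` is not reproduced here.

```python
import warnings


def decode_attribute(name, value):
    """Decode bytes as UTF-8, returning them unchanged if decoding fails."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Stand-in for xarray's internal emit_user_level_warning helper.
            warnings.warn(
                f"'utf-8' codec can't decode bytes for attribute {name!r}, "
                "returning bytes undecoded.",
                UnicodeWarning,
                stacklevel=2,
            )
    return value


decoded = decode_attribute("units", b"metres")

# b"\xff\xfe" is not valid UTF-8, so the bytes come back undecoded with a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    undecoded = decode_attribute("title", b"\xff\xfe")
```

Warning instead of raising lets a file with one malformed attribute still open, which appears to be the intent of the change described in the whats-new entry.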
2 changes: 1 addition & 1 deletion xarray/backends/netCDF4_.py
Expand Up @@ -45,7 +45,7 @@

from xarray.backends.common import AbstractDataStore
from xarray.core.dataset import Dataset
from xarray.datatree_.datatree import DataTree
from xarray.core.datatree import DataTree

# This lookup table maps from dtype.byteorder to a readable endian
# string used by netCDF4.
Expand Down
10 changes: 10 additions & 0 deletions xarray/backends/scipy_.py
Expand Up @@ -28,6 +28,7 @@
Frozen,
FrozenDict,
close_on_error,
module_available,
try_read_magic_number_from_file_or_path,
)
from xarray.core.variable import Variable
Expand All @@ -39,6 +40,9 @@
from xarray.core.dataset import Dataset


HAS_NUMPY_2_0 = module_available("numpy", minversion="2.0.0.dev0")


def _decode_string(s):
if isinstance(s, bytes):
return s.decode("utf-8", "replace")
Expand Down Expand Up @@ -76,6 +80,12 @@ def __getitem__(self, key):
# with the netCDF4 library by ensuring we can safely read arrays even
# after closing associated files.
copy = self.datastore.ds.use_mmap

# adapt handling of copy-kwarg to numpy 2.0
# see https://github.com/numpy/numpy/issues/25916
# and https://github.com/numpy/numpy/pull/25922
copy = None if HAS_NUMPY_2_0 and copy is False else copy

return np.array(data, dtype=self.dtype, copy=copy)

def __setitem__(self, key, value):
Expand Down
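The `copy` keyword adaptation in `scipy_.py` can be illustrated on its own. The sketch below assumes the NumPy major version can be read from `np.__version__`; the commit itself uses xarray's `module_available` helper, and `to_array` is a hypothetical wrapper.

```python
import numpy as np

# Assumption: parse the major version from np.__version__; the commit uses
# xarray's module_available("numpy", minversion="2.0.0.dev0") instead.
HAS_NUMPY_2_0 = int(np.__version__.split(".")[0]) >= 2


def to_array(data, dtype, copy):
    """Build an array, mapping copy=False to copy=None on NumPy >= 2.0."""
    # On NumPy >= 2.0, copy=False raises if a copy cannot be avoided,
    # whereas copy=None copies only when needed.
    copy = None if HAS_NUMPY_2_0 and copy is False else copy
    return np.array(data, dtype=dtype, copy=copy)


# Converting a list always requires a copy, so copy=False alone would fail
# on NumPy >= 2.0; the mapping above keeps the call working on both versions.
result = to_array([1, 2, 3], dtype="float64", copy=False)
```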