-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DataTree.coords.__setitem__
by adding DataTreeCoordinates
class
#9451
Fix DataTree.coords.__setitem__
by adding DataTreeCoordinates
class
#9451
Conversation
for more information, see https://pre-commit.ci
xarray/core/coordinates.py
Outdated
# TODO should inherited coordinates be here? It would be very hard to allow updating them... | ||
# But actually maybe the ChainMap approach would make this work okay?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see an easy way to make assignment of coordinates to parent nodes work. I think it would be better just to have dt.coords
constructed from inherited coords, but not allow setting coordinates further up the tree. I believe that's consistent with the behaviour we currently have for dt.__setitem__
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! Here are some ideas
xarray/core/coordinates.py
Outdated
# TODO is there a potential bug here? What happens if a dim is only present on data variables? | ||
return self._data.dims |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be fine. Dimensions on DatasetCoordinates
also include dimensions that are only present on data variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? This (on main
) looks wrong to me:
In [1]: import xarray as xr
In [2]: ds = xr.Dataset({'a': ('x', [0, 1])})
In [3]: ds
Out[3]:
<xarray.Dataset> Size: 16B
Dimensions: (x: 2)
Dimensions without coordinates: x
Data variables:
a (x) int64 16B 0 1
In [4]: ds.coords
Out[4]:
Coordinates:
*empty*
In [5]: ds.coords.dims
Out[5]: FrozenMappingWarningOnValuesAccess({'x': 2})
I mean the fact no-one has raised this before means it probably isn't of much consequence, but it does seem incorrect / misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised #9466 to track this
xarray/core/coordinates.py
Outdated
self, coords: dict[Hashable, Variable], indexes: Mapping[Any, Index] | ||
) -> None: | ||
|
||
# TODO I don't know how to update coordinates that live in parent nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is OK for this to be an error. The user can replace those coordinates on the parent nodes.
This comment was marked as outdated.
This comment was marked as outdated.
Co-authored-by: Stephan Hoyer <shoyer@google.com>
Co-authored-by: Stephan Hoyer <shoyer@google.com>
xarray/core/coordinates.py
Outdated
def _drop_indexed_coords(self, coords_to_drop: set[Hashable]) -> None: | ||
assert self._data.xindexes is not None | ||
new_coords = drop_indexed_coords(coords_to_drop, self) | ||
for name in self._data._node_coord_variables - new_coords._names: | ||
del self._data._node_coord_variables[name] | ||
self._data._node_indexes = dict(new_coords.xindexes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that this ._drop_indexed_coords
method, and its equivalent in DatasetCoordinates
, is never being called... I can't find any references to it, and if I raise
in both methods then the whole test suite still passes locally.
@dcherian @benbovy you both touched this method last, do you know why it still exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 9dc845a
…las/xarray into datatree_coords_setitem
xarray/core/coordinates.py
Outdated
# TODO should inherited coordinates be here? It would be very hard to allow updating them... | ||
# But actually maybe the ChainMap approach would make this work okay?? | ||
|
||
_data: DataTree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inheriting from Coordinates
and using DataTree
instead of DataWithCoords
here both expose a few legitimate typing issues. Particularly:
DataTree
doesn't directly inherit fromDataWithCoords
, though maybe it could inherit from a future version of it (see Refactor Dataset internals to store data variables and coordinate variables as separate dicts #9203 (comment))Hashable
vsstr
annoyances (DataTree should support Hashable names. #8836)
Mostly we have got away with these discrepancies so far but inheritance here is probably going to mean I need to either ignore
or cast
all over the place?
cc @headtr1ck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually this wasn't very difficult to work around, it mostly just required adding a type: ignore[assignment]
here :)
I think this is basically ready now @shoyer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, thanks Tom!
Co-authored-by: Stephan Hoyer <shoyer@google.com>
Co-authored-by: Stephan Hoyer <shoyer@google.com>
Co-authored-by: Stephan Hoyer <shoyer@google.com>
* main: (26 commits) Forbid modifying names of DataTree objects with parents (pydata#9494) DAS-2155 - Merge datatree documentation into main docs. (pydata#9033) Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378) Ensure TreeNode doesn't copy in-place (pydata#9482) `open_groups` for zarr backends (pydata#9469) Update pyproject.toml (pydata#9484) New whatsnew section (pydata#9483) Release notes for v2024.09.0 (pydata#9480) Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451) Rename DataTree's "ds" and "data" to "dataset" (pydata#9476) Update DataTree repr to indicate inheritance (pydata#9470) Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460) Repo checker (pydata#9450) Add days_in_year and decimal_year to dt accessor (pydata#9105) remove parent argument from DataTree.__init__ (pydata#9465) Fix inheritance in DataTree.copy() (pydata#9457) Implement `DataTree.__delitem__` (pydata#9453) Add ASV for datatree.from_dict (pydata#9459) Make the first argument in DataTree.from_dict positional only (pydata#9446) Fix typos across the code, doc and comments (pydata#9443) ...
* main: (29 commits) Release notes for v2024.09.0 (pydata#9480) Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451) Rename DataTree's "ds" and "data" to "dataset" (pydata#9476) Update DataTree repr to indicate inheritance (pydata#9470) Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460) Repo checker (pydata#9450) Add days_in_year and decimal_year to dt accessor (pydata#9105) remove parent argument from DataTree.__init__ (pydata#9465) Fix inheritance in DataTree.copy() (pydata#9457) Implement `DataTree.__delitem__` (pydata#9453) Add ASV for datatree.from_dict (pydata#9459) Make the first argument in DataTree.from_dict positional only (pydata#9446) Fix typos across the code, doc and comments (pydata#9443) DataTree should not be "Generic" (pydata#9445) Disallow passing a DataArray as data into the DataTree constructor (pydata#9444) Support additional dtypes in `resample` (pydata#9413) Shallow copy parent and children in DataTree constructor (pydata#9297) Bump minimum versions for dependencies (pydata#9434) Always include at least one category in random test data (pydata#9436) Avoid deep-copy when constructing groupby codes (pydata#9429) ...
…ss (pydata#9451) * add a DataTreeCoordinates class * passing read-only properties tests * tests for modifying in-place * WIP making the modification test pass * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * get to the delete tests * test * improve error message * implement delitem * test KeyError * subclass Coordinates instead of DatasetCoordinates * use Frozen(self._data._coord_variables) * Simplify when to raise KeyError Co-authored-by: Stephan Hoyer <shoyer@google.com> * correct bug in suggestion * Update xarray/core/coordinates.py Co-authored-by: Stephan Hoyer <shoyer@google.com> * simplify _update_coords by creating new node data first * update indexes correctly * passes test * update ._drop_indexed_coords * some mypy fixes * remove the apparently-unused _drop_indexed_coords method * fix import error * test that Dataset and DataArray constructors can handle being passed a DataTreeCoordinates object * test dt.coords can be passed to DataTree constructor * improve readability of inline comment * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * initial tests with inherited coords * ignore typeerror indicating dodgy inheritance * try to avoid Unbound type error * cast return value correctly * cehck that .coords works with inherited coords * fix data->dataset * fix return type of __getitem__ * Use .dataset instead of .to_dataset() Co-authored-by: Stephan Hoyer <shoyer@google.com> * _check_alignment -> check_alignment * remove dict comprehension Co-authored-by: Stephan Hoyer <shoyer@google.com> * KeyError message formatting Co-authored-by: Stephan Hoyer <shoyer@google.com> * keep generic types for .dims and .sizes * test verifying you cant delete inherited coord * fix mypy complaint * type hint as accepting objects * update note about .dims returning all dims --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Stephan Hoyer <shoyer@google.com>
Doesn't work yet - pushing to get feedback on the approach.
whats-new.rst
api.rst
cc @shoyer