Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forbid modifying names of DataTree objects with parents #9494

Merged
merged 5 commits into from
Sep 14, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 19 additions & 8 deletions xarray/core/treenode.py
Original file line number Diff line number Diff line change
Expand Up @@ -640,6 +640,14 @@ def same_tree(self, other: Tree) -> bool:
AnyNamedNode = TypeVar("AnyNamedNode", bound="NamedNode")


def _validate_name(name: str | None) -> None:
if name is not None:
if not isinstance(name, str):
raise TypeError("node name must be a string or None")
if "/" in name:
raise ValueError("node names cannot contain forward slashes")


class NamedNode(TreeNode, Generic[Tree]):
"""
A TreeNode which knows its own name.
Expand All @@ -653,8 +661,8 @@ class NamedNode(TreeNode, Generic[Tree]):

def __init__(self, name=None, children=None):
super().__init__(children=children)
self._name = None
self.name = name
_validate_name(name)
self._name = name

@property
def name(self) -> str | None:
Expand All @@ -663,11 +671,13 @@ def name(self) -> str | None:

@name.setter
def name(self, name: str | None) -> None:
if name is not None:
if not isinstance(name, str):
raise TypeError("node name must be a string or None")
if "/" in name:
raise ValueError("node names cannot contain forward slashes")
if self.parent is not None:
raise ValueError(
"cannot set the name of a node with a parent. Consider creating "
shoyer marked this conversation as resolved.
Show resolved Hide resolved
"a detached copy of this node via .copy(), or using .rename() "
shoyer marked this conversation as resolved.
Show resolved Hide resolved
"on the parent node."
)
_validate_name(name)
self._name = name

def __repr__(self, level=0):
Expand All @@ -681,7 +691,8 @@ def __str__(self) -> str:

def _post_attach(self: AnyNamedNode, parent: AnyNamedNode, name: str) -> None:
"""Ensures child has name attribute corresponding to key under which it has been stored."""
self.name = name
_validate_name(name) # is this check redundant?
self._name = name
Comment on lines +695 to +696
Copy link
Member

@TomNicholas TomNicholas Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's redundant, at least in the sense that the original code uses the .name property setter, which now calls _validate_name. Other than that it is necessary though, because you can get down to this method by following

child._set_parent(new_parent=self, child_name=name)

which is in the .children setter.

Copy link
Member

@TomNicholas TomNicholas Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you could instead move this check to be inside the ._check_children staticmethod on TreeNode, which will get called first. I think that would be a clearer place to put the check, and you should then be able to remove the NamedNode._post_attach method entirely!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside I think this brings us closer to seeing that NamedNode is almost entirely unnecessary, if it weren't for the possibility of a root node having a name. Using TreeNode alone nodes would still have names, but those names are stored only as child keys on the parent.

So it makes sense that here we are able to refactor such that every name validation is actually done on TreeNode, with the exceptions of inside NamedNode.__init__ and the .name setter, which are the only places we could possibly check the name of the root node (as it has no parent to perform the validation via).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm was I wrong about this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does seems so... honestly I find the various hooks on the base TreeNode pretty hard to reason about

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn. I guess just put the validation check back in, merge, then we can refactor with that regression test to adhere to?

I find the various hooks on the base TreeNode pretty hard to reason about

I normally find them okay - I want to look at this issue more closely. Of all the hooks defined on TreeNode we only actually use like 2 of them, the rest are holdovers from the original design copying the anytree library.


def _copy_node(
self: AnyNamedNode,
Expand Down
20 changes: 19 additions & 1 deletion xarray/tests/test_datatree.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,10 +24,28 @@ def test_empty(self):
assert dt.children == {}
assert_identical(dt.to_dataset(), xr.Dataset())

def test_unnamed(self):
def test_name(self):
dt = DataTree()
assert dt.name is None

dt = DataTree(name="foo")
assert dt.name == "foo"

dt.name = "bar"
assert dt.name == "bar"

dt = DataTree(children={"foo": DataTree()})
assert dt["/foo"].name == "foo"
with pytest.raises(
ValueError, match="cannot set the name of a node with a parent"
):
dt["/foo"].name = "bar"

detached = dt["/foo"].copy()
assert detached.name == "foo"
detached.name = "bar"
assert detached.name == "bar"

def test_bad_names(self):
with pytest.raises(TypeError):
DataTree(name=5) # type: ignore[arg-type]
Expand Down
Loading