Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TreeNode constructor should not modify children in-place #9478

Closed
TomNicholas opened this issue Sep 11, 2024 · 0 comments · Fixed by #9482
Closed

TreeNode constructor should not modify children in-place #9478

TomNicholas opened this issue Sep 11, 2024 · 0 comments · Fixed by #9482
Labels
topic-DataTree Related to the implementation of a DataTree class topic-internals

Comments

@TomNicholas
Copy link
Member

TomNicholas commented Sep 11, 2024

What is your issue?

The in-place modification issue described in #9196 was fixed for DataTree, but actually still exists for its private base class TreeNode.

In [1]: from xarray.core.treenode import TreeNode
   ...: 
   ...: child = TreeNode()
   ...: root = TreeNode(children={'child': child})
   ...: print(child.parent)
TreeNode(children={'child': TreeNode(children={})})

(The issue here being that the constructor has returned a new object root but also modified the object child in-place.)

We do test TreeNode directly (in test_treenode.py), which I think is useful as it internally separates tree structure operations from data manipulation operations. But if you copy the new test added in #9196 to treenode.py it will currently fail.

Trying to fix this reveals a rabbit hole of ways in which the DataTree/TreeNode relationship should be refactored:

  1. DataTree.__init__ should call TreeNode.__init__ to assign children,
  2. TreeNode.__init__ should .copy() children (i.e. move the solution to Creating a DataTree should not modify parent and children in-place #9196 down into TreeNode),
  3. This requires .copy() to be defined on TreeNode rather than on DataTree, with only ._copy_node overridden to also copy .data,
  4. That requires ._copy_subtree() to use only methods available to the TreeNode class, to iterate over the subtree efficiently,
  5. That might require using some methods that are currently only defined on the NamedNode class, particularly .relative_to (not totally sure about that yet),
  6. .relative_to requires knowing the node .name, which implies perhaps we should merge TreeNode and NamedNode (which was suggested previously by @shoyer anyway).
@TomNicholas TomNicholas added topic-internals topic-DataTree Related to the implementation of a DataTree class labels Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class topic-internals
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant