Add a `.drop_attrs` method #8258
I think it's a good idea. But it should probably have the same arguments and workings as … Whether you want to control the behavior of dropping attrs from variables, or only dataset-level attrs, is another question.
+1. What are variable-level attrs generally used for? For semantic information ("collected on …
Yes, I was thinking about it a bit more like …
Hi team, what do we think about this? @pydata/xarray
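The open question above is how "deep" the method should go: drop only the dataset-level attrs, or also the attrs on every variable. A minimal pure-Python sketch of both behaviours, assuming a toy model where a dataset is a dict of variables plus its own `attrs` dict. `ToyVariable`, `ToyDataset` and `drop_attrs` here are hypothetical and are not xarray's API:

```python
from dataclasses import dataclass, field, replace


@dataclass
class ToyVariable:
    # Hypothetical stand-in for a variable: some data plus its attrs.
    data: list
    attrs: dict = field(default_factory=dict)


@dataclass
class ToyDataset:
    # Hypothetical stand-in for a dataset: named variables plus global attrs.
    variables: dict
    attrs: dict = field(default_factory=dict)


def drop_attrs(ds: ToyDataset, *, deep: bool = True) -> ToyDataset:
    """Return a new ToyDataset without attrs; never mutate `ds`.

    deep=False clears only the dataset-level attrs; deep=True also
    clears the attrs of every variable.
    """
    if deep:
        # Build new variable objects instead of mutating the shared ones.
        variables = {k: replace(v, attrs={}) for k, v in ds.variables.items()}
    else:
        # Reuse the existing variable objects unchanged.
        variables = dict(ds.variables)
    return ToyDataset(variables=variables, attrs={})


ds = ToyDataset(
    variables={"temp": ToyVariable([280.0, 281.5], {"units": "K"})},
    attrs={"title": "demo"},
)
shallow = drop_attrs(ds, deep=False)
full = drop_attrs(ds)
print(shallow.attrs, shallow.variables["temp"].attrs)  # {} {'units': 'K'}
print(full.attrs, full.variables["temp"].attrs)  # {} {}
print(ds.attrs, ds.variables["temp"].attrs)  # {'title': 'demo'} {'units': 'K'}
```

Returning a new object rather than clearing attrs in place keeps the original untouched, which is how most xarray methods behave.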
xarray/core/dataset.py

```python
# Remove attributes from each variable in the dataset
for var in self.variables:
    self[var].attrs = {}
```
To my understanding, wouldn't this wipe the attrs of the `index_vars` of the original object?
Lines 1361 to 1365 in e8be4bb:

```python
for k, v in self._variables.items():
    if k in index_vars:
        variables[k] = index_vars[k]
    else:
        variables[k] = v._copy(deep=deep, data=data.get(k), memo=memo)
```
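The concern here is aliasing: whenever a new object holds the very same variable objects as the original, setting `attrs` through one changes both. A small pure-Python illustration, where `Var` is a hypothetical stand-in for a variable object and a plain `dict` copy plays the role of a shallow dataset copy (this is not xarray code):

```python
import copy


class Var:
    """Hypothetical stand-in for a variable: just carries an attrs dict."""

    def __init__(self, attrs):
        self.attrs = attrs


original = {"x": Var({"units": "m"}), "y": Var({"units": "s"})}

# Shallow copy: a new dict, but the *same* Var objects inside it.
aliased = dict(original)
aliased["x"].attrs = {}  # rebinds attrs on the shared Var object
print(original["x"].attrs)  # {} (the original was modified too)

# Safer pattern: copy each variable before touching its attrs.
# copy.copy still shares the old attrs dict, so rebind it rather than .clear() it.
safe = {k: copy.copy(v) for k, v in original.items()}
safe["y"].attrs = {}
print(original["y"].attrs)  # {'units': 's'} (original untouched)
```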
Atm an `Index` can't have attrs, at least on main: testing `ds.indexes['y'].attrs` fails.
I added an explicit test for the coords. Lmk if you have anything else we could do for the index...
I wonder if the invariant checks will pass when testing this on a Dataset with a multi-index.
It might be problematic to copy coordinate variables individually when they share a common index, and especially when they all wrap exactly the same index object in their `._data` attribute (as in the case of a `PandasMultiIndex`). A safer approach would be to handle the indexed variables with something like this:
```python
new_variables = {}
for idx, idx_vars in self.xindexes.group_by_index():
    # copy each coordinate variable of an index and drop their attrs
    temp_variables = {k: v.copy() for k, v in idx_vars.items()}
    for v in temp_variables.values():
        v.attrs = {}
    # maybe re-wrap the index object in new coordinate variables
    new_variables.update(idx.create_variables(temp_variables))
```
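The invariant at stake is that all coordinate variables built from one multi-index wrap the same index object. A toy pure-Python sketch of why grouping by index helps, assuming hypothetical `ToyIndex` and `ToyCoord` classes (not xarray's): copying each coordinate on its own with `copy.deepcopy` gives every coordinate its own index object, while grouping by index identity and re-wrapping one shared index keeps them linked.

```python
import copy
from dataclasses import dataclass, field


@dataclass(eq=False)
class ToyIndex:
    # Hypothetical multi-index: one object backing several coordinates.
    levels: tuple


@dataclass(eq=False)
class ToyCoord:
    # Hypothetical coordinate variable wrapping an index object.
    index: ToyIndex
    attrs: dict = field(default_factory=dict)


def group_by_index(coords):
    """Group coordinates by the identity of the index object they wrap."""
    groups = {}
    for name, coord in coords.items():
        _, members = groups.setdefault(id(coord.index), (coord.index, {}))
        members[name] = coord
    return list(groups.values())


def drop_coord_attrs(coords):
    """Rebuild coordinates without attrs, re-wrapping one index per group."""
    new = {}
    for idx, members in group_by_index(coords):
        for name in members:
            new[name] = ToyCoord(index=idx, attrs={})
    return new


midx = ToyIndex(levels=("one", "two"))
coords = {
    "x": ToyCoord(midx, {"long_name": "x"}),
    "one": ToyCoord(midx, {"units": "m"}),
    "two": ToyCoord(midx, {"units": "s"}),
}

# Naive: copy every coordinate independently, giving distinct index objects.
naive = {k: copy.deepcopy(v) for k, v in coords.items()}
print(naive["one"].index is naive["two"].index)  # False

# Grouped: all coordinates still wrap one shared index, and attrs are gone.
grouped = drop_coord_attrs(coords)
print(grouped["one"].index is grouped["two"].index)  # True
```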
Thanks, done. I don't think I've done it elegantly, but I'm a bit out of my depth, so I won't try to solve it perfectly on this pass...
Thanks for the review @crusaderky. I realize it's much easier to review when there are tests; I've added those.
My main concern is that the meaning of "drop attrs" could be confusing:
Overall I think this is probably a good idea, though. Certainly I have written this sort of thing many times!
I'm very open-minded here! My main motivation is to put less weight on … What's a good way of deciding? We could do what …
Add these methods to api
I've now added a … I've provisionally set it as …
@pydata/xarray I realize this is still hanging! What's the best way of resolving it? My guess is that there's consensus that the method would be useful, but maybe not on how "deep" it should go. Should we pick something and merge?
Looks good to me.
If you make `deep` keyword-only, it's easier to add more arguments later (see other comments).
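The suggestion is to declare `deep` after a bare `*`, so callers must write `deep=False` explicitly and new parameters can later be added without breaking positional calls. A minimal sketch with a hypothetical helper operating on a nested dict, not xarray's actual implementation:

```python
import inspect


def drop_attrs(obj: dict, *, deep: bool = True) -> dict:
    """Hypothetical helper; `deep` can only be passed by keyword."""
    out = {**obj, "attrs": {}}  # always drop the top-level attrs
    if deep:
        # Also drop the attrs of every variable, building new dicts.
        out["variables"] = {
            name: {**var, "attrs": {}} for name, var in obj["variables"].items()
        }
    return out


ds = {"attrs": {"title": "demo"}, "variables": {"a": {"attrs": {"units": "m"}}}}

print(drop_attrs(ds, deep=False)["variables"]["a"]["attrs"])  # {'units': 'm'}

try:
    drop_attrs(ds, False)  # positional use of `deep` is rejected
except TypeError as err:
    print("TypeError:", err)

# The signature records `deep` as keyword-only.
print(inspect.signature(drop_attrs).parameters["deep"].kind is inspect.Parameter.KEYWORD_ONLY)  # True
```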
Co-authored-by: Michael Niklas <mick.niklas@gmail.com>
Thanks for the ping @headtr1ck! Any thoughts from others before we merge?
Let's try it!
(I didn't look at the tests.)
* main:
  * Enable pandas type checking (pydata#9213)
  * Per-variable specification of boolean parameters in open_dataset (pydata#9218)
  * test push
  * Added a space to the documentation (pydata#9247)
  * Fix typing for test_plot.py (pydata#9234)
  * Allow mypy to run in vscode (pydata#9239)
  * Revert "Test main push"
  * Test main push
  * Revert "Update _typing.py"
  * Update _typing.py
  * Add a `.drop_attrs` method (pydata#8258)
Part of #3891
Do we think this is a good idea? I'll add docs & tests if so... Ready to go; it just needs agreement on whether it's good.
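For reference, a usage sketch of the method as discussed in this PR, assuming an installed xarray release that includes it, with `deep` as a keyword-only flag defaulting to `True` (check the documentation of your installed version). `drop_attrs()` returns a new object and leaves the original untouched:

```python
import xarray as xr

ds = xr.Dataset(
    {"temp": ("x", [280.0, 281.5, 283.0], {"units": "K"})},
    coords={"x": ("x", [0, 1, 2], {"long_name": "position"})},
    attrs={"title": "demo"},
)

# deep=True (the default): drop dataset-level and variable-level attrs.
# Per the review above, indexed coordinates such as "x" are handled too.
clean = ds.drop_attrs()
print(clean.attrs, clean["temp"].attrs)  # {} {}

# deep=False: only the dataset-level attrs are dropped.
top_only = ds.drop_attrs(deep=False)
print(top_only.attrs, top_only["temp"].attrs)  # {} {'units': 'K'}

# The original Dataset keeps all of its attrs.
print(ds.attrs, ds["temp"].attrs)  # {'title': 'demo'} {'units': 'K'}
```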