Keep attrs by default? (keep_attrs) #3891
Why would you want a |
Yes that's fine if people are happy with |
See #1614 for related discussion. I'm happy to set aside backwards compatibility concerns for now and ponder what the ideal policy would be. The original choices here were not made in a super careful way. My longest-standing concern here is about units. One common use case for The other concern is how to combine |
@shoyer to me it would make the most sense to do a union of the inputs:
Note how this would be different from how scalar coords are treated; scalar coords are discarded when they arrive from multiple inputs and are mismatched. The reason I don't think it's wise to do the same with attrs is that it could be uncontrollably expensive to compute equality, depending on what people loaded in them. I've personally seen them used as back-references to the whole application framework. Also there's no guarantee that they implement |
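A minimal sketch of the union idea described above, written against plain dictionaries rather than xarray objects. The helper name `union_attrs` is hypothetical, and the choice that the first input wins on a conflicting key is an assumption made for illustration; the point is only that no `==` comparison of attribute values is ever performed, so arbitrarily expensive or non-comparable values are safe.

```python
from typing import Any, Mapping


def union_attrs(*attrs_dicts: Mapping[str, Any]) -> dict:
    """Union several attrs mappings without comparing their values.

    Hypothetical helper, not part of xarray. On a key conflict the value
    from the earliest input is kept (an assumption for this sketch).
    """
    merged: dict = {}
    for attrs in attrs_dicts:
        for key, value in attrs.items():
            # setdefault only hashes the key; values are never compared,
            # so this stays cheap even for large or exotic attribute values.
            merged.setdefault(key, value)
    return merged


a_attrs = {"x": 1, "y": 2}
b_attrs = {"y": 3, "z": 4}
result = union_attrs(a_attrs, b_attrs)
```

Here `result` contains `x` and `y` from the first mapping and `z` from the second; the mismatched `y` is resolved by position, not by equality.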
I think this is a good question @max-sixty , and I have some opinions based on my experience with xBOUT. Firstly I agree with you that for those users who use xarray as a convenience wrapper or for whom it's useful but not critical it makes more sense to keep attrs by default. "Drop by default because otherwise they might become inconsistent with your data" never really made sense to me, because if you care that much about attrs being consistent with data then you really need well-defined rules for how they are propagated in all cases, which we don't (yet) offer. In all other cases you would rather keep them and have to deal with the edge cases (which is why I wanted #2482 ). As a concrete usage example of wanting to preserve attrs while not being overly-concerned if they sometimes get dropped: in xBOUT, our data requires carting around some After the casual wrapper case, the most important cases are:
At the risk of repeating what's in #1614 , I would like to see some hybrid approach, which gives a simple global default along the lines of what @crusaderky suggests, but also allows a plugin which takes over and rigorously specifies the behaviour for the users who do care. Then we can outsource the work of the complex logic to e.g. the community that actually has to preserve CF conventions, or a separate data provenance package. (Also I made a new |
Great, thanks @TomNicholas , appreciate the thoughtful reply. One thing we could do (NB: I don't think we should do this right now, but building on the points above as ideation) is to defer to the Re next steps on setting the default to be |
I agree that this would be very powerful, and allow users to implement all the things they want (provenance, units handling etc.), but this also seems like a big undertaking. In order to have well-defined handling of attrs through operations like Do you think it would be useful to get input from someone who actually wants this for a complex use case? I think the most hardcore one will be data provenance, because that (a) will need complicated underlying logic, (b) ideally needs to be pretty fault-tolerant, and (c) won't be made redundant by pint or duck-array integration. There was someone on #1614 who was asking about this IIRC.
That would be almost every operation wouldn't it? |
I'm trying to imagine what the approach that delegated the largest fraction of the work to an attrs-handling plugin would be. Would it be to give the attrs plugin the input, and the name of the function/method that was being called, and let the plugin completely decide the output attrs? Or would that be under-specified? |
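One way to picture the delegation asked about above is a handler that receives the name of the operation and the attrs of every input, and returns the output attrs. Everything in this sketch is hypothetical (the `AttrsHandler` signature, the `UNIT_CHANGING` set, and the handler itself); it is not an xarray API, only an illustration of how much a plugin could decide from those two pieces of information.

```python
from typing import Any, Callable, Mapping, Sequence

# Hypothetical plugin signature: (operation name, attrs of each input) -> output attrs.
AttrsHandler = Callable[[str, Sequence[Mapping[str, Any]]], dict]

# Assumed, illustrative set of operations whose result no longer has the input's units.
UNIT_CHANGING = frozenset({"count", "argmax", "argmin", "var"})


def drop_stale_units(func_name: str, inputs: Sequence[Mapping[str, Any]]) -> dict:
    """Keep the first input's attrs, but drop 'units' when the operation changes them."""
    out = dict(inputs[0]) if inputs else {}
    if func_name in UNIT_CHANGING:
        out.pop("units", None)
    return out


def output_attrs(
    func_name: str,
    inputs: Sequence[Mapping[str, Any]],
    handler: AttrsHandler = drop_stale_units,
) -> dict:
    """What a library could call after each operation to let the plugin decide."""
    return handler(func_name, inputs)


temperature = {"units": "K", "long_name": "air temperature"}
mean_attrs = output_attrs("mean", [temperature])
var_attrs = output_attrs("var", [temperature])
```

Whether a function name alone is enough context (for example, it says nothing about which dimension was reduced) is exactly the "under-specified" question raised above.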
Right, anything involving an object with |
I think it would probably be OK to start propagating more |
I did not think this through carefully, but I wonder if we should extend |
If I remember correctly, we decided to allow passing a user-provided function to Something to keep in mind is that not all strategies make sense for operations that involve only a single variable, like |
If we allow
could be OK, where all decisions are left up to the external package (here (Though what's stopping us from directly adding |
Moving from #8205 (I had searched for @keewis writes:
I think the distinction of combining objects vs. a function that operates on a single object is important. In the case I was working on, this was just a transformation of a single object, so I don't see much of a downside of separating out the ability to drop attrs into a different function. Would there be any interest in:
Add a `.drop_attrs` method

Part of #3891

* Add tests
* Add explicit coords test
* Use `._replace` for half the method
* .
* Add a `deep` kwarg (default `True`?)
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* api
* Update xarray/core/dataarray.py
* Update xarray/core/dataset.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

Co-authored-by: Michael Niklas <mick.niklas@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
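To make the `deep` kwarg mentioned in the commit message concrete, here is a toy model of the distinction, assuming `deep=True` clears attrs on the container and on every variable it holds, while `deep=False` clears only the container's own attrs. `ToyVariable` and `ToyDataset` are hypothetical stand-ins and do not reproduce xarray's implementation.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ToyVariable:
    """Stand-in for a variable: some values plus free-form attrs."""
    values: tuple
    attrs: dict = field(default_factory=dict)


@dataclass(frozen=True)
class ToyDataset:
    """Stand-in for a dataset: named variables plus dataset-level attrs."""
    variables: dict
    attrs: dict = field(default_factory=dict)

    def drop_attrs(self, deep: bool = True) -> "ToyDataset":
        """Return a new object without attrs; the original is left untouched."""
        variables = self.variables
        if deep:
            # Also strip attrs from every contained variable.
            variables = {
                name: replace(var, attrs={}) for name, var in variables.items()
            }
        return replace(self, variables=variables, attrs={})


ds = ToyDataset(
    variables={"t": ToyVariable((1, 2), {"units": "K"})},
    attrs={"title": "demo"},
)
deep_dropped = ds.drop_attrs()
shallow_dropped = ds.drop_attrs(deep=False)
```

Returning a new object rather than mutating in place mirrors the "transformation of a single object" framing in the comment above.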
I've held this view in low confidence for a while and wanted to socialize it to see whether there's something to it: Should we keep attrs in operations by default?
Advantages:
drop_attrs method when people do want to remove them
Disadvantages:
once filter warning)
Here are some existing relevant discussions:
keep_attrs #3304
I think this is an easy situation to get into:
I'm up for leaning towards breaking changes if it makes the library better: I think xarray will grow immensely, and so the narrow immediate pain is worth the broader future positive impact. Clearly if the immediate pain stops xarray growing, then it's not a good tradeoff.