Fix bug when querying unnamed dataarray #5493

TomNicholas · 2021-06-18T17:51:01Z

There might be a slightly neater way to do this, but this works.

Closes query not possible on unnamed DataArray #5492
Tests added
Passes pre-commit run --all-files
User visible changes (including notable bug fixes) are documented in whats-new.rst

keewis · 2021-06-18T17:55:12Z

xarray/core/dataarray.py

+        name = _THIS_ARRAY if self.name is None else self.name
+        ds = self._to_dataset_whole(name=name, shallow_copy=True)


Is there a reason why we don't just use _to_temp_dataset / _from_temp_dataset regardless of the name?

_to_dataset_whole is just the same thing as _to_temp_dataset except it accepts the shallow_copy argument, which was already being used here.

Using _to_dataset_whole with name=None will fail (so I can't just pass self.name because that will be None for an unnamed dataarray). Using _to_dataset_whole without passing a name fails in the same way.

Then trying to use _from_temp_dataset fails with

    def _from_temp_dataset(
        self, dataset: Dataset, name: Union[Hashable, None, Default] = _default
    ) -> "DataArray":
>       variable = dataset._variables.pop(_THIS_ARRAY)
E       KeyError: <this-array>

so I think it will fail for named dataarrays?

I don't think we need the shallow_copy=True as this is a temporary dataset (and it's the usual pattern for delegating to the Dataset implementation). No need to use the name parameter for _from_temp_dataset, either, as it's being restored from self.

Edit: actually, _from_temp_dataset fails because of the name parameter passed to _to_dataset_whole. Overall, I'd suggest to use the same pattern as in sel

I'd suggest to use the same pattern as in sel

That doesn't work for query because query needs the name of the dataarray to be a variable on the temporary dataset, otherwise it can't evaluate queries involving the dataarray name (such as x="a > 5" where da.name = 'a'). The tests will fail with this error if you try that:

>       raise UndefinedVariableError(key, is_local) from err
E       pandas.core.computation.ops.UndefinedVariableError: name 'a' is not defined

Basically in order for query to work you have to provide a name to construct the temporary dataset with, which rules out using _to_temp_dataset.
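To make the constraint concrete, here is a toy evaluator (not how Dataset.query is implemented, and `toy_query` is a made-up name) in which names in the expression can only be resolved against the variables of the temporary dataset:

```python
def toy_query(variables, expression):
    """Return the positions where `expression` is true.

    `variables` maps each variable name to a list of values; names used in
    `expression` are looked up in that mapping, one position at a time.
    """
    length = len(next(iter(variables.values())))
    hits = []
    for i in range(length):
        namespace = {name: vals[i] for name, vals in variables.items()}
        # No builtins: only the dataset's variable names can be resolved.
        if eval(expression, {"__builtins__": {}}, namespace):
            hits.append(i)
    return hits


values = [3, 7, 9]

# Array stored under its own name: the query can refer to "a".
named_hits = toy_query({"a": values}, "a > 5")

# Array stored under a placeholder key: "a" is unknown to the evaluator.
lookup_error = None
try:
    toy_query({"<this-array>": values}, "a > 5")
except NameError as err:
    lookup_error = err
```

The second call fails with a name-lookup error, which mirrors the "name 'a' is not defined" failure quoted above.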

Indeed. However, if I had to choose between (a) "self" for unnamed and the name for named DataArray objects, or (b) always "self", I'd choose the latter because it's easier to use (and the implementation would be simpler). I'm not sure whether we should try to support both (i.e. allow referencing by name and using "self").

I see what you mean. What I just wrote supports both individually: You use
the name when it's named, and self when it isn't. But if it's named you
can't use self. I suppose you could support using self even when it's named
by examining the query and replacing parts of it before giving it to
ds.query?

So I added support for using 'self' even when the dataarray has a name, by using .replace(). It works, but makes the implementation more complicated again.
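For illustration only, a sketch of what such a substitution could look like on a single query string. `substitute_self` is a hypothetical helper, not the code in this PR; it uses a word-boundary regex because a plain str.replace would also rewrite identifiers that merely contain "self":

```python
import re


def substitute_self(query: str, name: str) -> str:
    """Replace the standalone token `self` in a query string with `name`."""
    # \b keeps identifiers such as "myself" or "self_id" intact. The lambda
    # avoids treating backslashes in `name` as regex escape sequences.
    return re.sub(r"\bself\b", lambda _match: name, query)


query = "myself > 5 and self < 9"

naive = query.replace("self", "a")     # also rewrites part of "myself"
careful = substitute_self(query, "a")  # only rewrites the standalone token
```

This still assumes the name is a valid identifier, and it would rewrite the word self inside a quoted string literal in the query, so it is a simplification rather than a complete solution.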

One way to slightly simplify this would be to always rename all dataarray names to 'self' before querying.

This will break if there's a non-dimensional coordinate variable named self, won't it? I'm not sure there's a good solution though, perhaps ... :D
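One conceivable guard, sketched under the assumption that raising an error on a clash is acceptable (the thread does not settle on this, and `choose_query_alias` is a made-up helper):

```python
def choose_query_alias(coord_names, alias="self"):
    """Return `alias` if no coordinate already uses it, else raise."""
    if alias in coord_names:
        raise ValueError(
            f"cannot use {alias!r} to refer to the array's values: "
            "a coordinate with that name already exists"
        )
    return alias


free = choose_query_alias({"x", "y"})

clash = None
try:
    choose_query_alias({"x", "self"})
except ValueError as err:
    clash = err
```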

TomNicholas · 2021-07-08T16:58:47Z

xarray/core/dataarray.py

        """

-        ds = self._to_dataset_whole(shallow_copy=True)
-        ds = ds.query(
+        if self.name is None:


This line causes a mypy error

xarray/core/dataarray.py:4510: error: Incompatible types in assignment (expression has type "Hashable", variable has type "str")  [assignment]
Found 1 error in 1 file (checked 142 source files)
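The exact statement at line 4510 is not shown here, but this class of error typically arises when a variable is first bound to a str literal and later assigned a Hashable. A sketch of one way to avoid it, with an illustrative function name and a string stand-in for the sentinel, is to annotate the variable with the wider type up front:

```python
from typing import Hashable, Optional

_THIS_ARRAY = "<this-array>"  # stand-in for xarray's private sentinel object


def temp_name(name: Optional[Hashable]) -> Hashable:
    """Pick the key under which the array is stored in the temporary dataset."""
    # Declaring the variable as Hashable (not str) lets both branches
    # type-check: a str is Hashable, but a Hashable is not necessarily a str.
    result: Hashable = _THIS_ARRAY if name is None else name
    return result
```

Without the explicit annotation, mypy infers str from the first assignment and rejects a later assignment of self.name, whose declared type is Hashable.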

TomNicholas added 3 commits June 18, 2021 13:40

9fcfc3e test querying unnamed da
a469bc7 use _from_temp_dataset if no name
ec59926 what's new

TomNicholas mentioned this pull request Jun 18, 2021

query not possible on unnamed DataArray #5492

Open

keewis reviewed Jun 18, 2021

TomNicholas added 4 commits June 18, 2021 14:57

b89acff removed two unneccessary intermediate variables
851c405 removed shallow copy
68a505c reference values in unnamed dataarrays as 'self'
3603ccd replace self with name for named dataarrays

max-sixty mentioned this pull request Jun 21, 2021

apply to dataset #4863

Open

5 tasks

TomNicholas commented Jul 8, 2021

TomNicholas mentioned this pull request Jul 8, 2021

Release v0.19? #5588

Closed

8 tasks

headtr1ck added the "bug" and "needs work" labels Nov 21, 2022
