Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update DataArray.rename + docu #6665

Merged
merged 10 commits into from
Jul 18, 2022
Merged

Update DataArray.rename + docu #6665

merged 10 commits into from
Jul 18, 2022

Conversation

headtr1ck
Copy link
Collaborator

On the way, I have added the support for changing the name and dims/coords in the same rename call.

Also took the freedom to fix some unrelated typing problems.

@headtr1ck
Copy link
Collaborator Author

Mow you can also have the name of the DataArray in the rename dict.
The code got a bit more complicated, lots of if-branches are slight modifications of each other, probably simplifyable.

@headtr1ck
Copy link
Collaborator Author

Now that I think about it...
Is it common to have the DataArray being called the same as one of its dimensions/coordinates?
This might break some things.
So maybe better to revert the last commit that introduced this change?

@headtr1ck headtr1ck changed the title Update DataArray.rename Update DataArray.rename + docu Jun 7, 2022
@headtr1ck
Copy link
Collaborator Author

Now the behavior is the same as before with the only addition that one can rename the DataArray with a positional argument and its cords/dims via kwargs at the same time.

I have added further tests that are a bit more expressive than before.

Should be ready to merge now.

@max-sixty
Copy link
Collaborator

This seems like a nice improvement given the existing state.

I've found the effort to specialize methods (e.g. drop_var vs. drop_sel) has been good, and we might want to extend this principle. With the proposed code:

  • renaming a coord on a DataArray is rename
  • renaming a coord on a Dataset is rename_vars

...which is not ideal, though might be the best tradeoff.

WDYT @headtr1ck ? Or any other thoughts from others?

@keewis
Copy link
Collaborator

keewis commented Jun 8, 2022

in pint-xarray we have the .pint.to method which uses a very similar concept, maybe it is worth copying that for rename?

@max-sixty
Copy link
Collaborator

max-sixty commented Jun 9, 2022

in pint-xarray we have the .pint.to method which uses a very similar concept, maybe it is worth copying that for rename?

We do want it to be fluent though, so it can be directly on the DataArray, otherwise you can't do:

(
  da
  .rename(...)
  .sum(...)
)

@headtr1ck
Copy link
Collaborator Author

Honestly I am a big fan of functions that are called the same for DataArray and Dataset since this enables workflows on objects that could be either.

A rename_dims method sounds easy to implement.
The rename_vars does not sound extremely convincing for DataArray though. Should this rename the array and it's coords?

@keewis
Copy link
Collaborator

keewis commented Jun 9, 2022

We do want it to be fluent though, so it can be directly on the DataArray, otherwise you can't do:

I was referring to the concept rather than the method. The idea is that you can refer to the DataArray by positional or by name, and the method takes either a single string or a dict mapping old names to new names, also with a kwargs version (the merging happens with either_dict_or_kwargs). That would mean these all do the same thing:

a = xr.DataArray([0, 1, 2], dims="x")
a.rename("new")
a.rename({None: "new"})  # None is the current name

b = xr.DataArray([0, 1, 2], dims="x", name="b")
b.rename("new")
b.rename(b="new")  # b is the current name

# with coords
c = xr.DataArray([0, 1, 2], dims="x", coords={"x": ["a", "b", "c"]}, name="c")
c.rename("new", x="y")
c.rename(c="new", x="y")
c.rename({"c": "new", "x": "y"})  # providing both the `dict` and the kwargs raises

@headtr1ck
Copy link
Collaborator Author

I will implement a rename_dims method, this should be trivial.

I am not sure about rename_vars: what should this method do with the name of the DataArray? Should it only rename coords or also the name?

@max-sixty
Copy link
Collaborator

@keewis v interesting suggestion — the object name as positional and vars as kwargs.

As I think was mentioned above, one choice is whether a kwarg with the name of the object renames the object. I would worry it's slightly ambiguous. And potentially not additive. WDYT?

a rename_dims method

This could be good. Would this also rename the coord associated with the dim?

@keewis
Copy link
Collaborator

keewis commented Jun 11, 2022

Would this also rename the coord associated with the dim?

To decide that we probably need to figure out what the role of a dimension will be after the index refactor. Before the refactor, we needed to rename the dimension coordinate along with the dimension to keep the index, but now that indexes are independent of dimensions it might be fine not to do that.

@headtr1ck
Copy link
Collaborator Author

Ok I see. I hereby withdraw my comment that it will be trivial, haha.

@headtr1ck
Copy link
Collaborator Author

So, how about we leave the current implemention like it is (or merge this PR, it does not really change the core functionality).

And then open another issue on how to procede with rename in light of the index refactor?

@max-sixty
Copy link
Collaborator

Yes great, let's merge this.

And then open another issue on how to procede with rename in light of the index refactor?
Great!


Then two main options I see for synthesizing the vars issues:

  • rename operates on DataArray as described in https://github.com/pydata/xarray/pull/6665#issuecomment-1150810485. Generally I'm less keen on "different types have different semantics", and here a positional arg would mean a DataArray rename, and kwarg would mean var rename. But it does work locally to DataArray quite well.
  • rename only exists on DataArrays for the name of the DataArray, and we use rename_vars & rename_dims for both DataArrays & Datasets. So Dataset.rename is soft-deprecated.

@headtr1ck headtr1ck mentioned this pull request Jun 18, 2022
4 tasks
@headtr1ck
Copy link
Collaborator Author

I have created #6704 and tried to copy a summary of this discussion.
Feel free to continue the discussion there :)

@Illviljan Illviljan added the plan to merge Final call for comments label Jul 16, 2022
@Illviljan Illviljan merged commit 392a614 into pydata:main Jul 18, 2022
@Illviljan
Copy link
Contributor

Thanks @headtr1ck !

@headtr1ck headtr1ck deleted the renamedocu branch July 18, 2022 15:31
dcherian added a commit to keewis/xarray that referenced this pull request Jul 22, 2022
* main: (313 commits)
  Update whats-new
  Release notes for v2022.06.0 (pydata#6815)
  Drop multi-indexes when assigning to a multi-indexed variable (pydata#6798)
  Support NumPy array API (experimental) (pydata#6804)
  Add cumsum to DatasetGroupBy (pydata#6525)
  Refactor groupby binary ops code. (pydata#6789)
  Update DataArray.rename + docu (pydata#6665)
  Switch to T_DataArray and T_Dataset in concat (pydata#6784)
  Fix typos found by codespell (pydata#6794)
  Update groupby attrs tests (pydata#6787)
  Update map_blocks to use chunksizes property. (pydata#6776)
  Fix `DataArrayRolling.__iter__` with `center=True` (pydata#6744)
  [test-upstream] Update flox repo URL (pydata#6780)
  Move _infer_meta_data and _parse_size to utils (pydata#6779)
  Make the `sel` error more descriptive when `method` is unset (pydata#6774)
  Move Rolling tests to their own testing module (pydata#6777)
  [pre-commit.ci] pre-commit autoupdate (pydata#6773)
  move da and ds fixtures to conftest.py (pydata#6730)
  Bump EnricoMi/publish-unit-test-result-action from 1 to 2 (pydata#6770)
  Type shape methods (pydata#6767)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataArray.rename docu missing renaming of dimensions
4 participants