Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

display the indexes in the string reprs #6795

Merged
merged 9 commits into from
Oct 12, 2022
Merged

Conversation

keewis
Copy link
Collaborator

@keewis keewis commented Jul 16, 2022

With the flexible indexes refactor indexes have become much more important, which means we should include them in the reprs of DataArray and Dataset objects.

This is a initial attempt, covering only the string reprs, with a few unanswered questions:

  • how do we format indexes? Do we delegate to their __repr__ or some other method?
  • should we skip PandasIndex and PandasMultiIndex?
  • how do we present indexes that wrap multiple columns? At the moment, they are duplicated (see also the discussion in Pass indexes to the Dataset and DataArray constructors #6392)
  • what do we do with the index marker in the coords repr?

(also, how do we best test this?)

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

@benbovy
Copy link
Member

benbovy commented Jul 17, 2022

A few thoughts:

how do we format indexes? Do we delegate to their repr or some other method?

Like for variable data, Xarray indexes could implement _repr_inline_ and __repr__ to have both a summarized and detailed representation. They could also have a _repr_html_ (for fancy representation of complex indexes).

should we skip PandasIndex and PandasMultiIndex?

I'd skip it for the plain text DataArray / Dataset reprs (not a good information / verbosity ratio), but I'd keep it for the html repr (the index section could be collapsed by default) as well as for the Indexes repr. We could also provide a display option for more control on this (e.g., a display_default_indexes option set to False by default).

how do we present indexes that wrap multiple columns? At the moment, they are duplicated

Assuming that all coordinates related to a given index are shown next to each other, we could render the inline repr for the 1st coordinate and then use a short symbol (e.g., --, or a unicode symbol?) below that means "it's the same index".

what do we do with the index marker in the coords repr?

I think we can keep it as-is. It helps to identify at a glance which coordinates are indexed and which aren't. And it's still relevant if we skip PandasIndex and PandasMultiIndex in the plain text DataArray / Dataset reprs.

@dcherian
Copy link
Contributor

Shall we point this at the main branch instead?

@keewis keewis changed the base branch from scipy22 to main July 27, 2022 17:13
xarray/core/dataarray.py Outdated Show resolved Hide resolved
@benbovy
Copy link
Member

benbovy commented Aug 2, 2022

From #6867 (#6867 (comment)), we might want to update the "Dimensions without coordinates" line too.

@TomNicholas
Copy link
Member

should we skip PandasIndex and PandasMultiIndex?

If we're encouraging people to look at these / subclass them to create custom indexes, then they should have a repr too.

@dcherian
Copy link
Contributor

dcherian commented Oct 3, 2022

I think we should move forward here.

Are there any real blockers?

@keewis
Copy link
Collaborator Author

keewis commented Oct 4, 2022

not really, I wanted to wait until set_xindex was in main (the PR has been merged last week) and have not looked at it since.

Edit: we don't yet have tests, though

There's two issues left that might need a bit of discussion: in 8f21df3 I skipped displaying default PandasIndex instances because that's basically redundant with the * on coordinates (and it would have required me to update a lot of doctests, which will probably make this a breaking change). If we decide to revert that, should we mark every coordinate with an index with a *?

Another proposal we had was to replace the "dimensions without coordinates" line with a "coordinates without index" line. @benbovy, this might be a misunderstanding on my part, but I thought "dimension coordinates" (and in particular their indexes) are still used for alignment? If so, I think we might need both lines.

@dcherian dcherian added the plan to merge Final call for comments label Oct 12, 2022
@benbovy
Copy link
Member

benbovy commented Oct 12, 2022

Looks good to me @keewis. Thanks for your work on the indexes repr!

Yes I think we can skip displaying default indexes for now... The question is which indexes are considered as default, i.e., all PandasIndex and PandasMultiIndex instances (like in this PR) or just the single pandas indexes automatically created for the dimension coordinates? We can decide this later, though, it's not a problem adding more indexes in the text repr later (we'll probably need it when dropping the multi-index dimension coordinate with tuple elements). For the html repr it's easier: we could display all indexes and collapse the section by default.

but I thought "dimension coordinates" (and in particular their indexes) are still used for alignment?

Yes that's a good point. Let's keep "dimensions without coordinates".

@dcherian dcherian merged commit b80d973 into pydata:main Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments topic-indexing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants