Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index_labels method and IndexLabel type indexing #328

Closed
wants to merge 18 commits into from
Closed

Conversation

Tokazama
Copy link
Member

@Tokazama Tokazama commented Jun 30, 2022

Successor to #260. Implements basic tools for accessing multidimensional keys and fancy indexing.

The goal is to have a common backend for some of the clearly overlapping traits/method that go into AxisArray, DimensionalData, and AxisKeys. This should also help us get closer to helping other packages provide support for multiple fancy array types without explicitly depending on them.

@johnnychen94 , @rafaqz , @mcabbott may be interested.

@Tokazama Tokazama mentioned this pull request Jun 30, 2022
@codecov
Copy link

codecov bot commented Jun 30, 2022

Codecov Report

Merging #328 (0ad5cd6) into master (41bfc6f) will decrease coverage by 0.26%.
The diff coverage is 73.07%.

@@            Coverage Diff             @@
##           master     #328      +/-   ##
==========================================
- Coverage   90.27%   90.00%   -0.27%     
==========================================
  Files           9        9              
  Lines        1337     1361      +24     
==========================================
+ Hits         1207     1225      +18     
- Misses        130      136       +6     
Impacted Files Coverage Δ
src/ArrayInterface.jl 85.96% <ø> (ø)
src/indexing.jl 82.92% <60.00%> (-1.37%) ⬇️
src/axes.jl 91.66% <85.71%> (-0.37%) ⬇️
src/dimensions.jl 94.03% <100.00%> (+0.16%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Tokazama Tokazama marked this pull request as ready for review July 1, 2022 20:16
@Tokazama Tokazama changed the title axes_keys method and Key type indexing axislabels method and Label type indexing Aug 5, 2022
Discourse conversation pointed out there may be some confusion with
plotting libraries that often use axis tick labels. Switch to index
should be more specific what is being labeled.
@Tokazama Tokazama removed the request for review from chriselrod August 27, 2022 01:26
@Tokazama
Copy link
Member Author

Tokazama commented Aug 27, 2022

After a month of dedicated bike-shedding on discourse, I think this is ready for review.

It would be good to have some relevant package maintainers okay the name of the function @rafaqz (DimensionalData.jl), @mcabbott (AxisKeys.jl), @davidavdav (NamedArrays.jl), and someone from AxisArrays (maybe @johnnychen94). I'd also like an okay on the syntax from @ChrisRackauckas because we're using the term "label" here and I want to ensure that this isn't problematic for LabelledArrays.jl.

@ChrisRackauckas
Copy link
Member

I don't care about the name

lib/ArrayInterfaceCore/src/ArrayInterfaceCore.jl Outdated Show resolved Hide resolved
src/axes.jl Outdated
index_labels(x, dim)

Returns a tuple of labels assigned to each axis or a collection of labels corresponding to
axis `dim` of `x`. Default is to simply return `map(keys, axes(x))`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really a good idea to have the default return keys? I'd say better return nothing so that the caller knows that no labels are set (it's easy to call keys if needed). Otherwise there's no way to know whether labels are present or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably have changed that back to just the axis after all the discussion around naming this. My thinking was that it would be odd if we returned something that didn't "label" the indexes at all because we are assuming that is what's returned.

Returning nothing would require that every method that acts on index labels would require and extra step to check if it can actually look up values in it. Not an insurmountable issue, but it would certainly be tedious.

Throwing an error when this isn't defined also isn't an option because things like a vector transpose will inevitably have a dimension without labels.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't return nothing when no labels are set, then we should add a separate function called e.g. has_index_labels, a bit like we did in the Metadata PR with hasmetadata and metadata.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely doable and is probably a good method to have irregardless of what the default we choose.

I'm not against returning something that clearly shows no labels are defined. Perhaps I'm being too strict with the rule that the return has the same indices as their corresponding axis, even for default values. I assume people will at least expect to be able to do things like findfirst(==(label), index_labels(A, dim)), in which case we would probably still want a more informative error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer nothing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really worth defining a special type just for that? What would be the advantage over Nothing?

If in doubt, we could say that an error is thrown if has_index_labels returns false, as it allows doing anything we want later without breaking code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really worth defining a special type just for that? What would be the advantage over Nothing?

I'm just trying to figure out some generic representation of labels not being defined that can pass between methods. If we do vcat(a, b) but one doesn't have index labels, then we can't do vcat(index_labels(a, 1), index_labels(b, 1)). Even if we define something that doesn't error in this situation (e.g., vcat_index_labels(::AbstractVector, ::Nothing)), how does this produce a new vector of index labels for the newly generated array? I figured UndefLabel could represent every index without a label.

@rafaqz how are you handling this situation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafaqz, it looks like NoLookup is basically the same thing as what I'm describing. It wraps the non-labeled indices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we want to have has_index_labels(T::Type) or has_index_labels(x) = any(are_labels, index_labels)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this earlier. If your UndefLabels is essentially my NoLookup, that should be fine.

Returns a tuple of labels assigned to each axis or a collection of labels corresponding to
axis `dim` of `x`. Default is to simply return `map(keys, axes(x))`.
"""
index_labels(x, dim) = index_labels(x, to_dims(x, dim))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the function live in ArrayInterfaceCore so that existing "named array" packages can overload it? BTW, it would be good to ping their authors to ensure they would all be OK with the API, otherwise it won't make a lot of sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Existing packages can overload it from ArrayInterface. If you're referring to supporting named dimensions like NamedDims.jl, then they define ArrayInterface.dimnames and to_dims maps to the appropriate dimension so they don't have to overload every method with a dim argument.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayInterface is relatively heavy, which is why ArrayInterfaceCore was created IIUC. I guess it's up to package authors to say whether a dependency on ArrayInterface is acceptable for them or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was more of an issue before we went through all the trouble to fix invalidations due to StaticInt earlier this year. That doesn't mean we can't improve the situation. We are actively trying to move matured functionality into base where appropriate (see #340). I regularly review the code here in an effort to eliminate problematic code that still exists (e.g., redundancies, generated functions, etc,). For example, once we know how this PR is going to look I can finally finish an effort to consolidate a lot of "indexing.jl" by overloading base methods instead of reimplementing a lot of what's in base already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's a bit of a chicken and egg issue. We use StaticSymbol for dimension names known at compile time so that we can use them as a point of reference in an inferrible way. It's pretty difficult to do this only relying on constant propagation (demonstrated that with static sizes here JuliaLang/julia#44538 (comment)).

If someone has a reliable solution I'm open to it. I've been trying to actively address this and related issues for years

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oxinabox is it still a concern depending on ArrayInterface at this point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that ArrayInterface doesn't like Requires.jl a billion packages, I am much more comfortable depending upon it for NamedDims.jl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion was just to put empty function definitions in ArrayInterfaceCore (like we do with DataAPI and StatsAPI). That doesn't prevent keeping fallback method definitions in ArrayInterface as this PR does. But packages that don't want to use fallback definitions are still able to overload the functions by depending only on ArrayInterfaceCore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been trying to avoid situations where the behavior of a method is different if ArrayInterface is loaded vs just ArrayInterfaceCore.

@Tokazama Tokazama changed the title axislabels method and Label type indexing index_labels method and IndexLabel type indexing Sep 2, 2022
@Tokazama
Copy link
Member Author

@mcabbott, would you mind providing some input if you have time?

* removed support for nested traversal of `index_labels`. Support for
  this can be added later but it required making a lot of decisions
  about managing labels that up front that would be better addressed
  through iterative PRs
* removed `has_index_labels`. There are some odd corner cases for this
  one. Particularly for `SubArrays` where the presence of labels in the
  parent don't always have a clear way of propagating forward. Again,
  we can address this one but it will take some decisions about how
  labels are propagated.
* `UnlabelledIndices` and `LabelledIndices` are types that provide a
  more clear structure to what a label is and how they are accessed.
* removed support for nested traversal of `index_labels`. Support for
  this can be added later but it required making a lot of decisions
  about managing labels that up front that would be better addressed
  through iterative PRs
* removed `has_index_labels`. There are some odd corner cases for this
  one. Particularly for `SubArrays` where the presence of labels in the
  parent don't always have a clear way of propagating forward. Again,
  we can address this one but it will take some decisions about how
  labels are propagated.
* `UnlabelledIndices` and `LabelledIndices` are types that provide a
  more clear structure to what a label is and how they are accessed. For
  now access only is supported through `getlabels` and `setlabels`
* removed support for nested traversal of `index_labels`. Support for
  this can be added later but it required making a lot of decisions
  about managing labels that up front that would be better addressed
  through iterative PRs
* removed `has_index_labels`. There are some odd corner cases for this
  one. Particularly for `SubArrays` where the presence of labels in the
  parent don't always have a clear way of propagating forward. Again,
  we can address this one but it will take some decisions about how
  labels are propagated.
* `UnlabelledIndices` and `LabelledIndices` are types that provide a
  more clear structure to what a label is and how they are accessed. For
  now access only is supported through `getlabels` and `setlabels`
@ChrisRackauckas
Copy link
Member

Anything static should go to https://github.com/JuliaArrays/StaticArrayInterface.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants