Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement firstindex, lastindex, axes, size, and ndims consistently #2573

Merged
merged 11 commits into from
Dec 22, 2020

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Dec 6, 2020

This makes sure we have a complete indexing API for all types we expose. I hope I have not missed anything.

@nalimilan - if you are OK with this approach I will add tests and make it ready for a review.

@bkamins bkamins added the non-breaking The proposed change is not breaking label Dec 6, 2020
@bkamins bkamins added this to the 1.0 milestone Dec 6, 2020
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, looks good. Though if Base adds a fallback maybe we won't need all of these?

src/groupeddataframe/groupeddataframe.jl Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Dec 6, 2020

Though if Base adds a fallback maybe we won't need all of these?

We need them, because earlier versions of Julia will not have them so this way we maintain consistency and make sure DataFrames.jl codes work on all versions of Julia post 1.0.


Also the major decision is if we really want all those methods defined. What I mean is that in Julia Base you now have for example:

julia> size((1,2,3))
ERROR: MethodError: no method matching size(::Tuple{Int64,Int64,Int64})

julia> size((a=1,b=2,c=3))
ERROR: MethodError: no method matching size(::NamedTuple{(:a, :b, :c),Tuple{Int64,Int64,Int64}})

julia> axes((1,2,3))
(Base.OneTo(3),)

julia> axes((a=1,b=2,c=3))
ERROR: MethodError: no method matching size(::NamedTuple{(:a, :b, :c),Tuple{Int64,Int64,Int64}})

so the question is if we define them now for objects in DataFrames.jl - can we have some problems with consistency with Julia Base in the future.

@mbauman, @StefanKarpinski - could you please give some guidance when non-array types are allowed to safely define a method for size and axes? In particular the tricky case is types that allow indexing with multiple types of values (like NamedTuple in Julia Base - we have several such types in DataFrames.jl that allow indexing using column number but also column name for instance).

Thank you!

@bkamins bkamins marked this pull request as ready for review December 7, 2020 21:51
@bkamins
Copy link
Member Author

bkamins commented Dec 7, 2020

@nalimilan - this should be good for a review.

I have come to the following conclusion that for array interface (axes etc.) we should return always integer-based objects.

The issue is that Julia Base sometimes is not clear in distinction between using term "index" and "key" for v in the context of collection[v]. I recommend the following (and now consistently use it):

  • "index" is always integer in DataFrames.jl
  • "key" is always a natural key for the collection (so it can be an integer, but can be e.g. Symbol for column names)

Also I am strict with size and axes for invalid dimension unless the type is AbstractVector in which case too large dimension produces 1 and OneTo(1) respectively, as this is what Julia Base does.

test/indexing.jl Outdated Show resolved Hide resolved
test/indexing.jl Outdated Show resolved Hide resolved
test/indexing.jl Outdated Show resolved Hide resolved
test/indexing.jl Outdated Show resolved Hide resolved
test/indexing.jl Outdated Show resolved Hide resolved
test/indexing.jl Outdated Show resolved Hide resolved
test/indexing.jl Outdated Show resolved Hide resolved
test/indexing.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Dec 8, 2020

CI passes

@nalimilan
Copy link
Member

Makes sense. Just a few comments.

docs/src/lib/indexing.md Outdated Show resolved Hide resolved
docs/src/lib/indexing.md Outdated Show resolved Hide resolved
Comment on lines 192 to 194
Base.firstindex(itr::DataFrameColumns, i::Integer) = first(axes(itr, i))
Base.lastindex(itr::DataFrameColumns, i::Integer) = last(axes(itr, i))
Base.axes(itr::DataFrameColumns, i::Integer) = Base.OneTo(size(itr, i))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these in the end given that a fallback was added to Base?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fallback is only in 1.6. I added it to have it also e.g. in 1.0 in case someone in the future uses these functions in user code explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a VERSION check? It's always nice to be able to remove code at some point in the future. :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added - checking if I did it correctly is welcome ;). Thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalimilan - does the version check look good to you now?

@bkamins
Copy link
Member Author

bkamins commented Dec 13, 2020

Does this PR look OK now?

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, for some reason I had missed the notifications.

docs/src/lib/indexing.md Outdated Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins
Copy link
Member Author

bkamins commented Dec 22, 2020

I have added NEWS.md entry.

@bkamins bkamins merged commit a79cdbe into JuliaData:master Dec 22, 2020
@bkamins bkamins deleted the add_firstindex branch December 22, 2020 12:10
@bkamins
Copy link
Member Author

bkamins commented Dec 22, 2020

Thank you!

@bkamins bkamins restored the add_firstindex branch December 22, 2020 12:11
@bkamins bkamins deleted the add_firstindex branch December 22, 2020 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
decision non-breaking The proposed change is not breaking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants