Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement firstindex, lastindex, axes, size, and ndims consistently #2573

Merged
merged 11 commits into from
Dec 22, 2020
33 changes: 24 additions & 9 deletions docs/src/lib/indexing.md
Original file line number Diff line number Diff line change
Expand Up @@ -238,12 +238,27 @@ The elements of a `GroupedDataFrame` are [`SubDataFrame`](@ref)s of its parent.
This table presents return value types of calling `names`, `propertynames` and `keys`
bkamins marked this conversation as resolved.
Show resolved Hide resolved
on types exposed to the user by DataFrames.jl:

| Type | `names` | `propertynames` | `keys` |
|---------------------|------------------|------------------|------------------|
| `AbstractDataFrame` | `Vector{String}` | `Vector{Symbol}` | undefined |
| `DataFrameRow` | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` |
| `DataFrameRows` | `Vector{String}` | `Vector{Symbol}` | vector of `Int` |
| `DataFrameColumns` | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` |
| `GroupedDataFrame` | `Vector{String}` | tuple of fields | `GroupKeys` |
| `GroupKeys` | undefined | tuple of fields | vector of `Int` |
| `GroupKey` | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` |
| Type | `names` | `propertynames` | `keys` | `length` | `ndims` |
|---------------------|------------------|------------------|------------------|-----------|---------|
| `AbstractDataFrame` | `Vector{String}` | `Vector{Symbol}` | undefined | undefined | `2` |
| `DataFrameRow` | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` | `Int` | `1` |
| `DataFrameRows` | `Vector{String}` | `Vector{Symbol}` | vector of `Int` | `Int` | `1` |
| `DataFrameColumns` | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` | `Int` | `1` |
| `GroupedDataFrame` | `Vector{String}` | tuple of fields | `GroupKeys` | `Int` | `1` |
| `GroupKeys` | undefined | tuple of fields | vector of `Int` | `Int` | `1` |
| `GroupKey` | `Vector{String}` | `Vector{Symbol}` | `Vector{Symbol}` | `Int` | `1` |

Additionally the above types `T` (i.e. `AbstractDataFrame`, `DataFrameRow`, `DataFrameRows`,
`DataFrameColumns`, `GroupedDataFrame`, `GroupKeys`, `GroupKey`) the following methods are defined:
* `size(::T)` returning a `Tuple` of `Int`.
* `size(::T, ::Integer)` returning an `Int`.
* `axes(::T)` returning a `Tuple` of `Int` vectors.
* `axes(::T, ::Integer)` returning an `Int` vector.
bkamins marked this conversation as resolved.
Show resolved Hide resolved
* `firstindex(::T)` returning `1` (except `AbstractDataFrame` for which it is undefined).
* `firstindex(::T, ::Integer)` returning `1` for a valid dimension (except `DataFrameRows`
and `GroupKeys` for which `1` is also returned for a dimension higher than a valid one
because they are `AbstractVector`).
* `lastindex(::T)` returning `Int` (except `AbstractDataFrame` for which it is undefined).
* `lastindex(::T, ::Integer)` returning `Int` for a valid dimension (except `DataFrameRows`
and `GroupKeys` for which `1` is also returned for a dimension higher than a valid one
because they are `AbstractVector`).
1 change: 1 addition & 0 deletions src/abstractdataframe/abstractdataframe.jl
Original file line number Diff line number Diff line change
Expand Up @@ -332,6 +332,7 @@ end

Base.isempty(df::AbstractDataFrame) = size(df, 1) == 0 || size(df, 2) == 0

Base.firstindex(df::AbstractDataFrame, i::Integer) = first(axes(df, i))
Base.lastindex(df::AbstractDataFrame, i::Integer) = last(axes(df, i))
Base.axes(df::AbstractDataFrame, i::Integer) = Base.OneTo(size(df, i))

Expand Down
13 changes: 11 additions & 2 deletions src/abstractdataframe/iteration.jl
Original file line number Diff line number Diff line change
Expand Up @@ -176,14 +176,23 @@ Base.IteratorSize(::Type{<:DataFrameColumns}) = Base.HasShape{1}()
Base.size(itr::DataFrameColumns) = (size(parent(itr), 2),)

function Base.size(itr::DataFrameColumns, d::Integer)
d < 1 && throw(ArgumentError("dimension out of range"))
return d == 1 ? size(itr)[1] : 1
d != 1 && throw(ArgumentError("dimension out of range"))
return size(itr)[1]
end

Base.ndims(::DataFrameColumns) = 1
Base.ndims(::Type{<:DataFrameColumns}) = 1

Base.length(itr::DataFrameColumns) = size(itr)[1]
Base.eltype(::Type{<:DataFrameColumns}) = AbstractVector

Base.firstindex(itr::DataFrameColumns) = 1
Base.lastindex(itr::DataFrameColumns) = length(itr)

Base.firstindex(itr::DataFrameColumns, i::Integer) = first(axes(itr, i))
Base.lastindex(itr::DataFrameColumns, i::Integer) = last(axes(itr, i))
Base.axes(itr::DataFrameColumns, i::Integer) = Base.OneTo(size(itr, i))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these in the end given that a fallback was added to Base?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fallback is only in 1.6. I added it to have it also e.g. in 1.0 in case someone in the future uses these functions in user code explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a VERSION check? It's always nice to be able to remove code at some point in the future. :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added - checking if I did it correctly is welcome ;). Thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalimilan - does the version check look good to you now?


Base.iterate(itr::DataFrameColumns, i::Integer=1) =
i <= length(itr) ? (itr[i], i + 1) : nothing
Base.@propagate_inbounds Base.getindex(itr::DataFrameColumns, idx::ColumnIndex) =
Expand Down
5 changes: 5 additions & 0 deletions src/dataframerow/dataframerow.jl
Original file line number Diff line number Diff line change
Expand Up @@ -376,8 +376,13 @@ Return the number of dimensions of a data frame row, which is always `1`.
Base.ndims(::DataFrameRow) = 1
Base.ndims(::Type{<:DataFrameRow}) = 1

Base.firstindex(r::DataFrameRow) = 1
Base.lastindex(r::DataFrameRow) = length(r)

Base.firstindex(r::DataFrameRow, i::Integer) = first(axes(r, i))
Base.lastindex(r::DataFrameRow, i::Integer) = last(axes(r, i))
Base.axes(r::DataFrameRow, i::Integer) = Base.OneTo(size(r, i))

Base.iterate(r::DataFrameRow) = iterate(r, 1)

function Base.iterate(r::DataFrameRow, st)
Expand Down
28 changes: 27 additions & 1 deletion src/groupeddataframe/groupeddataframe.jl
Original file line number Diff line number Diff line change
Expand Up @@ -380,7 +380,19 @@ function Base.iterate(gd::GroupedDataFrame, i=1)
end
end

Compat.lastindex(gd::GroupedDataFrame) = gd.ngroups
nalimilan marked this conversation as resolved.
Show resolved Hide resolved
Base.size(gd::GroupedDataFrame) = (length(gd),)
Base.size(gd::GroupedDataFrame, i::Integer) = size(gd)[i]

Base.ndims(::GroupedDataFrame) = 1
Base.ndims(::Type{<:GroupedDataFrame}) = 1

Base.firstindex(gd::GroupedDataFrame) = 1
Base.lastindex(gd::GroupedDataFrame) = gd.ngroups

Base.firstindex(gd::GroupedDataFrame, i::Integer) = first(axes(gd, i))
Base.lastindex(gd::GroupedDataFrame, i::Integer) = last(axes(gd, i))
Base.axes(gd::GroupedDataFrame, i::Integer) = Base.OneTo(size(gd, i))

Base.first(gd::GroupedDataFrame) = gd[1]
Base.last(gd::GroupedDataFrame) = gd[end]

Expand Down Expand Up @@ -457,6 +469,20 @@ end

Base.parent(key::GroupKey) = getfield(key, :parent)
Base.length(key::GroupKey) = length(parent(key).cols)

Base.size(key::GroupKey) = (length(key),)
Base.size(key::GroupKey, i::Integer) = size(key)[i]

Base.ndims(::GroupKey) = 1
Base.ndims(::Type{<:GroupKey}) = 1

Base.firstindex(key::GroupKey) = 1
Base.lastindex(key::GroupKey) = length(key)

Base.firstindex(key::GroupKey, i::Integer) = first(axes(key, i))
Base.lastindex(key::GroupKey, i::Integer) = last(axes(key, i))
Base.axes(key::GroupKey, i::Integer) = Base.OneTo(size(key, i))

Base.names(key::GroupKey) = string.(parent(key).cols)
# Private fields are never exposed since they can conflict with column names
Base.propertynames(key::GroupKey, private::Bool=false) = copy(parent(key).cols)
Expand Down
146 changes: 146 additions & 0 deletions test/indexing.jl
Original file line number Diff line number Diff line change
Expand Up @@ -1859,4 +1859,150 @@ end
@test_throws ArgumentError df[1] = [2]
end

@testset "array interface tests for all types" begin
df = DataFrame(reshape(1:12, 3, 4), :auto)
@test_throws MethodError length(df)
@test ndims(df) == ndims(typeof(df)) == 2
@test size(df) == (3, 4)
@test size(df, 1) == 3
@test size(df, 2) == 4
@test_throws ArgumentError size(df, 3)
@test_throws ArgumentError size(df, 0)
@test axes(df) == (1:3, 1:4)
@test axes(df, 1) == 1:3
@test axes(df, 2) == 1:4
@test_throws ArgumentError axes(df, 3)
@test_throws ArgumentError axes(df, 0)
@test_throws MethodError firstindex(df)
@test firstindex(df, 1) == 1
@test firstindex(df, 2) == 1
@test_throws ArgumentError firstindex(df, 3)
@test_throws ArgumentError firstindex(df, 0)
@test_throws MethodError lastindex(df)
@test lastindex(df, 1) == 3
@test lastindex(df, 2) == 4
@test_throws ArgumentError lastindex(df, 3)
@test_throws ArgumentError lastindex(df, 0)

dfr = df[1, 1:3]
@test length(dfr) == 3
@test ndims(dfr) == ndims(typeof(dfr)) == 1
@test size(dfr) == (3,)
@test size(dfr, 1) == 3
@test_throws BoundsError size(dfr, 2)
@test_throws BoundsError size(dfr, 0)
@test axes(dfr) == (1:3,)
@test axes(dfr, 1) == 1:3
@test_throws BoundsError axes(dfr, 2)
@test_throws BoundsError axes(dfr, 0)
@test firstindex(dfr) == 1
@test firstindex(dfr, 1) == 1
@test_throws BoundsError firstindex(dfr, 2)
@test_throws BoundsError firstindex(dfr, 0)
@test lastindex(dfr) == 3
@test lastindex(dfr, 1) == 3
@test_throws BoundsError lastindex(dfr, 2)
@test_throws BoundsError lastindex(dfr, 0)

er = eachrow(df)
@test length(er) == 3
@test ndims(er) == ndims(typeof(er)) == 1
@test size(er) == (3,)
@test size(er, 1) == 3
@test size(er, 2) == 1
@test_throws BoundsError size(er, 0)
@test axes(er) == (1:3,)
@test axes(er, 1) == 1:3
@test axes(er, 2) == 1:1
@test_throws BoundsError axes(er, 0)
@test firstindex(er) == 1
@test firstindex(er, 1) == 1
@test firstindex(er, 2) == 1
@test_throws BoundsError firstindex(er, 0)
@test lastindex(er) == 3
@test lastindex(er, 1) == 3
@test lastindex(er, 2) == 1
@test_throws BoundsError lastindex(er, 0)

ec = eachcol(df)
@test length(ec) == 4
@test ndims(ec) == ndims(typeof(ec)) == 1
@test size(ec) == (4,)
@test size(ec, 1) == 4
@test_throws ArgumentError size(ec, 2)
@test_throws ArgumentError size(ec, 0)
@test axes(ec) == (1:4,)
@test axes(ec, 1) == 1:4
@test_throws ArgumentError axes(ec, 2)
@test_throws ArgumentError axes(ec, 0)
@test firstindex(ec) == 1
@test firstindex(ec, 1) == 1
@test_throws ArgumentError firstindex(ec, 2)
@test_throws ArgumentError firstindex(ec, 0)
@test lastindex(ec) == 4
@test lastindex(ec, 1) == 4
@test_throws ArgumentError lastindex(ec, 2)
@test_throws ArgumentError lastindex(ec, 0)

gdf = groupby(df, [:x1, :x2, :x3])
@test length(gdf) == 3
@test ndims(gdf) == ndims(typeof(gdf)) == 1
@test size(gdf) == (3,)
@test size(gdf, 1) == 3
@test_throws BoundsError size(gdf, 2)
@test_throws BoundsError size(gdf, 0)
@test axes(gdf) == (1:3,)
@test axes(gdf, 1) == 1:3
@test_throws BoundsError axes(gdf, 2)
@test_throws BoundsError axes(gdf, 0)
@test firstindex(gdf) == 1
@test firstindex(gdf, 1) == 1
@test_throws BoundsError firstindex(gdf, 2)
@test_throws BoundsError firstindex(gdf, 0)
@test lastindex(gdf) == 3
@test lastindex(gdf, 1) == 3
@test_throws BoundsError lastindex(gdf, 2)
@test_throws BoundsError lastindex(gdf, 0)

kgdf = keys(gdf)
@test length(kgdf) == 3
@test ndims(kgdf) == ndims(typeof(kgdf)) == 1
@test size(kgdf) == (3,)
@test size(kgdf, 1) == 3
@test size(kgdf, 2) == 1
@test_throws BoundsError size(kgdf, 0)
@test axes(kgdf) == (1:3,)
@test axes(kgdf, 1) == 1:3
@test axes(kgdf, 2) == 1:1
@test_throws BoundsError axes(kgdf, 0)
@test firstindex(kgdf) == 1
@test firstindex(kgdf, 1) == 1
@test firstindex(kgdf, 2) == 1
@test_throws BoundsError firstindex(kgdf, 0)
@test lastindex(kgdf) == 3
@test lastindex(kgdf, 1) == 3
@test lastindex(kgdf, 2) == 1
@test_throws BoundsError lastindex(kgdf, 0)

gk = kgdf[1]
@test length(gk) == 3
@test ndims(gk) == ndims(typeof(gk)) == 1
@test size(gk) == (3,)
@test size(gk, 1) == 3
@test_throws BoundsError size(gk, 2)
@test_throws BoundsError size(gk, 0)
@test axes(gk) == (1:3,)
@test axes(gk, 1) == 1:3
@test_throws BoundsError axes(gk, 2)
@test_throws BoundsError axes(gk, 0)
@test firstindex(gk) == 1
@test firstindex(gk, 1) == 1
@test_throws BoundsError firstindex(gk, 2)
@test_throws BoundsError firstindex(gk, 0)
@test lastindex(gk) == 3
@test lastindex(gk, 1) == 3
@test_throws BoundsError lastindex(gk, 2)
@test_throws BoundsError lastindex(gk, 0)
end

end # module
2 changes: 1 addition & 1 deletion test/iteration.jl
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ using Test, DataFrames
@test length(eachcol(df)) == size(df, 2)
@test size(eachcol(df)) == (size(df, 2),)
@test size(eachcol(df), 1) == size(df, 2)
@test size(eachcol(df), 2) == 1
@test_throws ArgumentError size(eachcol(df), 2)
@test_throws ArgumentError size(eachcol(df), 0)
@test eachcol(df)[1] == df[:, 1]
@test eachcol(df)[:A] === df[!, :A]
Expand Down