Allow predicate in Cols #2881

Merged
merged 10 commits into from
Sep 20, 2021
Changes from 7 commits
2 changes: 2 additions & 0 deletions NEWS.md
@@ -70,6 +70,8 @@
* the `DataFrame` constructor when matrix is passed to it as a first
argument now allows `copycols` keyword argument
([#2859](https://github.com/JuliaData/DataFrames.jl/pull/2859))
* `Cols` now accepts a predicate function that is applied to column names passed as strings
([#2881](https://github.com/JuliaData/DataFrames.jl/pull/2881))

## Bug fixes

12 changes: 11 additions & 1 deletion docs/src/lib/indexing.md
@@ -26,7 +26,17 @@ The rules for a valid type of index into a column are the following:
* a vector of `Bool` that has to be a subtype of `AbstractVector{Bool}`;
* a regular expression, which gets expanded to a vector of matching column names;
* a `Not` expression (see [InvertedIndices.jl](https://github.com/mbauman/InvertedIndices.jl));
* an `Cols`, `All` or `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
`Not(idx)` selects all indices not in the passed `idx`;
* a `Cols` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
`Cols(idxs...)` selects the union of the selections in `idxs`; in particular
`Cols()` selects no columns and `Cols(:)` selects all columns; as a special case,
`Cols(predicate)`, where `predicate` is a predicate function, selects the
columns for which `predicate` returns `true` when passed the column name
as a string;
* a `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
`Between(first, last)` selects the columns between `first` and `last`;
* an `All` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
`All()` selects all columns, equivalent to `:`;
* a colon literal `:`.
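
The union rule for `Cols(idxs...)` described in the list above can be sketched in plain Julia without DataFrames.jl. This is only an illustration under stated assumptions: `resolve` and `resolve_all` are hypothetical helper names, not part of DataFrames.jl or DataAPI.jl, and the column names are a plain vector of strings. Unlike the PR, the sketch also lets a predicate be combined with other selectors.

```julia
# Hypothetical stand-in for column lookup; NOT the DataFrames.jl implementation.
colnames = ["r", "x1", "x2", "y"]

# Resolve a single selector to a vector of column positions.
resolve(colnames, s::Symbol) = findall(==(String(s)), colnames)
resolve(colnames, re::Regex) = findall(n -> occursin(re, n), colnames)
resolve(colnames, ::Colon) = collect(1:length(colnames))
# A predicate receives each column name as a string.
# `Colon <: Function`, but the `::Colon` method above is more specific.
resolve(colnames, pred::Function) = findall(pred, colnames)

# Union of all selections, keeping first-occurrence order;
# no selectors means no columns.
resolve_all(colnames, selectors...) =
    isempty(selectors) ? Int[] : union((resolve(colnames, s) for s in selectors)...)

resolve_all(colnames)                               # Int[]
resolve_all(colnames, :)                            # [1, 2, 3, 4]
resolve_all(colnames, n -> startswith(n, "x"))      # [2, 3]
resolve_all(colnames, r"x", :)                      # [2, 3, 1, 4]
resolve_all(colnames, n -> !occursin("x", n), :)    # [1, 4, 2, 3]
```

Because `union` keeps the order of first occurrence, placing a selector before `:` moves the matching columns to the front, and placing its complement before `:` moves them to the end.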

The rules for a valid type of index into a row are the following:
1 change: 1 addition & 0 deletions docs/src/lib/internals.md
@@ -16,4 +16,5 @@ getmaxwidths
ourshow
ourstrwidth
@spawn_for_chunks
Cols
```
39 changes: 35 additions & 4 deletions docs/src/man/working_with_dataframes.md
@@ -255,18 +255,49 @@ julia> df[!, Not(:x1)]
Finally, you can use `Not`, `Between`, `Cols` and `All` selectors in more
complex column selection scenarios (note that `Cols()` selects no columns while
`All()` selects all columns, so `Cols` is the preferred selector if you
write generic code). The following examples move all columns whose names match
`r"x"` regular expression respectively to the front and to the end of a data
frame:
write generic code). Here are examples of using each of these selectors:

```
```jldoctest dataframe
julia> df = DataFrame(r=1, x1=2, x2=3, y=4)
1×4 DataFrame
Row │ r x1 x2 y
│ Int64 Int64 Int64 Int64
─────┼────────────────────────────
1 │ 1 2 3 4

julia> df[:, Not(:r)] # drop :r column
1×3 DataFrame
Row │ x1 x2 y
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 2 3 4

julia> df[:, Between(:r, :x2)] # keep columns between :r and :x2
1×3 DataFrame
Row │ r x1 x2
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 2 3

julia> df[:, All()] # keep all columns
1×4 DataFrame
Row │ r x1 x2 y
│ Int64 Int64 Int64 Int64
─────┼────────────────────────────
1 │ 1 2 3 4

julia> df[:, Cols(x -> startswith(x, "x"))] # keep columns whose name starts with "x"
1×2 DataFrame
Row │ x1 x2
│ Int64 Int64
─────┼──────────────
1 │ 2 3
```

The following examples show a more complex use of the `Cols` selector: they move all
columns whose names match the `r"x"` regular expression to the front and to the
end of the data frame, respectively:
```jldoctest dataframe
julia> df[:, Cols(r"x", :)]
1×4 DataFrame
Row │ x1 x2 r y
11 changes: 11 additions & 0 deletions src/other/index.jl
@@ -221,6 +221,17 @@ end
isempty(idx.cols) ? (1:length(x)) : throw(ArgumentError("All(args...) is not supported: use Cols(args...) instead"))
@inline Base.getindex(x::AbstractIndex, idx::Cols) =
isempty(idx.cols) ? Int[] : union(getindex.(Ref(x), idx.cols)...)
@inline Base.getindex(x::AbstractIndex, idx::Cols{Tuple{typeof(:)}}) = x[:]
@inline Base.getindex(x::AbstractIndex, idx::Cols{<:Tuple{Function}}) =
findall(idx.cols[1], names(x))

"""
Cols(f::Function)

Select the columns for which the `f` predicate returns `true` when passed the column name as a string.
As a special case, if `:` is passed (the `Colon` function), select all columns.
"""
Cols
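
The dispatch pattern used by the new `getindex` methods can be sketched in isolation. The following is a minimal illustration, not the DataFrames.jl code: `MyCols` and `lookup` are hypothetical stand-ins for `Cols` and `getindex(::AbstractIndex, ...)`, and column names are a plain vector of strings. It shows why `:` needs its own method (since `Colon <: Function`) and why mixing a predicate with other selectors raises a `MethodError`.

```julia
# Hypothetical stand-in for DataAPI.Cols: it only stores its arguments as a tuple.
struct MyCols{T<:Tuple}
    cols::T
    MyCols(args...) = new{typeof(args)}(args)
end

# Lookup of single selectors (only the kinds needed for this sketch).
lookup(colnames, idx::Int) = [idx]
lookup(colnames, re::Regex) = findall(n -> occursin(re, n), colnames)

# Generic case: union of the individual selections.
lookup(colnames, c::MyCols) =
    isempty(c.cols) ? Int[] : union((lookup(colnames, s) for s in c.cols)...)
# `Colon <: Function`, so `:` would otherwise be treated as a predicate;
# this more specific method selects all columns instead.
lookup(colnames, c::MyCols{Tuple{typeof(:)}}) = collect(1:length(colnames))
# Exactly one function argument: use it as a predicate on the names.
lookup(colnames, c::MyCols{<:Tuple{Function}}) = findall(c.cols[1], colnames)

colnames = ["a1", "a2", "b1", "b2"]
lookup(colnames, MyCols(n -> n[1] == 'a'))    # [1, 2]
lookup(colnames, MyCols(n -> n[end] == '3'))  # Int[]
lookup(colnames, MyCols(:))                   # [1, 2, 3, 4]

# A predicate combined with another selector falls through to the generic
# method, which has no `lookup` method for a bare function.
threw_method_error = try
    lookup(colnames, MyCols(n -> true, 1))
    false
catch err
    err isa MethodError
end
```

In this sketch the predicate method only matches a one-element tuple, so `MyCols(n -> true, 1)` reaches the generic union method and fails there, which mirrors what the `@test_throws MethodError` tests in this PR check.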

@inline function Base.getindex(x::AbstractIndex, idx::AbstractVector{<:Integer})
if any(v -> v isa Bool, idx)
5 changes: 5 additions & 0 deletions test/index.jl
@@ -474,6 +474,11 @@ end
df = DataFrame(a1=1, a2=2, b1=3, b2=4)
@test df[:, Cols(r"a", Not(r"1"))] == df[:, [1, 2, 4]]
@test df[:, Cols(Not(r"1"), r"a")] == df[:, [2, 4, 1]]
@test df[:, Cols(x -> x[1] == 'a')] == df[:, [1, 2]]
@test df[:, Cols(x -> x[end] == '1')] == df[:, [1, 3]]
@test df[:, Cols(x -> x[end] == '3')] == DataFrame()
@test_throws MethodError df[:, Cols(x -> true, 1)]
@test_throws MethodError df[:, Cols(1, x -> true)]
end

@testset "views" begin