add findcols #3389

Open
wants to merge 3 commits into
base: main
3 changes: 3 additions & 0 deletions NEWS.md
@@ -13,6 +13,9 @@
columns only to a subset of the columns specified by the `cols`
keyword argument
([#3386](https://github.com/JuliaData/DataFrames.jl/pull/3386))
* Add `findcols`, which returns a vector of integer indices of the
columns of a data frame for which the passed predicate function returns `true`
([#3389](https://github.com/JuliaData/DataFrames.jl/pull/3389))

## Bug fixes

7 changes: 6 additions & 1 deletion docs/src/lib/functions.md
@@ -26,7 +26,7 @@ This is a list of operations that currently make use of multi-threading:
* a transformation produces one row per group and the passed transformation
is a custom function (i.e. not for standard reductions, which use
optimized single-threaded methods).
- `dropmissing` when the provided data frame has more than 1 column and `view=false`
- `dropmissing` when the provided data frame has more than 1 column and `view=false`
(subsetting of individual columns is spawned in separate tasks).

In general at least Julia 1.4 is required to ensure that multi-threading is used
@@ -170,6 +170,11 @@ unique
unique!
```

## Filtering columns
```@docs
findcols
```

## Working with missing values
```@docs
allowmissing
1 change: 1 addition & 0 deletions src/DataFrames.jl
@@ -67,6 +67,7 @@ export AbstractDataFrame,
dropmissing!,
dropmissing,
fillcombinations,
findcols,
flatten,
groupby,
groupindices,
26 changes: 26 additions & 0 deletions src/abstractdataframe/abstractdataframe.jl
@@ -3252,3 +3252,29 @@ function Base.iterate(itr::Iterators.PartitionIterator{<:AbstractDataFrame}, sta
    r = min(state + itr.n - 1, last_idx)
    return view(itr.c, state:r, :), r + 1
end

"""
    findcols(f, df::AbstractDataFrame)

Return an integer vector `I` of the column indices `i` of `df` where `f(df[:, i])` returns `true`.
If there are no such columns of `df`, return `Int[]`.

# Examples

```jldoctest
julia> df = DataFrame(a=[1, missing], b=[2, 3], c=[missing, 4])
2×3 DataFrame
 Row │ a        b      c
     │ Int64?   Int64  Int64?
─────┼─────────────────────────
   1 │       1      2  missing
   2 │ missing      3        4

julia> findcols(x -> any(ismissing, x), df)
2-element Vector{Int64}:
 1
 3
```
"""
findcols(f::Function, df::AbstractDataFrame) =
    findall(f, eachcol(df))
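
A minimal sketch of the behaviour the docstring above describes, written so that it runs without this PR applied. The helper name `findcols_sketch` is hypothetical and is not part of DataFrames.jl; it uses an explicit comprehension over the columns rather than the PR's `findall(f, eachcol(df))` one-liner.

```julia
using DataFrames

# Hypothetical stand-in for the proposed `findcols`: collect the positions of
# the columns for which the predicate `f` returns `true`.
findcols_sketch(f, df::AbstractDataFrame) =
    [i for (i, col) in enumerate(eachcol(df)) if f(col)]

df = DataFrame(a=[1, missing], b=[2, 3], c=[missing, 4])

idx = findcols_sketch(x -> any(ismissing, x), df)   # [1, 3]

# The integer indices can be used directly for column selection.
names(df)[idx]   # ["a", "c"]
df[:, idx]       # data frame holding only columns :a and :c
```

In this sketch a predicate that returns a non-`Bool` value raises a `TypeError`, which is also what the PR's tests expect of `findcols`.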
13 changes: 13 additions & 0 deletions test/dataframe.jl
@@ -2387,4 +2387,17 @@ end
@test eltype(collect(p)) <: DataFrames.DataFrameRows
end

@testset "findcols" begin
    df = DataFrame(a=[1, missing], b=[2, 3], c=[missing, 4])
    @test findcols(x -> any(ismissing, x), df) == [1, 3]
    @test findcols(x -> true, df) == [1, 2, 3]
    @test findcols(x -> false, df) == Int[]
    @test_throws TypeError findcols(x -> 1, df)

    @test findcols(x -> any(ismissing, x), view(df, :, [1, 2])) == [1]
    @test findcols(x -> true, view(df, :, [1, 2])) == [1, 2]
    @test findcols(x -> false, view(df, :, [1, 2])) == Int[]
    @test_throws TypeError findcols(x -> 1, view(df, :, [1, 2]))
end

end # module
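
Because the PR implements `findcols` through `eachcol(df)`, the indices returned for a view would presumably be positions within the view rather than within its parent. The tests above use `view(df, :, [1, 2])`, where the two coincide. A hedged sketch of the distinction, using a hypothetical comprehension in place of `findcols`:

```julia
using DataFrames

df = DataFrame(a=[1, missing], b=[2, 3], c=[missing, 4])
sub = view(df, :, [:b, :c])   # SubDataFrame with columns :b and :c

# Hypothetical stand-in for `findcols(x -> any(ismissing, x), sub)`.
idx = [i for (i, col) in enumerate(eachcol(sub)) if any(ismissing, col)]

idx               # [2]: position of :c within `sub`, not within `df`
names(sub)[idx]   # ["c"]
```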