Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add init argument to count #37461

Merged
merged 3 commits into from
Oct 29, 2020
Merged

add init argument to count #37461

merged 3 commits into from
Oct 29, 2020

Conversation

simeonschaub
Copy link
Member

Inspired by this StackOverflow answer. Unfortunately, count! also already takes an init argument, which has a different meaning, so this will be the same situation as we currently have with sum/sum!, see #36266.

for x in itr
n += pred(x)::Bool
end
return n
end

function count(::typeof(identity), x::Array{Bool})
n = 0
function _simple_count(::typeof(identity), x::Array{Bool}, init::T=0) where {T}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a bugfix (although it doesn't occur in 1.5), because count(::Array{Bool}) currently errors when passed a dims argument.

@nalimilan
Copy link
Member

nalimilan commented Sep 9, 2020

Interesting idea. Though I wonder whether the speedup obtained by using a smaller integer type isn't just due to a compiler bug in this particular case: see #37473. I don't see any speedup when counting non-missing values or the number of zeros:

julia> function count_nonmissing(x, T)
           c = zero(T)
           for i in 1:length(x)
               c += @inbounds x[i] !== missing
           end
           return Int(c)
       end
count_nonmissing (generic function with 1 method)

julia> x = rand([missing, rand(Int, 100)...], 1_000_000);

julia> @btime count_nonmissing(x, Int);
  25.500 ms (999485 allocations: 15.25 MiB)

julia> @btime count_nonmissing(x, Int16);
  27.453 ms (983978 allocations: 15.01 MiB)

julia> function count_zero(x, T)
           c = zero(T)
           for i in 1:length(x)
               c += @inbounds x[i] == 0
           end
           return Int(c)
       end

julia> x = rand(1_000_000);

julia> @btime count_zero(x, Int);
  21.660 ms (0 allocations: 0 bytes)

julia> @btime count_zero(x, Int16);
  19.200 ms (0 allocations: 0 bytes)

@simeonschaub
Copy link
Member Author

Hmm, interesting! I would still argue that this is worth it for the consistency alone. It still seems useful to be able to control the accumulation type, perhaps someone wants to count using finite fields, or explicitely needs 64 bits of accuracy on 32-bit platforms.

@simeonschaub
Copy link
Member Author

I don't know who would be best to review this. @tkf, would you be willing to take a quick look at this, since I think you worked a lot on our reduce functions lately?

base/reduce.jl Show resolved Hide resolved
base/bitarray.jl Show resolved Hide resolved
@tkf tkf added the fold sum, maximum, reduce, foldl, etc. label Sep 12, 2020
@tkf
Copy link
Member

tkf commented Sep 12, 2020

For now, I guess you can just use sum instead of count?

TBH, after reading #35947 (comment), I've been wondering how much we should be doing for supporting count. Should it even be a target for deprecation? Likely not, but it could be, depending on what goes in Julia 2.0.

@simeonschaub
Copy link
Member Author

simeonschaub commented Sep 12, 2020

I think currently count still has the advantage of being optimized for BitArrays and Array{Bool}, but I don't see why we couldn't apply this optimization to the more general mapreduce case. I guess count still has some merit over sum purely semantically, because "counting" logical values feels more natural than "summing" logical values, but I agree that it otherwise really doesn't add much over sum. A hard deprecation would probably still be a bit out of scope for 1.x, since I'd imagined it's used quite a lot.

@tkf
Copy link
Member

tkf commented Sep 12, 2020

being optimized for BitArrays and Array{Bool}

Isn't it better to use these methods from sum as well?

count still has some merit over sum

I feel this is, in principle, minor than assigning different semantics to count as discussed in #35947. Having said that, now that we are in the post-1.0 era, I still don't think such minor renaming should happen, even for 2.0.

@StefanKarpinski
Copy link
Member

Bump?

@simeonschaub
Copy link
Member Author

@tkf Are you fine with this as-is, or do you still think, init should be restricted to Integer? Other than this, I think this should be good to go.

test/reduce.jl Outdated Show resolved Hide resolved
base/reduce.jl Outdated Show resolved Hide resolved
@simeonschaub
Copy link
Member Author

Bump.

@simeonschaub
Copy link
Member Author

@rfourquet Would you mind taking another look at this before the branching for 1.6 happens, because count(::Array{Bool}, dims=...) is still broken currently?

@rfourquet
Copy link
Member

Would you mind taking another look

Oh sorry, I had missed your previous review request. I will have a look later today.

Copy link
Member

@rfourquet rfourquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just the docstring needs to document init.

base/reduce.jl Outdated Show resolved Hide resolved
base/reduce.jl Outdated Show resolved Hide resolved
base/reduce.jl Outdated Show resolved Hide resolved
Co-authored-by: Rafael Fourquet <fourquet.rafael@gmail.com>
@simeonschaub
Copy link
Member Author

Thank you very much for the thorough review! Hope I addressed it all.

Copy link
Member

@rfourquet rfourquet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more nitpicks :)

base/reduce.jl Outdated Show resolved Hide resolved
base/reduce.jl Outdated Show resolved Hide resolved
base/reduce.jl Outdated Show resolved Hide resolved
base/bitarray.jl Outdated Show resolved Hide resolved
base/reduce.jl Outdated Show resolved Hide resolved
Co-authored-by: Rafael Fourquet <fourquet.rafael@gmail.com>
@simeonschaub
Copy link
Member Author

Is this good to go?

@rfourquet
Copy link
Member

I will merge in a couple of days if no objection :)

@vtjnash vtjnash merged commit 506fbdf into JuliaLang:master Oct 29, 2020
@simeonschaub simeonschaub deleted the count_init branch October 29, 2020 19:09
@tlienart tlienart mentioned this pull request Nov 2, 2020
ViralBShah added a commit that referenced this pull request Nov 10, 2020
There's also a bunch of issue links that are missing (#37410, #37247, #37540, #37973, #37461, #37753) but it seems there's a script that generates the links so I'm assuming that will be fixed automatically.

Co-authored-by: Viral B. Shah <ViralBShah@users.noreply.github.com>
achuchmala pushed a commit to achuchmala/julia that referenced this pull request Nov 11, 2020
There's also a bunch of issue links that are missing (JuliaLang#37410, JuliaLang#37247, JuliaLang#37540, JuliaLang#37973, JuliaLang#37461, JuliaLang#37753) but it seems there's a script that generates the links so I'm assuming that will be fixed automatically.

Co-authored-by: Viral B. Shah <ViralBShah@users.noreply.github.com>
simeonschaub added a commit that referenced this pull request Jan 8, 2021
Not sure why this worked before #37461, perhaps #9498?
simeonschaub added a commit that referenced this pull request Jan 8, 2021
Not sure why this worked before #37461, perhaps #9498?
simeonschaub added a commit that referenced this pull request Jan 8, 2021
Not sure why this worked before #37461, perhaps #9498?
KristofferC pushed a commit that referenced this pull request Jan 9, 2021
Not sure why this worked before #37461, perhaps #9498?

(cherry picked from commit ca07546)
KristofferC pushed a commit that referenced this pull request Feb 1, 2021
Not sure why this worked before #37461, perhaps #9498?

(cherry picked from commit ca07546)
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
staticfloat pushed a commit that referenced this pull request Dec 23, 2022
Not sure why this worked before #37461, perhaps #9498?

(cherry picked from commit ca07546)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fold sum, maximum, reduce, foldl, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants