
create Accumulate iterator #25766

Open · wants to merge 25 commits into master
Conversation

@simonbyrne (Contributor) commented Jan 26, 2018

This makes it possible to use the collect machinery for determining output types.

Also moves code out of multidimensional.jl (which is a bit of an odd place for it).

To do:

  • get slicing working.
  • array types for pairwise
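
A rough sketch of the iterator shape this PR introduces (assumed form based on the fields quoted later in this thread; the PR's actual definition may differ in details): a lazy wrapper whose elements are the running accumulation, so that collect's widening machinery picks the output element type instead of promote_op.

struct Accumulate{O,V,I}
    op::O    # binary operation, e.g. +
    v0::V    # optional initial value (or a sentinel when absent)
    iter::I  # wrapped iterator
end

# e.g. collect(Accumulate(+, 0, [1, 2, 3])) would lazily produce the running
# sums 1, 3, 6, with the element type chosen by collect.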

@kshyatt added the iteration (Involves iteration or the iteration protocol) label Jan 27, 2018
rcum_promote_type(op, ::Type{Array{T,N}}) where {T,N} = Array{rcum_promote_type(op,T), N}

# accumulate_pairwise slightly slower than accumulate, but more numerically
# stable in certain situations (e.g. sums).
Member:

"stable" is the wrong word here. Use "accurate"


# accumulate_pairwise slightly slower than accumulate, but more numerically
# stable in certain situations (e.g. sums).
# it does double the number of operations compared to accumulate,
Member:

"double the number of op calls" would be more informative

end

function cumsum!(out, v::AbstractVector, dim::Integer)
# we dispatch on the possibility of numerical stability issues
@stevengj (Member), Jan 27, 2018:

rephrase to "on the possibility of roundoff errors"

(Again, this is misusing the term "numerical stability". Even naive summation is backwards stable, it is just less accurate than pairwise summation.)
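
A small self-contained illustration of the accuracy point (not from the PR): naive left-to-right accumulation of Float32 ones stalls once the running sum reaches 2^24, while Base's pairwise summation does not.

x = fill(1.0f0, 2 * 10^7)
foldl(+, x)   # 1.6777216f7: adding 1.0f0 to 2^24 rounds back down to 2^24
sum(x)        # 2.0f7: pairwise summation stays exact for this input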

itrstate, accval = accstate
val, itrstate = next(acc.iter, itrstate)
if accval === uninitialized
accval = reduce_first(acc.op, val)
Member:

So, this is type-unstable even for things like Accumulate(+, itr)?

Contributor Author:

Yes, but the new Union optimisations mean that there is no performance hit.

Member:

As I understand it, the Union optimizations help with cases like this, but retaining type stability will still be faster. Worth Nanosoldiering or otherwise benchmarking to be sure though.

Contributor Author:

In most cases, the only type instability will be in the first call to next, which is typically unrolled, so we should be able to avoid that.
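
A minimal sketch of the "unroll the first next" idea (illustrative only, written against the pre-1.0 start/next/done protocol this PR targets, and omitting the widening that the real collect machinery performs). Handling the first element outside the loop gives the accumulator a concrete type for the remaining iterations:

function naive_accumulate(op, itr)
    s = start(itr)
    done(itr, s) && return Vector{eltype(itr)}()   # empty input
    x, s = next(itr, s)
    acc = Base.reduce_first(op, x)   # accumulator is concretely typed from here on
    out = [acc]
    while !done(itr, s)
        x, s = next(itr, s)
        acc = op(acc, x)
        push!(out, acc)
    end
    return out
end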

@jw3126 (Contributor) commented Jan 28, 2018

There are type stability issues if v0 is different from eltype(iter) (which is not uncommon in the wild):

julia> collect(Base.Accumulate(+, 1., Int[]))
0-element Array{Real,1}
julia> collect(Base.Accumulate(+, 1., Int[1]))
1-element Array{Float64,1}:
 2.0

@simonbyrne changed the title from "WIP: create Accumulate iterator" to "create Accumulate iterator" Jan 30, 2018
@simonbyrne (Contributor Author)

Okay, this should now fix some of the type stability issues.

In summary:

It fixes #25506, and generally makes all the promotion machinery work without falling back on promote_op (as per @martinholters suggestion).

I've also changed it so that cumsum/cumprod now have the same promotion behaviour as sum/prod. This was argued against previously (#18364 (comment)), but now that we use separate reduction operators, the old behaviour is still accessible via accumulate(+,x)/accumulate(*,x).

The main question is whether we want mapaccumulate (#21152)?
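
To illustrate the promotion change described above (hypothetical results under the proposed behaviour, not verified output from this branch): cumsum would widen small integers the way sum does, while accumulate(+, x) would keep the element type and hence the old wrap-around behaviour.

x = Int8[100, 100]
cumsum(x)         # would widen like sum: [100, 200] with eltype Int
accumulate(+, x)  # would keep Int8 and overflow: Int8[100, -56]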

@simonbyrne (Contributor Author)

@nanosoldier runbenchmarks(ALL, vs=":master")

@ararslan (Member)

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@simonbyrne (Contributor Author)

Ah, the performance hit seems to be that copyto! doesn't unroll the first next.

@simonbyrne (Contributor Author)

@nanosoldier runbenchmarks(ALL, vs=":master")

@ararslan (Member)

@nanosoldier runbenchmarks(ALL, vs=":master")

@martinholters (Member)

👍 to the general idea here. I would have loved to review this in more detail, but haven't had the time and it's unlikely that I will have today.

@@ -1407,6 +1407,8 @@ end

@deprecate which(s::Symbol) which(Main, s)

@deprecate accumulate!(op, dest::AbstractArray, args...) accumulate!(dest, op, args...)
Contributor:

Why dest first? Function arguments have highest priority.

Contributor Author:

good point.

test/arrayops.jl Outdated
@inferred accumulate(*, String[])
@test accumulate(*, ['a' 'b'; 'c' 'd'], 1) == ["a" "b"; "ac" "bd"]
@test accumulate(*, ['a' 'b'; 'c' 'd'], 2) == ["a" "ab"; "c" "cd"]
end
Contributor:

Adding

@inferred accumulate(+, 1.0, Int[])
@test eltype(accumulate(+, 1.0, Int[])) == Float64

would be good. (Probably not in this testset.)


function accumulate!(dest, op, v0, X, dim::Integer)
dim > 0 || throw(ArgumentError("dim must be a positive integer"))
axes(A) == axes(B) || throw(DimensionMismatch("shape of source and destination must match"))
Contributor:

What are A and B? They should be dest and X.

Contributor:

I think we also need more accumulate! tests. Historically accumulate! would be implicitly called anyway by each accumulate test so this was not necessary. AFAICT there is only a single accumulate! test now.
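
A hypothetical example of the kind of direct accumulate! test being asked for (written with the op-first argument order of released Julia; the PR at this point had dest first, which was questioned above):

out = similar([1, 2, 3])
accumulate!(+, out, [1, 2, 3])
@test out == [1, 3, 6]

out2 = zeros(3)
cumsum!(out2, [1, 2, 3])
@test out2 == [1.0, 3.0, 6.0]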

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@martinholters (Member)

Looking only at the 1D case for now, I'm getting:

julia> A = rand(100);

julia> @btime Base.accumulate_pairwise(Base.add_sum, undef, $A, Base.HasShape{1}()); # this is what cumsum(A) does
  771.947 ns (9 allocations: 1.06 KiB)

julia> @btime Base._accumulate(Base.add_sum, undef, $A, Base.HasShape{1}(), nothing); # this uses the Accumulate iterator
  175.124 ns (2 allocations: 912 bytes)

For comparison, on master:

julia> @btime cumsum($A);
  147.019 ns (1 allocation: 896 bytes)

Using an element type which does not do pairwise summing:

julia> A = rand(Int, 100);

julia> @btime cumsum($A); # master
  126.825 ns (1 allocation: 896 bytes)

julia> @btime cumsum($A); # this PR
  125.654 ns (2 allocations: 912 bytes)

The overhead due to the Accumulate iterator apparently is not the problem. But the new accumulate_pairwise is. However, it has to be changed to give consistent results (in terms of return type) with the non-pairwise version, I suppose.

So...

Can we just make the API-breaking parts of this change and actually add the Accumulate iterator later?

Doesn't seem to gain us much.

@stevengj (Member)

Can we just fix #25506 in the meantime?

@jw3126 (Contributor) commented Mar 20, 2018

@stevengj #25515 would fix just #25506. I closed it in favor of this PR. Should I reopen it as a fix until this PR lands?

@simonbyrne (Contributor Author)

how about I just keep the original pairwise stuff, since that seems to be the problem? We can always change that later.

The one thing to do in 0.7 is to change the small integer behaviour, since that is the main breaking change.

@martinholters (Member)

This is remarkable:

julia> A=rand(100);

julia> @btime cumsum($A);
  944.120 ns (9 allocations: 1.06 KiB)

julia> Base._similar_for(c::AbstractArray, ::Type{T}, itr, ::Base.HasShape) where {T} = similar(c, T, axes(itr))

julia> @btime cumsum($A);
  666.361 ns (7 allocations: 1.02 KiB)

That's on a different machine than above; the same cumsum on master is about 200 ns here. So still quite a gap, but noting that _similar_for isn't changed in this PR, maybe there's something to be gained elsewhere, too. OTOH, IIUC, before this PR that _similar_for method is only called for EltypeUnknown, HasShape iterators which are not Generators, of which I don't think there are any in Base, right?

@martinholters (Member)

Although everything is type-stable and inferable, if I disable the ifs in accumulate_pairwise by manually predicting them for this case, as in:

function accumulate_pairwise(op, v0, itr, ::Union{HasLength,HasShape{1}})
    i = start(itr)
    #if done(itr,i)
    #    return collect(Accumulate(op, v0, itr))
    #end
    v1,i = next(itr,i)
    y = reduce_first(op,v0,v1)

    Y = _similar_for(1:1, typeof(y), itr, IteratorSize(itr))
    L = linearindices(Y)
    n = length(L)
    j = first(L)
    while true
        Y[j] = y
        if done(itr,i)
            return Y
        end
        y,j,i,wider = _accumulate_pairwise!(op,Y,itr,y,j+1,i,last(L)-j,true)
        #if !wider
            return Y
        #end
        R = promote_typejoin(eltype(Y), typeof(y))
        newY = similar(Y, R)
        copyto!(newY,1,Y,1,j)
        Y = newY
    end
end

I'm getting

julia> @btime cumsum($A);
  197.452 ns (1 allocation: 896 bytes)

which is on par with master.

The while loop might be better written using a recursion, but I'm puzzled as to why the first if condition is also problematic.

@martinholters (Member)

My theory for the effect of disabling the ifs is that it pushes the method across the inlining threshold. And then we suffer from the functions not being specialized on op. By forcing all relevant methods to be either inlined or specialized on op (i.e. (op::F, ...) where F), I obtain:

                                     master                                  this PR with specialization
A = rand(100);  @btime cumsum($A)    154.226 ns (1 allocation: 896 bytes)     154.356 ns (1 allocation: 896 bytes)
A = rand(10^6); @btime cumsum($A)    1.249 ms (2 allocations: 7.63 MiB)       1.284 ms (2 allocations: 7.63 MiB)

Yay! Unfortunately, that doesn't help the more-than-one-dimensional case. I'll look into that...

@simonbyrne, ok if I push my updates to your branch?
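
A sketch of the specialization point made above (general Julia behaviour, not code from this PR): the compiler's heuristics avoid specializing on a Function argument that is only passed through to other calls, and an explicit type parameter forces specialization on the concrete function type.

# may be compiled once, unspecialized on `op`, because `op` is only forwarded:
apply_via(op, x) = sum(op, x)

# the `op::F ... where F` form forces a specialization per concrete function:
apply_via_specialized(op::F, x) where {F} = sum(op, x)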

@simonbyrne (Contributor Author)

Yes, certainly! I won't have time to look at this for a few days, but make what changes you see fit.

@martinholters (Member)

Ok, I could substantially reduce the setup time for the multi-dimensional case and make it type-stable (if the input permits it):

julia> A = ones(1,1);

julia> @btime cumsum($A, dims=1);
  42.639 ns (1 allocation: 96 bytes)

julia> @btime cumsum($A, dims=2);
  42.780 ns (1 allocation: 96 bytes)

Compare with master:

julia> @btime cumsum($A, dims=1);
  33.831 ns (1 allocation: 96 bytes)

julia> @btime cumsum($A, dims=2);
  149.753 ns (4 allocations: 144 bytes)

Unfortunately, the time for the actual computation is still worse than on master by more than a factor of 2:

# this PR
julia> srand(0);

julia> A = rand(100,100);

julia> @btime cumsum($A, dims=1);
  24.235 μs (2 allocations: 78.20 KiB)

julia> @btime cumsum($A, dims=2);
  22.350 μs (2 allocations: 78.20 KiB)
#master
julia> @btime cumsum($A, dims=1);
  8.637 μs (2 allocations: 78.20 KiB)

julia> @btime cumsum($A, dims=2);
  9.056 μs (5 allocations: 78.25 KiB)

@martinholters (Member)

Uh, no, this isn't it. While just iterating over CartesianIndices(axes(X)) is nice from a type-stability point of view, using the no-op loop

function _accumulate!(op::F, dest::AbstractArray{T}, v0, X, dim, inds, st, first_in_dim, dim_delta, widen) where {F,T}
    while !done(inds, st)
        i, st = next(inds, st)
    end
    return dest
end

is still slower than master, although it doesn't compute anything besides incrementing the index.

@simonbyrne (Contributor Author)

I've managed to fix the performance of plain accumulate(+, X)

v0::V
iter::I
end
Accumulate(op, iter) = Accumulate(op, undef, iter) # use `undef` as a sentinel
Sponsor Member:

This is exactly what I was afraid would happen with undef.

Contributor Author:

We can create another type to use as a sentinel: any suggestions for names? (I had originally used uninitialized, which makes slightly more sense).
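
A possible shape for such a sentinel (the names NoInit/noinit are purely illustrative, not a proposal from the PR): a private singleton type avoids overloading the meaning of undef.

struct NoInit end          # hypothetical marker for "no initial value supplied"
const noinit = NoInit()

Accumulate(op, iter) = Accumulate(op, noinit, iter)
hasinit(acc::Accumulate) = !(acc.v0 isa NoInit)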

@JeffBezanson (Sponsor Member)

I will repeat the request to do the small integer type change first, so we can do the rest of this any time.

@mbauman (Sponsor Member) commented Mar 29, 2018

For what it's worth, here are the updated numbers on my spot-checks:

 # master/c2efb04eae                       |   # sb/accumulate/f0bb59dea5            
 julia> srand(0);                          |   julia> srand(0);                      
                                           |                                         
 julia> A = rand(100);                     |   julia> A = rand(100);                 
                                           |                                         
 julia> @btime cumsum($A);                 |   julia> @btime cumsum($A);             
   180.927 ns (1 allocation: 896 bytes)    |     189.005 ns (1 allocation: 896 bytes)
                                           |                                         
 julia> A = rand(10^6);                    |   julia> A = rand(10^6);                
                                           |                                         
 julia> @btime cumsum($A);                 |   julia> @btime cumsum($A);             
   1.463 ms (2 allocations: 7.63 MiB)      |     1.520 ms (2 allocations: 7.63 MiB)  
                                           |                                         
 julia> A = rand(10,10);                   |   julia> A = rand(10,10);               
                                           |                                         
 julia> @btime cumsum($A, dims=1);         |   julia> @btime cumsum($A, dims=1);     
   121.121 ns (1 allocation: 896 bytes)    |     285.664 ns (1 allocation: 896 bytes)
                                           |                                         
 julia> @btime cumsum($A, dims=2);         |   julia> @btime cumsum($A, dims=2);     
   259.957 ns (4 allocations: 944 bytes)   |     287.504 ns (1 allocation: 896 bytes)
                                           |                                         
 julia> A = rand(1000,1000);               |   julia> A = rand(1000,1000);           
                                           |                                         
 julia> @btime cumsum($A, dims=1);         |   julia> @btime cumsum($A, dims=1);     
   1.396 ms (2 allocations: 7.63 MiB)      |     3.046 ms (2 allocations: 7.63 MiB)  
                                           |                                         
 julia> @btime cumsum($A, dims=2);         |   julia> @btime cumsum($A, dims=2);     
   1.108 ms (5 allocations: 7.63 MiB)      |     2.553 ms (2 allocations: 7.63 MiB)  

@ararslan (Member)

@nanosoldier runbenchmarks(ALL, vs=":master")

@JeffBezanson removed the triage (This should be discussed on a triage call) label Apr 5, 2018
@StefanKarpinski (Sponsor Member)

This doesn't need to be triaged anymore, right? Since after #26658 is merged this won't be breaking.


@stevengj (Member) commented Mar 1, 2020

Any update on this?

@JeffBezanson (Sponsor Member)

There is now an Accumulate iterator in Iterators, plus #34656. This could maybe be rebased to use that.
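
For reference, a brief usage sketch of the Iterators.accumulate API mentioned here (available since Julia 1.5):

collect(Iterators.accumulate(+, [1, 2, 3, 4]))            # [1, 3, 6, 10]
collect(Iterators.accumulate(+, [1, 2, 3]; init = 10.0))  # eltype Float64: [11.0, 13.0, 16.0]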
