RFC: use pairwise summation for sum, cumsum, and cumprod #4039
(We could use the old definitions for |
If you do image processing with single precision, this is an enormously big deal. E.g. `mean(fill(1.5f0, 3000, 4000))` returns `1.766983f0`. (Example chosen for its striking effect. Unfortunately the mean implementation is disjoint from the sum implementation that this patch modifies.) |
@GunnarFarneback, yes, it is a pain to modify all of the different
|
Of course, in single precision, the right thing is to just do summation and other accumulation in double precision, rounding back to single precision at the end. |
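A minimal sketch of the double-precision accumulation described in the comment above, applied to the `fill(1.5f0, 3000, 4000)` example from earlier in the thread. The helper name `sum_widened` is hypothetical and is not part of this patch or of Base.

```julia
# Hypothetical helper (not from the patch): sum Float32 data using a
# Float64 accumulator, rounding back to Float32 only once at the end.
function sum_widened(A::AbstractArray{Float32})
    s = 0.0                      # Float64 accumulator
    for x in A
        s += Float64(x)          # each addition is done in double precision
    end
    return Float32(s)            # single rounding back to Float32
end

A = fill(1.5f0, 3000, 4000)
m = sum_widened(A) / length(A)   # m == 1.5f0
```

Here every partial sum is exactly representable in Float64, so the only rounding happens in the final conversion. The cost is the widening conversion on every element, which is the speed concern raised in the thread.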
To clarify, my example was meant to show the insufficiency of the current state of summing. That pairwise summation is an effective solution I considered obvious, so I'm all in favor of this patch. That mean and some other functions fail to take advantage of it is a separate issue. |
The right thing with respect to precision, not necessarily with respect to speed. |
I'm good with this as a first step and think we should, as @GunnarFarneback points out, integrate even further so that mean and other stats functions also use pairwise summation. |
…omplex arrays (should use absolute value and return a real number)
Updated to use pairwise summation for The variance computation is also more efficient now because (at least when operating on the whole array) it no longer constructs a temporary array of the same size. (Would be even better if Also, I noticed that |
…rted) mapreduce_associative
Also added an associative variant of _Question:_ Although |
Another approach would be to have a |
That would be a good idea--for instance, https://github.com/pao/Monads.jl/blob/master/src/Monads.jl#L54 relies on the fold direction. |
The only sensible associativities to have in Base are left, right, and unspecified. A My suggestion would be that |
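To illustrate why the fold direction discussed above matters, here is a small sketch using the standard `foldl` and `foldr` functions. The input vectors are made up for illustration.

```julia
# Subtraction is not associative, so the fold direction changes the result.
xs = [1, 2, 3, 4]
left  = foldl(-, xs)    # ((1 - 2) - 3) - 4 == -8
right = foldr(-, xs)    # 1 - (2 - (3 - 4)) == -2

# Floating-point addition is only approximately associative, so even `+`
# can give different answers depending on how the terms are grouped.
ys = [1.0, 1e100, 1.0, -1e100]
fl = foldl(+, ys)       # ((1.0 + 1e100) + 1.0) + -1e100 == 0.0
fr = foldr(+, ys)       # 1.0 + (1e100 + (1.0 + -1e100)) == 1.0
```

Code that depends on one of these groupings needs a function with a guaranteed direction. A reduction with unspecified associativity is free to regroup the terms, which is what pairwise summation does.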
@StefanKarpinski, should I go ahead and merge this? |
This is great, and I would love to see this merged. Also, we should probably not export |
|
I'm good with merging this. @JeffBezanson? |
This is great. I had played with my own variant of this idea, breaking it up into blocks of size |
This patch modifies the `sum` and `cumsum` functions (and `cumprod`) to use pairwise summation for summing `AbstractArray`s, in order to achieve much better accuracy at a negligible computational cost.

Pairwise summation recursively divides the array in half, sums the halves recursively, and then adds the two sums. As long as the base case is large enough (here, n=128 seemed to suffice), the overhead of the recursion is negligible compared to naive summation (a simple loop). The advantage of this algorithm is that it achieves O(sqrt(log n)) mean error growth, versus O(sqrt(n)) for naive summation. In practice, O(sqrt(log n)) growth is almost indistinguishable from the O(1) error growth of Kahan compensated summation.
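The recursion described above can be sketched roughly as follows. This is an illustrative version, not the code from the patch: the names `naive_sum` and `pairwise_sum` and the `base` keyword are assumptions, and only the cutoff of 128 comes from the description.

```julia
# Illustrative sketch only; hypothetical names, not the patch's implementation.

# Naive summation: a simple left-to-right loop.
function naive_sum(A::AbstractVector{T}) where {T}
    s = zero(T)
    for x in A
        s += x
    end
    return s
end

# Pairwise summation over A[lo:hi]: split the range in half, sum each half
# recursively, and add the two partial sums. Ranges of at most `base`
# elements are summed with a plain loop to keep recursion overhead small.
function pairwise_sum(A::AbstractVector, lo::Int=firstindex(A),
                      hi::Int=lastindex(A); base::Int=128)
    if hi - lo < base
        s = zero(eltype(A))
        for i in lo:hi
            s += A[i]
        end
        return s
    else
        mid = (lo + hi) >>> 1    # midpoint of the index range
        return pairwise_sum(A, lo, mid; base=base) +
               pairwise_sum(A, mid + 1, hi; base=base)
    end
end

A = fill(1.5f0, 3000 * 4000)
naive_mean    = naive_sum(A) / length(A)      # drifts well above 1.5 in Float32
pairwise_mean = pairwise_sum(A) / length(A)   # == 1.5f0
```

For this constant input every partial sum in the pairwise recursion is exactly representable in Float32. The naive loop starts rounding each `+ 1.5f0` once the running total grows large, which is the effect shown by the `mean` example earlier in the thread.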
For example:
where `oldsum` and `newsum` are the old and new implementations, respectively, gives `(-1.2233732622777777e-13,0.0)` on my machine in a typical run: the `oldsum` loses three significant digits, whereas the `newsum` loses none. On the other hand, their execution time is nearly identical:

gives
(The difference is within the noise.)
@JeffBezanson and @StefanKarpinski, I think I mentioned this possibility to you at Berkeley.