-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mapreduce with multiple arrays allocates #53417
Comments
looks like in some cases "bad allocation" case is actually faster: julia> function manual(x,y)
s = zero(eltype(x))
for (i,j) in zip(x,y)
s += i*j
end
return s
end
manual (generic function with 1 method)
julia> @benchmark manual(x, y) setup=(x=rand(100,100); y=rand(100,100))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 9.783 μs … 31.469 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.813 μs ┊ GC (median): 0.00%
Time (mean ± σ): 9.928 μs ± 677.873 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
██▄▄▃▃▂▁▁▁ ▁▁ ▁▂▁ ▁
███████████▇▇▇▇▇▇▇█▇█████▇▇▅▅▅▅▃▃▄▄▃▄▅███▇▆▆▅▅▅▅▃▄▄▃▃▄▄▄▄▄▂ █
9.78 μs Histogram: log(frequency) by time 11 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark mapreduce(*, +, x, y) setup=(x=rand(100,100); y=rand(100,100))
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
Range (min … max): 5.445 μs … 56.432 μs ┊ GC (min … max): 0.00% … 69.95%
Time (median): 6.132 μs ┊ GC (median): 0.00%
Time (mean ± σ): 6.592 μs ± 3.665 μs ┊ GC (mean ± σ): 5.15% ± 7.96%
▅█▂▁ ▁
█████▇█▇▇▆▅▅▆▅▅▃▃▃▁▃▁▁▁▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅ █
5.44 μs Histogram: log(frequency) by time 38 μs <
Memory estimate: 78.17 KiB, allocs estimate: 2. |
Yeah but my GPU runs out of memory (gigabyte large arrays). |
Not sure if your benchmarking is fair. It looks like it's not SIMD... julia> function manual(x,y)
s = zero(eltype(x))
@simd for i in 1:length(x)
@inbounds s += x[i] * y[i]
end
return s
end
manual (generic function with 1 method)
julia> @benchmark manual(x,y) setup=(x=rand(100, 100); y=rand(100, 100))
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.406 μs … 4.790 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.559 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.616 μs ± 234.085 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▃ ▃▁ ▂▃▁
▃██▅▇███████▇▆▅▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
1.41 μs Histogram: frequency by time 2.38 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark mapreduce(*, +, x,y) setup=(x=rand(100, 100); y=rand(100, 100))
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
Range (min … max): 7.389 μs … 282.207 μs ┊ GC (min … max): 0.00% … 87.14%
Time (median): 9.740 μs ┊ GC (median): 0.00%
Time (mean ± σ): 11.754 μs ± 14.580 μs ┊ GC (mean ± σ): 7.31% ± 5.95%
▇▂█
▇████▆▆▆▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
7.39 μs Histogram: frequency by time 47.2 μs <
Memory estimate: 78.17 KiB, allocs estimate: 2.
|
I found a way to express this with julia> foldf = (acc, t) -> acc[1] + t[1] * t[2]
#5 (generic function with 1 method)
julia> @time foldl(foldf, zip(x, y))
0.000031 seconds (5 allocations: 144 bytes)
2498.8165946524705
julia> f(x,y) = x * y
f (generic function with 1 method)
julia> @time mapreduce(f, +, x, y)
0.000059 seconds (6 allocations: 78.281 KiB)
2498.794164503838 |
Not sure if it covers all the use cases but maybe change this Lines 329 to 330 in 4e72944
to
The lazy map won't allocate, as promised in the documentation. The manual version still seems faster though. |
Should fix JuliaLang#53417, seeing if it passes tests.
Just to confirm, on CUDA.jl this indeed is allocation free: julia> x = CuArray(rand(512, 512));
julia> y = copy(x);
julia> f(x,y) = x*y
f (generic function with 1 method)
julia> CUDA.@time mapreduce(f, +, x,y);
7.071664 seconds (14.90 M CPU allocations: 1010.226 MiB, 3.82% gc time) (2 GPU allocations: 232 bytes, 0.00% memmgmt time)
julia> CUDA.@time mapreduce(f, +, x,y);
0.000195 seconds (62 CPU allocations: 2.578 KiB) (2 GPU allocations: 232 bytes, 9.79% memmgmt time)
julia> xc = Array(x);
julia> yc = Array(y);
julia> @time mapreduce(f, +, xc,yc);
0.122801 seconds (328.86 k allocations: 24.198 MiB, 5.45% gc time, 98.93% compilation time)
julia> @time mapreduce(f, +, xc,yc);
0.001084 seconds (6 allocations: 2.000 MiB)
|
Hi,
With Julia 1.10.1 I noticed that
mapreduce
with two arrays allocates:Looking at the implementation there is no specialization.
Maybe we should adapt the over promising docstring then?
Related to #38558 ?
Best,
Felix
The text was updated successfully, but these errors were encountered: