-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
poor performance #125
Comments
I am surprised too. Best approach would be to profile it and see where it's spending time. Or step into it with a debugger and see which version of |
One reason why Although it is not directly related to the subject of this issue, I think FixedPointNumbers.jl/src/normed.jl Line 93 in da39318
The following may be faster, although it should be generalized. function Base.:*(x::N0f8, y::N0f8)
m = UInt32(reinterpret(x)) * UInt32(reinterpret(y))
reinterpret(N0f8, unsafe_trunc(UInt8, (m * 0x10101 + 0x800000)>>24))
end Edit: |
As mentioned in #129 (comment), this problem seems to be OS-dependent. Unfortunately, Julia v1.3.0 will not directly improve the performance. It is not easy to modify the broadcasting mechanism. I think it may be a realistic idea to modify the |
Although also not directly related to the subject of this issue, the following tips are beneficial: function prealloc_broadcast(f, a::T, b::T) where {T <: AbstractArray}
y = copy(a)
broadcast!(f, y, y, b)
end
function broadcast_with_view(f, a::T, b::T) where {T <: AbstractArray}
broadcast(f, view(a, axes(a)...), view(b, axes(b)...))
end
function prealloc_broadcast_with_view(f, a::T, b::T) where {T <: AbstractArray}
y = copy(a)
broadcast!(f, y, y, view(b, axes(b)...))
end julia> @benchmark broadcast(+, x_n0f8, x_n0f8)
BenchmarkTools.Trial:
memory estimate: 976.80 KiB
allocs estimate: 4
--------------
minimum time: 3.427 ms (0.00% GC)
median time: 3.431 ms (0.00% GC)
mean time: 3.520 ms (1.14% GC)
maximum time: 32.200 ms (87.01% GC)
--------------
samples: 1419
evals/sample: 1
julia> @benchmark prealloc_broadcast(+, x_n0f8, x_n0f8)
BenchmarkTools.Trial:
memory estimate: 976.75 KiB
allocs estimate: 3
--------------
minimum time: 2.505 ms (0.00% GC)
median time: 2.510 ms (0.00% GC)
mean time: 2.542 ms (1.13% GC)
maximum time: 30.100 ms (91.65% GC)
--------------
samples: 1965
evals/sample: 1
julia> @benchmark broadcast_with_view(+, x_n0f8, x_n0f8)
BenchmarkTools.Trial:
memory estimate: 976.80 KiB
allocs estimate: 4
--------------
minimum time: 2.238 ms (0.00% GC)
median time: 2.244 ms (0.00% GC)
mean time: 2.279 ms (1.42% GC)
maximum time: 31.449 ms (92.86% GC)
--------------
samples: 2191
evals/sample: 1
julia> @benchmark prealloc_broadcast_with_view(+, x_n0f8, x_n0f8)
BenchmarkTools.Trial:
memory estimate: 976.75 KiB
allocs estimate: 3
--------------
minimum time: 1.800 ms (0.00% GC)
median time: 1.804 ms (0.00% GC)
mean time: 1.833 ms (1.57% GC)
maximum time: 31.257 ms (94.21% GC)
--------------
samples: 2723
evals/sample: 1 |
I found the significant difference between julia> typeof(x_n0f8)
Base.ReinterpretArray{Normed{UInt8,8},2,UInt8,Array{UInt8,2}}
julia> typeof(x_uint8)
Array{UInt8,2} So, as you already know... julia> z_n0f8 = replace!(_->rand(N0f8), Array{N0f8}(undef, 1000, 1000));
julia> typeof(z_n0f8)
Array{Normed{UInt8,8},2}
julia> @benchmark z_n0f8 .+ z_n0f8
BenchmarkTools.Trial:
memory estimate: 976.75 KiB
allocs estimate: 4
--------------
minimum time: 76.600 μs (0.00% GC)
median time: 85.300 μs (0.00% GC)
mean time: 105.110 μs (11.17% GC)
maximum time: 30.517 ms (99.58% GC)
--------------
samples: 10000
evals/sample: 1 Are the FixedPointNumbers.jl/src/FixedPointNumbers.jl Line 210 in da39318
|
Yes, this is a limitation of ReinterpretArray. See JuliaLang/julia#28980. A (nasty) workaround is to steal the pointer and create a new array with |
Actually, this surprises me a bit, any hints/ideas on how to fix this?
A similar result can be observed for
*
,.*
, and etc.The text was updated successfully, but these errors were encountered: