
poor performance #125

Closed
johnnychen94 opened this issue Jun 4, 2019 · 6 comments · Fixed by #206
@johnnychen94
Collaborator

johnnychen94 commented Jun 4, 2019

Actually, this surprises me a bit. Any hints/ideas on how to fix this?

A similar result can be observed for `*`, `.*`, etc.

julia> x_n0f8 = rand(N0f8, 1000, 1000);

julia> x_float64 = rand(Float64, 1000, 1000);

julia> x_uint8 = rand(UInt8, 1000, 1000);

julia> @benchmark x_n0f8 .+ x_n0f8
BenchmarkTools.Trial: 
  memory estimate:  976.84 KiB
  allocs estimate:  6
  --------------
  minimum time:     3.547 ms (0.00% GC)
  median time:      3.732 ms (0.00% GC)
  mean time:        3.838 ms (1.11% GC)
  maximum time:     7.032 ms (0.00% GC)
  --------------
  samples:          1301
  evals/sample:     1

julia> @benchmark x_float64 .+ x_float64
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  4
  --------------
  minimum time:     1.214 ms (0.00% GC)
  median time:      1.346 ms (0.00% GC)
  mean time:        1.869 ms (29.19% GC)
  maximum time:     5.726 ms (71.81% GC)
  --------------
  samples:          2667
  evals/sample:     1

julia> @benchmark x_uint8 .+ x_uint8
BenchmarkTools.Trial: 
  memory estimate:  976.75 KiB
  allocs estimate:  4
  --------------
  minimum time:     80.220 μs (0.00% GC)
  median time:      87.781 μs (0.00% GC)
  mean time:        235.548 μs (18.39% GC)
  maximum time:     1.844 ms (69.85% GC)
  --------------
  samples:          10000
  evals/sample:     1
@timholy
Member

timholy commented Jun 4, 2019

I am surprised too. Best approach would be to profile it and see where it's spending time. Or step into it with a debugger and see which version of + is getting called.

@kimikage
Collaborator

kimikage commented Oct 11, 2019

One reason x_float64 .+ x_float64 is faster is that it vectorizes well with SIMD. However, I don't know yet why x_n0f8 .+ x_n0f8 is so slow.

Although it is not directly related to the subject of this issue, I think * for Normed is inefficient, since it converts the operands to a float type and then converts the product back to the raw integer type.

*(x::T, y::T) where {T <: Normed} = convert(T,convert(floattype(T), x)*convert(floattype(T), y))

The following may be faster, although it should be generalized.

function Base.:*(x::N0f8, y::N0f8)
    m = UInt32(reinterpret(x)) * UInt32(reinterpret(y))
    reinterpret(N0f8, unsafe_trunc(UInt8, (m * 0x10101 + 0x800000)>>24))
end

Edit:
Historically, the float-based multiplication was introduced in PR #36. Of course, that was the right call as a bug fix.
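The integer rounding trick above can be checked exhaustively against the float-based definition, since the raw values are only 8 bits. A quick sketch (`mul_trick` and `mul_float` are hypothetical helper names, operating directly on raw UInt8 values, where an N0f8 value is raw/255):

```julia
# For N0f8, the product's raw result should be round(a * b / 255).
# mul_trick uses the fixed-point identity 255 * 0x10101 == 2^24 - 1,
# so (m * 0x10101 + 2^23) >> 24 rounds m / 255 without any division.
function mul_trick(a::UInt8, b::UInt8)
    m = UInt32(a) * UInt32(b)
    unsafe_trunc(UInt8, (m * 0x10101 + 0x800000) >> 24)
end

# Float-based reference, mirroring the current definition of *.
mul_float(a::UInt8, b::UInt8) = round(UInt8, Float64(a) * Float64(b) / 255)

# Exhaustive check over all 256 * 256 raw pairs.
for a in 0x00:0xff, b in 0x00:0xff
    @assert mul_trick(a, b) == mul_float(a, b)
end
```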

@kimikage
Collaborator

As mentioned in #129 (comment), this problem seems to be OS-dependent. Unfortunately, Julia v1.3.0 will not directly improve the performance.

It is not easy to modify the broadcasting mechanism. I think it may be more realistic to modify the methods in FixedPointNumbers so that they are better suited to broadcasting.

@kimikage
Collaborator

kimikage commented Oct 29, 2019

Although also not directly related to the subject of this issue, the following tips are beneficial:
https://docs.julialang.org/en/v1.2/manual/performance-tips/#Pre-allocating-outputs-1
https://docs.julialang.org/en/v1.2/manual/performance-tips/#Consider-using-views-for-slices-1
https://docs.julialang.org/en/v1.2/manual/performance-tips/#Copying-data-is-not-always-bad-1

function prealloc_broadcast(f, a::T, b::T) where {T <: AbstractArray}
    y = copy(a)
    broadcast!(f, y, y, b)
end

function broadcast_with_view(f, a::T, b::T) where {T <: AbstractArray}
    broadcast(f, view(a, axes(a)...), view(b, axes(b)...))
end

function prealloc_broadcast_with_view(f, a::T, b::T) where {T <: AbstractArray}
    y = copy(a)
    broadcast!(f, y, y, view(b, axes(b)...))
end

julia> @benchmark broadcast(+, x_n0f8, x_n0f8)
BenchmarkTools.Trial:
  memory estimate:  976.80 KiB
  allocs estimate:  4
  --------------
  minimum time:     3.427 ms (0.00% GC)
  median time:      3.431 ms (0.00% GC)
  mean time:        3.520 ms (1.14% GC)
  maximum time:     32.200 ms (87.01% GC)
  --------------
  samples:          1419
  evals/sample:     1

julia> @benchmark prealloc_broadcast(+, x_n0f8, x_n0f8)
BenchmarkTools.Trial:
  memory estimate:  976.75 KiB
  allocs estimate:  3
  --------------
  minimum time:     2.505 ms (0.00% GC)
  median time:      2.510 ms (0.00% GC)
  mean time:        2.542 ms (1.13% GC)
  maximum time:     30.100 ms (91.65% GC)
  --------------
  samples:          1965
  evals/sample:     1

julia> @benchmark broadcast_with_view(+, x_n0f8, x_n0f8)
BenchmarkTools.Trial:
  memory estimate:  976.80 KiB
  allocs estimate:  4
  --------------
  minimum time:     2.238 ms (0.00% GC)
  median time:      2.244 ms (0.00% GC)
  mean time:        2.279 ms (1.42% GC)
  maximum time:     31.449 ms (92.86% GC)
  --------------
  samples:          2191
  evals/sample:     1

julia> @benchmark prealloc_broadcast_with_view(+, x_n0f8, x_n0f8)
BenchmarkTools.Trial:
  memory estimate:  976.75 KiB
  allocs estimate:  3
  --------------
  minimum time:     1.800 ms (0.00% GC)
  median time:      1.804 ms (0.00% GC)
  mean time:        1.833 ms (1.57% GC)
  maximum time:     31.257 ms (94.21% GC)
  --------------
  samples:          2723
  evals/sample:     1

@kimikage
Collaborator

I found a significant difference between x_n0f8 and x_uint8.

julia> typeof(x_n0f8)
Base.ReinterpretArray{Normed{UInt8,8},2,UInt8,Array{UInt8,2}}

julia> typeof(x_uint8)
Array{UInt8,2}

So, as you already know...

julia> z_n0f8 = replace!(_->rand(N0f8), Array{N0f8}(undef, 1000, 1000));

julia> typeof(z_n0f8)
Array{Normed{UInt8,8},2}

julia> @benchmark z_n0f8 .+ z_n0f8
BenchmarkTools.Trial:
  memory estimate:  976.75 KiB
  allocs estimate:  4
  --------------
  minimum time:     76.600 μs (0.00% GC)
  median time:      85.300 μs (0.00% GC)
  mean time:        105.110 μs (11.17% GC)
  maximum time:     30.517 ms (99.58% GC)
  --------------
  samples:          10000
  evals/sample:     1

Are ReinterpretArrays of primitive types designed to be as fast as, or faster than, native Arrays? If so, this is probably a problem on the Julia side. If not, we should modify rand.

rand(::Type{T}, sz::Dims) where {T <: FixedPoint} = reinterpret(T, rand(rawtype(T), sz))
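One possible modification (a sketch only, not necessarily what an actual fix would do; `rand_n0f8` is a hypothetical name to avoid piracy on Base.rand): materialize the reinterpreted buffer into a plain Array, so later broadcasts take the fast path at the cost of one extra copy.

```julia
using FixedPointNumbers

# reinterpret gives a lazy ReinterpretArray; collect copies it into a
# native Array{N0f8,2}, which broadcasting handles much faster.
rand_n0f8(sz::Dims) = collect(reinterpret(N0f8, rand(UInt8, sz)))

z = rand_n0f8((1000, 1000))
typeof(z)   # Array{Normed{UInt8,8},2}, not a ReinterpretArray
```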

@timholy
Member

timholy commented Oct 30, 2019

Yes, this is a limitation of ReinterpretArray. See JuliaLang/julia#28980.

A (nasty) workaround is to steal the pointer and create a new array with unsafe_wrap.
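A minimal sketch of that unsafe_wrap workaround (variable names are illustrative; GC.@preserve is needed because the wrapped Array does not keep the underlying buffer alive):

```julia
using FixedPointNumbers

# Steal the pointer from the raw UInt8 buffer and wrap it as a plain
# Array{N0f8,2}, sidestepping the slow ReinterpretArray path.
raw = rand(UInt8, 1000, 1000)
x = unsafe_wrap(Array, convert(Ptr{N0f8}, pointer(raw)), size(raw))
GC.@preserve raw begin
    y = x .+ x   # dispatches on Array{N0f8,2}, so broadcasting is fast
end
```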
