30x slower looping over reinterpret array #51658

Open
Moelf opened this issue Oct 10, 2023 · 3 comments
Labels
performance Must go faster

Comments

@Moelf

Moelf commented Oct 10, 2023

previous saga:


julia> _from_zigzag(n) = (n >> one(n)) ⊻ -(n & one(n))
_from_zigzag (generic function with 1 method)

julia> function g(res)
           @simd for i in eachindex(res)
               res[i] = _from_zigzag(res[i])
           end
       end
g (generic function with 1 method)

julia> ARY = reinterpret(Int16, rand(UInt8, 10^5));

julia> using BenchmarkTools

julia> @benchmark g(x) setup=begin x = copy(ARY) end
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.281 μs …  3.130 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.290 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.304 μs ± 64.783 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▇▆▅▄▃▂▂▁▁                       ▂▂▁                      ▂
  ██████████████▇▆▆▅▅▅▁▁▁▃▃▁▃▁▁▁▁▁▅█████▇▆▁▅▄▄▃▅▃▄▄▄▄▃▃▄▄▃▄▅ █
  1.28 μs      Histogram: log(frequency) by time     1.52 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark g(x) setup=begin x = deepcopy(ARY) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  322.640 μs … 481.445 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     324.080 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   325.580 μs ±   6.105 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂██▆▆▅▃▂▂                                                     ▂
  ███████████▇▇▇▆▆▆▆▆▆▅▆▆▅▅▆▆▆▆▇▇▅▆▅▅▆▄▅▅▄▅▅▅▄▃▄▅▄▄▅▂▄▄▅▅▅▄▃▄▅▅ █
  323 μs        Histogram: log(frequency) by time        358 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

That's roughly a 250x slowdown (324 μs vs. 1.29 μs median)?
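
A likely reason the two setups behave so differently is that `copy` and `deepcopy` do not return the same kind of array here. The check below is a minimal sketch, assuming current Base behavior in which `copy` of a reinterpreted array materializes a plain `Array` while `deepcopy` keeps the `ReinterpretArray` wrapper. The variable names `a` and `b` are only for illustration.

```julia
# Same construction as in the report: a reinterpreted view over raw bytes.
ARY = reinterpret(Int16, rand(UInt8, 10^5))

a = copy(ARY)      # assumption: materializes the data into a plain Array
b = deepcopy(ARY)  # assumption: copies the wrapper together with its parent

println(typeof(a))
println(typeof(b))

# The fast benchmark loops over `a`-like data, the slow one over `b`-like data.
println(a isa Vector{Int16})
println(b isa Base.ReinterpretArray)
```

If both checks print `true`, the first benchmark is timing a loop over a `Vector{Int16}` and only the second one is actually exercising the reinterpreted array.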

Notice that it's faster to copy first, even counting the allocation:

julia> @benchmark g(copy(x)) setup=begin x = ARY end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  29.527 μs … 444.092 μs  ┊ GC (min … max): 0.00% … 84.11%
 Time  (median):     32.452 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.225 μs ±  10.909 μs  ┊ GC (mean ± σ):  0.90% ±  2.60%

         ▄█▅▅▆▁
  ▂▃▅▆▄▅▇██████▅▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  29.5 μs         Histogram: frequency by time         48.3 μs <

 Memory estimate: 97.73 KiB, allocs estimate: 2.
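
The copy-first observation above can be turned into a small workaround. This is only a sketch: `decode_copy_first` is a hypothetical helper name, and `_from_zigzag` is assumed to be the usual XOR-based zigzag decode. It trades one allocation for running the loop over a plain `Vector`.

```julia
# Assumed definition: XOR-based zigzag decode, as in the report.
_from_zigzag(n) = (n >> one(n)) ⊻ -(n & one(n))

# In-place decode loop, as in the report.
function g(res)
    @simd for i in eachindex(res)
        res[i] = _from_zigzag(res[i])
    end
end

# Hypothetical helper: materialize the reinterpreted data into a plain
# Vector first, then run the in-place loop on that copy.
function decode_copy_first(src::AbstractVector)
    out = copy(src)   # allocates; `src` itself is left untouched
    g(out)
    return out
end

ARY = reinterpret(Int16, rand(UInt8, 10^5))
decoded = decode_copy_first(ARY)
```

Unlike `g(ARY)`, this does not mutate the underlying byte buffer, so it only fits callers that can work with the returned copy.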
@Moelf changed the title from "60x slower looping over reinterpret array" to "30x slower looping over reinterpret array" on Oct 10, 2023
@jishnub

jishnub commented Oct 10, 2023

Perhaps #44186 might help?

@Moelf

Moelf commented Oct 10, 2023

Looks like it helps by almost 10x:

julia> @benchmark g(x) setup=begin x = copy(ARY) end
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.279 μs …   4.555 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.294 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.367 μs ± 164.712 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅▂      ▇▄                                                 ▁
  ████▆▄▃▄▄███▇▇▇▇▇▆▆▅▄▅▄▅▆▄▅▆▅▃▅▄▅▄▅▅▅▅▄▄▃▅▄▃▅▃▅▄▄▅▅▅▄▃▄▄▃▄▅ █
  1.28 μs      Histogram: log(frequency) by time      2.17 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark g(x) setup=begin x = deepcopy(ARY) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  43.471 μs … 79.852 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     44.035 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.352 μs ±  3.060 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇█▆▅▃▂     ▁▁▁▁▂▅▅▃▂▁▁▁▁                                    ▂
  █████████████████████████▇▇▇▇▇▆▇▆▆▆▆▅▅▅▅▅▅▃▄▄▃▄▃▃▁▅▄▄▁▄▄▄▄▆ █
  43.5 μs      Histogram: log(frequency) by time      60.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

@Moelf

Moelf commented Mar 6, 2024

on nightly it's still the same:

julia> @benchmark g(x) setup=begin x = copy(ARY) end
BenchmarkTools.Trial: 7593 samples with 10 evaluations.
 Range (min … max):  1.280 μs …   5.617 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.288 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.513 μs ± 655.006 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █ ▂▁
  ████▇▇▆▆▅▄▄▄▃▃▄▃▃▃▆▆▄▅▄▆▇▇▆▇▇▇█▆▇▆▇▆▆▆▆▆▇▆▆▆▆▆▆▆▅▆▅▆▅▅▅▄▅▅▅ █
  1.28 μs      Histogram: log(frequency) by time      4.32 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark g(x) setup=begin x = deepcopy(ARY) end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  42.522 μs … 87.246 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.767 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   43.293 μs ±  2.071 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▇▃▂▂▁                              ▃▃                     ▁
  ███████▇█▆▅▅▄▄▃▄▃▅▄▅▆▆▅▆▅▅▅▆▆▆▆▅▆▄▅▅▇███▅▆▅▅▅▅▄▅▄▄▄▃▄▄▄▄▄▃▃ █
  42.5 μs      Histogram: log(frequency) by time      49.9 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
