
Fix the performance of reinterpretarray with simultaneous reshaping #37559

Merged · 3 commits · Sep 29, 2020

Conversation

@timholy timholy commented Sep 13, 2020

This introduces reinterpret(reshape, T, A). This adds a dimension when reinterpreting to a smaller element size (e.g., Tuple{Int,Int}->Int), removes a dimension when reinterpreting to a bigger element size, and keeps the dimensionality constant when the element size is unchanged. Here's a demo:

julia> a = [(1,2), (3,4)]
2-element Vector{Tuple{Int64, Int64}}:
 (1, 2)
 (3, 4)

julia> reinterpret(reshape, Int, a)       # bigger elsize to smaller elsize, adds a dimension
2×2 reinterpret(reshape, Int64, ::Vector{Tuple{Int64, Int64}}):
 1  3
 2  4

julia> b = [1 2; 3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

julia> reinterpret(reshape, Tuple{Int,Int}, b)    # smaller elsize to bigger elsize, removes a dimension
2-element reinterpret(reshape, Tuple{Int64, Int64}, ::Matrix{Int64}):
 (1, 3)
 (2, 4)

julia> c = [1, -1]
2-element Vector{Int64}:
  1
 -1

julia> reinterpret(reshape, UInt, c) # same elsize, no change in dimensionality
2-element reinterpret(reshape, UInt64, ::Vector{Int64}):
 0x0000000000000001
 0xffffffffffffffff

Naturally, the leading dimension must be commensurate with the ratio of the element sizes: when going from a larger to a smaller element type, the new leading dimension has size `sizeof(eltype(A)) ÷ sizeof(T)`, and in the reverse direction the leading dimension of the input must equal `sizeof(T) ÷ sizeof(eltype(A))`.
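As a sketch of that constraint (assumes Julia ≥ 1.6, where this PR lands, and a 64-bit machine; `a`, `b`, and `ok` are illustrative names, not part of the PR):

```julia
a = [(1, 2), (3, 4)]
# The ratio of element sizes fixes the size of the new leading dimension:
@assert sizeof(eltype(a)) ÷ sizeof(Int) == 2
@assert reinterpret(reshape, Int, a) == [1 3; 2 4]

# A 3×2 matrix has leading dimension 3, not 2, so the reverse
# reinterpretation is rejected at construction time:
b = [1 2; 3 4; 5 6]
ok = try
    reinterpret(reshape, Tuple{Int,Int}, b)
    true
catch
    false
end
@assert !ok
```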

This scenario is widely used:

  • it's used when transitioning between a Vector of SVectors and a matrix where each vector is on a column
  • it's used in splitting and combining color channels in image processing
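A sketch of the first use case (assumes the StaticArrays package is available; `pts` and `M` are illustrative names):

```julia
using StaticArrays

pts = [SVector(1.0, 2.0), SVector(3.0, 4.0)]  # Vector of 2-element SVectors
M = reinterpret(reshape, Float64, pts)        # 2×2 view; column j is pts[j]
@assert M[:, 1] == [1.0, 2.0]

M[1, 2] = 30.0   # the view writes through to the parent array
@assert pts[2] == SVector(30.0, 4.0)
```

Since the result is a view, no copy is made in either direction.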

Unfortunately, since Julia 1.0 the performance of ReinterpretArray has been quite poor in these circumstances. For those who want benchmarks, see JuliaImages/ImageCore.jl#142; the short answer is that the improvements are dramatic, and come close to erasing any difference between the performance of a copy and the view itself (in some cases, even providing performance improvements). The trick here is that LLVM knows the size of the first dimension and so can often unroll the inner loop.
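The linked issue has the real numbers; as an illustrative (not authoritative) way to measure the view-vs-copy gap locally, assuming BenchmarkTools is installed:

```julia
using BenchmarkTools

a = [(rand(), rand()) for _ in 1:10_000]
v = reinterpret(reshape, Float64, a)   # lazy 2×10_000 view
c = collect(v)                         # eager copy

@assert sum(v) ≈ sum(c)    # same values either way

@btime sum($v)   # view: LLVM can unroll over the size-2 leading dimension
@btime sum($c)   # copy
```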

This replaces the strategy I was aiming for in #37290 by reshaping the array. I definitely think this is a better solution (thanks as always @mbauman, who has a gift for brief but incredibly useful suggestions). This is split into two commits: the first is pretty straightforward and, I think, relatively risk-free. The second is much scarier: it attempts to retain the performance advantages of linear indexing when the parent array supports it, but it introduces a new AbstractCartesianIndex subtype. I think I've prevented breakage in packages that don't support this index type (see the new WrappedArray tests), but there's no doubt that this is the part that deserves the closest scrutiny.

Fixes #28980
Closes #37290

timholy commented Sep 13, 2020

CC @johnnychen94, @RalphAS

end
IndexStyle(::IndexSCartesian2{K,N}, ::IndexSCartesian2{K,N}) where {K,N} = IndexSCartesian2{K,N}()

struct SCartesianIndex2{K,N} <: AbstractCartesianIndex{N}

timholy (review comment):

I'm unsure about the N parameter here. It felt slightly weird to have it be a subtype of generic AbstractCartesianIndex, and this does make it clear how many dimensions the particular SCartesianIndex2 is standing in for. The downside is that this will force a lot of completely useless specialization.
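A hypothetical sketch of the tradeoff (the struct names here are illustrative stand-ins, not the PR's actual definitions): with an `N` parameter, each array dimensionality instantiates a distinct index type, so every method taking one specializes per `N` even though nothing reads it.

```julia
# With the dimensionality parameter: {K,3} and {K,4} are different types,
# so a method taking ::WithN compiles once per N.
struct WithN{K,N}
    i::Int
    j::Int
end

# Without it (the parameter-free shape the PR later adopted): one type per K.
struct WithoutN{K}
    i::Int
    j::Int
end

@assert WithN{2,3}(1, 1) isa WithN{2,3}
@assert WithN{2,3} !== WithN{2,4}   # distinct types ⇒ distinct specializations
@assert WithoutN{2}(1, 1) isa WithoutN{2}
```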

timholy commented Sep 25, 2020

I'll probably merge this Monday. Any thoughts? I'm especially interested in feedback on #37559 (comment).

This addresses longstanding performance problems with `reinterpret`
when `sizeof(eltype(a))` is an integer multiple of `sizeof(T)`.
By reshaping the array to have an extra "channel dimension,"
LLVM can unroll the inner loop thanks to static size information.
Conversely, this consumes the initial "channel dimension" if
`sizeof(T)` is an integer multiple of `sizeof(eltype(a))`.

timholy commented Sep 29, 2020

I decided to get rid of the non-functional type parameter. I'll merge this once it gets through CI.

In order to make this a subtype of `AbstractCartesianIndex`, formerly
this added a useless type parameter `N` to indicate the dimensionality
of the array for which this index was constructed. But since none of the
code depends on `N`, and it would have forced useless specialization,
it seems better to not have it.

timholy commented Sep 29, 2020

I decided to get rid of the non-functional type parameter. When we merge, though, let's not squash, so we can simply revert the 3rd commit if folks think that was a bad idea.

That's a lot of failures, but as far as I can tell they have nothing to do with this PR, since several of my other PRs are showing similar errors. OK to merge? EDIT: looks related to #37731, with overlaps in other failures I see elsewhere.

Successfully merging this pull request may close these issues.

Performance of ReinterpretArray, continued