Handling complex arrays #60

hofmannmartin · 2020-01-13T12:49:20Z

For optimization of an algorithm operating on complex data I wanted to use SIMD. However, trying to naively handle complex arrays using reinterpret failed. E.g.

using SIMD

vcomplex = Vector{ComplexF32}(undef, 10)
vreal = reinterpret(Float32,vcomplex)

xs = vload(Vec{4,Float32}, vreal, 1)

throws an error

MethodError: no method matching vload(::Type{Vec{4,Float32}}, ::Base.ReinterpretArray{Float32,1,Complex{Float32},Array{Complex{Float32},1}}, ::Int64)
Closest candidates are:
  vload(::Type{Vec{N,T}}, !Matched::Union{DenseArray{T,1}, SubArray{T,1,P,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where P}, ::Integer) where {N, T} at /home/moeddel/.julia/packages/SIMD/Am38N/src/SIMD.jl:1405
  vload(::Type{Vec{N,T}}, !Matched::Union{DenseArray{T,1}, SubArray{T,1,P,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where P}, ::Integer, !Matched::Val{Aligned}) where {N, T, Aligned} at /home/moeddel/.julia/packages/SIMD/Am38N/src/SIMD.jl:1405
  vload(::Type{Vec{N,T}}, !Matched::Union{DenseArray{T,1}, SubArray{T,1,P,I,true} where I<:Union{Tuple{Vararg{Real,N} where N}, Tuple{AbstractUnitRange,Vararg{Any,N} where N}} where P}, ::Integer, !Matched::Val{Aligned}, !Matched::Val{Nontemporal}) where {N, T, Aligned, Nontemporal} at /home/moeddel/.julia/packages/SIMD/Am38N/src/SIMD.jl:1405
  ...

Am I missing something, or is this feature not yet supported?

The text was updated successfully, but these errors were encountered:

eschnett · 2020-01-13T15:12:53Z

Complex numbers are not supported. They should, of course.

There are two possible ways to implement SIMD algorithms for complex numbers:

Handle real and imaginary parts separately. This works well with SIMD operations, but requires disentangling real and imaginary parts when loading values, and combining them when storing.
Handle complex numbers as a unit. This simplifies loading and storing, and works fine when adding or subtracting, but complex multiplication (or other nonlinear operations) require swizzling real and imaginary parts.

I know that certain architectures (Blue Gene/Q, old Intel CPUs with 128-bit SIMD lanes) support the second approach. I think that modern CPUs prefer the first approach, but I am not sure.

Do you know which approach would be best, or do you have pointers to literature?

hofmannmartin · 2020-01-13T16:25:10Z

I prefer the second approach, since it is also the one used by openblas routines such as https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_64/caxpy_microk_haswell-2.c. Here for example numbers are permuted (shufflevector) inside the registers to exchange real and imaginary part.

eschnett · 2020-01-13T16:42:03Z

Looking at OpenBLAS is a good idea. Let's go with this approach.

Implementing addition etc. should be straightforward (no shuffling required). Complex multiplication requires shuffling.

hofmannmartin · 2020-01-13T20:00:46Z

and i² can be taken into account by a multiplication with a constant vector with alternating signs (e.g. Vec{4,Float32}((-1,1,-1,1)) because it only occurs in the real part.

hofmannmartin · 2020-01-13T20:12:09Z

conj too could be implemented by multiplication with the aforementioned vector.

louisponet · 2020-07-23T16:19:55Z

Will this require a specific complex Vec with double the data underlying? How does one acquire an NTuple{N,VecElement{ComplexF64}} from llvmcall?

I looked a bit through base in for example simdloops over imaginary things, there it seems to be loading elementptrs, put it in a 2x wide double vec and applies the rest of the math from inside the loop.

BioTurboNick · 2020-08-20T13:15:28Z

Related to the original error, what's the reason that SIMD isn't able to load from a ReinterpretArray?

I've been trying to bang my head against this for a while and keep getting blocked. It seems like this should work?

a = Vector{UInt8}(undef, 16)
b = reinterpret(UInt32, a)
c = vload(Vec{4, UInt32}, b, 1)

hofmannmartin · 2020-08-20T14:12:49Z

As far as I can tell there is simply no method for vload, which supports ReinterpretArray. Only DenseArray, SubArray and Tuple are supported and ReinterpretArray is no subtype of either of these. However, I have not looked into the code, so I have no idea how hard it is to make this addition.

BioTurboNick · 2020-08-20T14:13:43Z

Seems like all I had to do was change these lines to get ReinterpretArrays to work:

#ContiguousArray{T,N} = Union{DenseArray{T,N}, ContiguousSubArray{T,N}}
ContiguousArray{T,N} = Union{DenseArray{T,N}, ContiguousSubArray{T,N}, Base.ReinterpretArray{T,1,T2,A} where {A <: Union{DenseArray{T2,N}, ContiguousSubArray{T2,N}}} where {T2}}

#FastContiguousArray{T,N} = Union{DenseArray{T,N}, Base.FastContiguousSubArray{T,N}}
FastContiguousArray{T,N} = Union{DenseArray{T,N}, Base.FastContiguousSubArray{T,N}, Base.ReinterpretArray{T,1,T2,A} where {A <: Union{DenseArray{T2,N}, Base.FastContiguousSubArray{T2,N}}} where {T2}}

Is there something I'm missing?

Edit: PR #72 for this

hofmannmartin · 2020-08-20T14:23:35Z

Were you able to run my initial code above with this?

BioTurboNick · 2020-08-20T14:27:35Z

@hofmannmartin Yes, no errors. Is this what you expect?

vcomplex = Vector{ComplexF32}(undef, 10)
#=
10-element Array{Complex{Float32},1}:
   4.98157f-25 + 0.0f0im
  5.368139f-26 + 0.0f0im
    2.28642f-9 + 0.0f0im
  2.2862459f-9 + 0.0f0im
 3.8254035f-25 + 0.0f0im
  5.088271f-25 + 0.0f0im
 1.3259766f-25 + 0.0f0im
 1.0201263f-25 + 0.0f0im
 1.3262132f-25 + 0.0f0im
 1.3262034f-25 + 0.0f0im
=#

vreal = reinterpret(Float32,vcomplex)
#=
20-element reinterpret(Float32, ::Array{Complex{Float32},1}):
 4.98157f-25
 0.0
 5.368139f-26 
 0.0
 2.28642f-9   
 0.0
 2.2862459f-9 
 0.0
 3.8254035f-25
 0.0
 5.088271f-25 
 0.0
 1.3259766f-25
 0.0
 1.0201263f-25
 0.0
 1.3262132f-25
 0.0
 1.3262034f-25
 0.0
=#

xs = vload(Vec{4,Float32}, vreal, 1)
# <4 x Float32>[4.98157f-25, 0.0, 5.368139f-26, 0.0]

hofmannmartin · 2020-08-20T14:31:50Z

Alternating real and imaginary values are indeed what I expected. Might worth writing a small unit test?

willow-ahrens · 2021-02-16T18:54:46Z

I would also like to see complex vectors supported. I won't have time for a few weeks, but if I were to file a PR to add support for complex vectors, would anyone be interested in reviewing?

eschnett · 2021-02-17T22:52:54Z

I would be happy to review. Feel free to demonstrate an incomplete work-in-progress version if you are looking for feedback.

The challenging part will be complex number multiplication; this will require shuffling the vectors.

A different internal representation would be to store real and imaginary parts in separate SIMD vectors, i.e. to have Complex{Vec{4,Float32}}. This will make arithmetic operations simple, but puts the onus on loading and storing values, requiring the shuffling then.

KristofferC · 2021-02-18T07:47:09Z

The fact that there are two storages possible and depending on your use case, either one might be better makes it a bit unclear to me if SIMD.jl should really pick one. If anything, I personally think having them as separate vector makes a lot of sense because that would make it much more easy to translate scalar code to vector code.

hofmannmartin · 2021-02-18T08:18:03Z

Having them separated is the easy part. There is a section in LoopVectorization, which discusses this approach. More interesting to handle, is the native storage order like it is done in OpenBLAS.

I also have an example of a hand written kernel for this function. However, it would be great if this would work out of the box without having to tailor the kernel.

willow-ahrens · 2021-02-18T16:40:05Z

All the number types currently supported by SIMD.jl directly translate to native vector instructions. Because the complex vectors can be implemented (almost or entirely?) using existing SIMD.jl methods, it seems the main point of adding complex vector support to SIMD.jl is to use the Vec abstraction to automate the process of manually vectorizing complex code. As is pointed out in that LoopVectorization link, there are several similar "composite" number types which might be vectorized via a similar process, such as doubledouble, raising similar implementation decisions regarding data layout, etc.

Would it make sense to treat the Complex case as an example and test case for a minimal composite type vector interface? We might use a different base type ("VecComplex" or "ComplexVec") which supports the ability to choose between implementations.

eschnett · 2021-02-18T17:25:50Z

It seems both approaches would be useful. Whoever does the work should decide which to explore, with the understanding that SIMD might need to support both approaches in the future, so this road shouldn't be blocked.

I suggest the following:

Vec{4,Complex{Float64}} will store internally the equivalent of a Vec{8,Float64}, alternating between real and imaginary parts. Most vector operations should work out of the box. Complex number multiplication will need to shuffle vector lanes, as done in @hofmannmartin's code.
We can, in the future, use a new type CompositeVec{N,...} (better names to be suggested) to automatically split types into components.

photor · 2022-11-18T13:26:26Z

It seems both approaches would be useful. Whoever does the work should decide which to explore, with the understanding that SIMD might need to support both approaches in the future, so this road shouldn't be blocked.

I suggest the following:
* `Vec{4,Complex{Float64}}` will store internally the equivalent of a `Vec{8,Float64}`, alternating between real and imaginary parts. Most vector operations should work out of the box. Complex number multiplication will need to shuffle vector lanes, as done in @hofmannmartin's code.

* We can, in the future, use a new type `CompositeVec{N,...}` (better names to be suggested) to automatically split types into components.

Any progress?

willow-ahrens · 2022-11-18T13:47:46Z

I never found time to do this. Perhaps someone else does?

dannys4 mentioned this issue Jul 22, 2021

Requesting Support for fmaddsub #88

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling complex arrays #60

Handling complex arrays #60

hofmannmartin commented Jan 13, 2020

eschnett commented Jan 13, 2020

hofmannmartin commented Jan 13, 2020

eschnett commented Jan 13, 2020

hofmannmartin commented Jan 13, 2020

hofmannmartin commented Jan 13, 2020

louisponet commented Jul 23, 2020

BioTurboNick commented Aug 20, 2020

hofmannmartin commented Aug 20, 2020

BioTurboNick commented Aug 20, 2020 •

edited

Loading

hofmannmartin commented Aug 20, 2020

BioTurboNick commented Aug 20, 2020

hofmannmartin commented Aug 20, 2020

willow-ahrens commented Feb 16, 2021

eschnett commented Feb 17, 2021

KristofferC commented Feb 18, 2021

hofmannmartin commented Feb 18, 2021

willow-ahrens commented Feb 18, 2021

eschnett commented Feb 18, 2021

photor commented Nov 18, 2022

willow-ahrens commented Nov 18, 2022 •

edited

Loading

Handling complex arrays #60

Handling complex arrays #60

Comments

hofmannmartin commented Jan 13, 2020

eschnett commented Jan 13, 2020

hofmannmartin commented Jan 13, 2020

eschnett commented Jan 13, 2020

hofmannmartin commented Jan 13, 2020

hofmannmartin commented Jan 13, 2020

louisponet commented Jul 23, 2020

BioTurboNick commented Aug 20, 2020

hofmannmartin commented Aug 20, 2020

BioTurboNick commented Aug 20, 2020 • edited Loading

hofmannmartin commented Aug 20, 2020

BioTurboNick commented Aug 20, 2020

hofmannmartin commented Aug 20, 2020

willow-ahrens commented Feb 16, 2021

eschnett commented Feb 17, 2021

KristofferC commented Feb 18, 2021

hofmannmartin commented Feb 18, 2021

willow-ahrens commented Feb 18, 2021

eschnett commented Feb 18, 2021

photor commented Nov 18, 2022

willow-ahrens commented Nov 18, 2022 • edited Loading

BioTurboNick commented Aug 20, 2020 •

edited

Loading

willow-ahrens commented Nov 18, 2022 •

edited

Loading