
Add CPUTuple <: AbstractCPU as new device type #131

Merged 2 commits from cputuple into master on Mar 5, 2021

Conversation

chriselrod (Collaborator)

Related to #130

I'm open to suggestions on a better name, e.g., if someone were to define:

struct My3Vector{T} <: AbstractVector{T}
    x::T
    y::T
    z::T
end

it should also be a CPUTuple. So maybe something like CPUStruct is a better name? Or CPUStructMemory?
Or CPULLVMArray? Most homogeneous tuples lower to LLVM arrays.

The idea is to represent something that should really be CPUPointer, except for the niggling detail that Julia semantics don't allow us to get a pointer to it, while C(++) would allow us to just use the & operator to get the address.

Julia's semantics actually match LLVM's more closely there. What Clang does when you take an address with & is basically the same as calling Ref in Julia and then using that Ptr, except that Clang seems to optimize away the copy more consistently for some reason. That will take some more exploration; I just recall that in tests some time ago Julia often failed to optimize the copies away, though I didn't check whether a similar example in C succeeds.
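
For concreteness, a minimal sketch of that Ref-then-pointer pattern (the tuple, the element access, and the preserve scope here are just illustrative):

x = (1.0, 2.0, 3.0)               # immutable homogeneous tuple; pointer(x) isn't defined
r = Ref(x)                        # Julia's analogue of C's &x; may copy x into the Ref
p = Base.unsafe_convert(Ptr{NTuple{3,Float64}}, r)
pf = Ptr{Float64}(p)              # reinterpret as a pointer to the elements
GC.@preserve r begin
    unsafe_load(pf, 2)            # loads 2.0; only valid while r is rooted
end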

So the CPUTuple type means to say that this is what we should do.

That's to distinguish it from other AbstractArray representations whose loads aren't actually backed by memory, like Fill arrays, ranges, etc.
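
As a rough sketch of how downstream code might branch on the trait (assuming ArrayInterface.device returns these instances; access_style and the returned symbols are made up for illustration):

using ArrayInterface: device, CPUPointer, CPUTuple

access_style(A::AbstractArray) = access_style(device(A), A)
access_style(::CPUPointer, A) = :pointer       # e.g. Array, MArray: pointer(A) is valid
access_style(::CPUTuple, A) = :ref_pointer     # e.g. SArray: Ref(A), then take a pointer
access_style(::Any, A) = :getindex             # e.g. ranges, Fill arrays: not memory-backed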

codecov bot commented Mar 5, 2021

Codecov Report

Merging #131 (b5de99d) into master (7ce4be8) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #131      +/-   ##
==========================================
+ Coverage   85.45%   85.47%   +0.02%     
==========================================
  Files           9        9              
  Lines        1409     1411       +2     
==========================================
+ Hits         1204     1206       +2     
  Misses        205      205              
Impacted Files            Coverage Δ
src/ArrayInterface.jl     85.18% <100.00%> (+0.06%) ⬆️


Tokazama (Member) commented Mar 5, 2021

So I know that you can't go from Tuple to a pointer, but if we had something like this...

buffer = Ref{NTuple{N,T}}()
a = SArray{S,T,N,L}(buffer[])

...does the dereferencing always reallocate that entire buffer to create a?
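
A concrete version of what I mean, assuming StaticArrays (roundtrip is just an illustrative name):

using StaticArrays

function roundtrip(x::NTuple{3,Float64})
    buffer = Ref{NTuple{3,Float64}}()
    buffer[] = x
    SVector{3,Float64}(buffer[])   # does this dereference always copy the whole tuple?
end

# @code_llvm roundtrip((1.0, 2.0, 3.0))   # inspect whether the copy survives optimization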

chriselrod (Collaborator, Author)

...does the dereferencing always reallocate that entire buffer to create a?

No, sometimes LLVM is in a good mood.

Tokazama (Member) commented Mar 5, 2021

So is the issue that we want to be able to do something like...

dereference(::CPUStruct, buffer) = buffer[] # get rid of reference and mutability
dereference(::CPUPointer, buffer) = buffer  # keep reference and mutability, other functions wrap in array struct

chriselrod (Collaborator, Author)

So is the issue that we want to be able to do something like...

dereference(::CPUStruct, buffer) = buffer[] # get rid of reference and mutability
dereference(::CPUPointer, buffer) = buffer  # keep reference and mutability, other functions wrap in array struct

I'll redefine VectorizationBase.memory_reference to return a tuple of the pointer and the memory object backing it, something like:

@inline memory_reference(A::BitArray) = (Base.unsafe_convert(Ptr{Bit}, A.chunks), A.chunks)
@inline memory_reference(A::AbstractArray) = memory_reference(device(A), A)
@inline memory_reference(::CPUPointer, A) = (pointer(A), preserve_buffer(A))
@inline function memory_reference(::CPUTuple, A::AbstractArray{T}) where {T}
    ref = Ref(A)                            # box the isbits array so it has an address
    Base.unsafe_convert(Ptr{T}, ref), ref   # return the pointer plus the Ref to keep alive
end

Then the suggested use would be something like:

# either
ptra, presa = stridedpointer_and_buffer(A);
ptrb, presb = stridedpointer_and_buffer(B);
# or
(ptra,ptrb), (presa,presb) = groupedstridedpointer((A,B), (#= description of axis similarity =#));
GC.@preserve presa presb begin
    # use ptra and ptrb
end

This would allow SArrays and NTuples to work, but would rely on LLVM being generous for good performance (unlike MArrays, which should perform well reliably).
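
For instance (a sketch assuming StaticArrays; the pointers are only valid while the corresponding object is GC-preserved):

using StaticArrays

m = MVector(1.0, 2.0, 3.0)                    # mutable: backed by memory we can point into
pm = Base.unsafe_convert(Ptr{Float64}, m)     # CPUPointer-style access, reliably fast

s = SVector(1.0, 2.0, 3.0)                    # immutable: needs the Ref round trip
rs = Ref(s)
ps = Base.unsafe_convert(Ptr{SVector{3,Float64}}, rs)  # CPUTuple-style access; relies on LLVM eliding the copy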

However, the @gc_preserve macro would probably just pass CPUTuples through; its purpose is to let arrays be stack allocated, which should already be the case for CPUTuple objects.

I'll make a breaking change to groupedstridedpointer to also return the buffer, figuring that no one is actually using it yet (except for a soon-to-be-mainlined LoopVectorization branch).

Tokazama (Member) left a comment

LGTM

chriselrod merged commit f05b8bd into master on Mar 5, 2021
chriselrod deleted the cputuple branch on April 5, 2021