
Add CPUTuple <: AbstractCPU as new device type #131

Merged 2 commits from cputuple into master on Mar 5, 2021

Conversation

chriselrod (Collaborator)

Related to #130

I'm open to suggestions on a better name, e.g., if someone were to define:

struct My3Vector{T} <: AbstractVector{T}
    x::T
    y::T
    z::T
end

it should also be a CPUTuple. So maybe something like CPUStruct is a better name? Or CPUStructMemory?
Or CPULLVMArray? Most homogeneous tuples lower to LLVM arrays.

The idea is to represent something that should really be CPUPointer, except for the niggling detail that Julia semantics don't allow us to get a pointer to it, while C(++) would allow us to just use the & operator to get the address.

Julia's semantics actually match LLVM's more closely there. What Clang does when you take an address with & is basically the same as calling Ref in Julia and then using that Ptr, except that Clang seems to optimize away the copy more consistently for some reason. That will take some more exploration; I just recall that in tests some time ago Julia often failed to optimize the copies away, though I didn't check whether a similar example in C succeeds.
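
For concreteness, a minimal sketch of that Ref-then-pointer pattern (the tuple, the element access, and the preserve scope here are just illustrative):

x = (1.0, 2.0, 3.0)               # immutable homogeneous tuple; pointer(x) isn't defined
r = Ref(x)                        # Julia's analogue of C's &x; may copy x into the Ref
p = Base.unsafe_convert(Ptr{NTuple{3,Float64}}, r)
pf = Ptr{Float64}(p)              # reinterpret as a pointer to the elements
GC.@preserve r begin
    unsafe_load(pf, 2)            # loads 2.0; only valid while r is rooted
end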

So the CPUTuple type means to say that this is what we should do.

That's to distinguish it from other AbstractArray representations whose loads aren't actually backed by memory, like Fill arrays, ranges, etc.
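
As a rough sketch of how downstream code might branch on the trait (assuming ArrayInterface.device returns these instances; access_style and the returned symbols are made up for illustration):

using ArrayInterface: device, CPUPointer, CPUTuple

access_style(A::AbstractArray) = access_style(device(A), A)
access_style(::CPUPointer, A) = :pointer       # e.g. Array, MArray: pointer(A) is valid
access_style(::CPUTuple, A) = :ref_pointer     # e.g. SArray: Ref(A), then take a pointer
access_style(::Any, A) = :getindex             # e.g. ranges, Fill arrays: not memory-backed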

codecov bot commented Mar 5, 2021

Codecov Report

Merging #131 (b5de99d) into master (7ce4be8) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #131      +/-   ##
==========================================
+ Coverage   85.45%   85.47%   +0.02%     
==========================================
  Files           9        9              
  Lines        1409     1411       +2     
==========================================
+ Hits         1204     1206       +2     
  Misses        205      205              
Impacted Files            Coverage Δ
src/ArrayInterface.jl     85.18% <100.00%> (+0.06%) ⬆️


Tokazama (Member) commented Mar 5, 2021

So I know that you can't go from Tuple to a pointer, but if we had something like this...

buffer = Ref{NTuple{N,T}}()
a = SArray{S,T,N,L}(buffer[])

...does the dereferencing always reallocate that entire buffer to create a?
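
A concrete version of what I mean, assuming StaticArrays (roundtrip is just an illustrative name):

using StaticArrays

function roundtrip(x::NTuple{3,Float64})
    buffer = Ref{NTuple{3,Float64}}()
    buffer[] = x
    SVector{3,Float64}(buffer[])   # does this dereference always copy the whole tuple?
end

# @code_llvm roundtrip((1.0, 2.0, 3.0))   # inspect whether the copy survives optimization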

chriselrod (Collaborator, Author)

...does the dereferencing always reallocate that entire buffer to create a?

No, sometimes LLVM is in a good mood.

Tokazama (Member) commented Mar 5, 2021

So is the issue that we want to be able to do something like...

dereference(::CPUStruct, buffer) = buffer[] # get rid of reference and mutability
dereference(::CPUPointer, buffer) = buffer  # keep reference and mutability, other functions wrap in array struct

chriselrod (Collaborator, Author)

So is the issue that we want to be able to do something like...

dereference(::CPUStruct, buffer) = buffer[] # get rid of reference and mutability
dereference(::CPUPointer, buffer) = buffer  # keep reference and mutability, other functions wrap in array struct

I'll redefine VectorizationBase.memory_reference to return a tuple of the pointer and the memory object backing it, something like:

@inline memory_reference(A::BitArray) = (Base.unsafe_convert(Ptr{Bit}, A.chunks), A.chunks)
@inline memory_reference(A::AbstractArray) = memory_reference(device(A), A)
@inline memory_reference(::CPUPointer, A) = (pointer(A), preserve_buffer(A))
@inline function memory_reference(::CPUTuple, A::AbstractArray{T}) where {T}
    ref = Ref(A)                            # box the isbits array so it has an address
    Base.unsafe_convert(Ptr{T}, ref), ref   # return the pointer plus the Ref to keep alive
end

Then the suggested use would be something like:

# either
ptra, presa = stridedpointer_and_buffer(A);
ptrb, presb = stridedpointer_and_buffer(B);
# or
(ptra,ptrb), (presa,presb) = groupedstridedpointer((A,B), (#= description of axis similarity =#));
GC.@preserve presa presb begin
    # use ptra and ptrb
end

This would allow SArrays and NTuples to work, but would rely on LLVM being generous for good performance (unlike MArrays, which should perform well reliably).
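
For instance (a sketch assuming StaticArrays; the pointers are only valid while the corresponding object is GC-preserved):

using StaticArrays

m = MVector(1.0, 2.0, 3.0)                    # mutable: backed by memory we can point into
pm = Base.unsafe_convert(Ptr{Float64}, m)     # CPUPointer-style access, reliably fast

s = SVector(1.0, 2.0, 3.0)                    # immutable: needs the Ref round trip
rs = Ref(s)
ps = Base.unsafe_convert(Ptr{SVector{3,Float64}}, rs)  # CPUTuple-style access; relies on LLVM eliding the copy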

However, the @gc_preserve macro would probably just pass CPUTuples through; its purpose is to let arrays be stack allocated, which should already be the case for CPUTuple objects.

I'll make a breaking change to groupedstridedpointer to also return the buffer, figuring that no one is actually using it yet (except for a soon-to-be-mainlined LoopVectorization branch).

Tokazama (Member) left a comment

LGTM

chriselrod merged commit f05b8bd into master on Mar 5, 2021
chriselrod deleted the cputuple branch on April 5, 2021