Add support for unified arrays. #1023
Conversation
Force-pushed from 4feaea3 to 144d0b4
Codecov Report

```
@@            Coverage Diff             @@
##           master    #1023      +/-   ##
==========================================
+ Coverage   79.98%   80.04%   +0.05%
==========================================
  Files         118      118
  Lines        7640     7657      +17
==========================================
+ Hits         6111     6129      +18
+ Misses       1529     1528       -1
```
Continuing from #946, it is a fair point that calls like

```diff
diff --git a/src/array.jl b/src/array.jl
index cbdd4f2a..85f21fa8 100644
--- a/src/array.jl
+++ b/src/array.jl
@@ -3,54 +3,71 @@ export CuArray, CuVector, CuMatrix, CuVecOrMat, cu

 ## array storage

-# array storage is shared by arrays that refer to the same data, while keeping track of
-# the number of outstanding references
+# storage is shared by arrays that refer to the same data, while keeping track of the number
+# of outstanding references and other storage-specific properties (like the owning context).
+#
+# NOTE: if we're ever able to fully reuse array wrappers (like SubArray), the refcount can
+# go and we can just store the underlying buffer directly in each CuArray.

-struct ArrayStorage
-    buffer::Union{Mem.DeviceBuffer, Mem.UnifiedBuffer}
+# the refcount also encodes the state of the array:
+# < 0: unmanaged
+# = 0: freed
+# > 0: referenced
+abstract type AbstractStorage end
+
+struct DeviceStorage <: AbstractStorage
+    buffer::Mem.DeviceBuffer
     ctx::CuContext
+    refcount::Threads.Atomic{Int}
+
+    DeviceStorage(buffer::Mem.DeviceBuffer, ctx::CuContext, refcount::Int) =
+        new(buffer, ctx, Threads.Atomic{Int}(refcount))
+end

-    # the refcount also encodes the state of the array:
-    # < 0: unmanaged
-    # = 0: freed
-    # > 0: referenced
+struct UnifiedStorage <: AbstractStorage
+    buffer::Mem.UnifiedBuffer
+    ctx::CuContext
     refcount::Threads.Atomic{Int}
+
+    UnifiedStorage(buffer::Mem.UnifiedBuffer, ctx::CuContext, refcount::Int) =
+        new(buffer, ctx, Threads.Atomic{Int}(refcount))
 end

-ArrayStorage(buf, ctx, state::Int) = ArrayStorage(buf, ctx, Threads.Atomic{Int}(state))
+ArrayStorage(buf::Mem.DeviceBuffer, args...) = DeviceStorage(buf, args...)
+ArrayStorage(buf::Mem.UnifiedBuffer, args...) = UnifiedStorage(buf, args...)


 ## array type

-mutable struct CuArray{T,N} <: AbstractGPUArray{T,N}
-    storage::Union{Nothing,ArrayStorage}
+mutable struct CuArray{T,N,S} <: AbstractGPUArray{T,N}
+    storage::Union{Nothing,S}

     maxsize::Int  # maximum data size; excluding any selector bytes
     offset::Int

     dims::Dims{N}
+end

-    function CuArray{T,N}(::UndefInitializer, dims::Dims{N}; unified::Bool=false) where {T,N}
-        Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
-        maxsize = prod(dims) * sizeof(T)
-        bufsize = if Base.isbitsunion(T)
-            # type tag array past the data
-            maxsize + prod(dims)
-        else
-            maxsize
-        end
-        buf = alloc(bufsize; unified)
-        storage = ArrayStorage(buf, context(), 1)
-        obj = new{T,N}(storage, maxsize, 0, dims)
-        finalizer(unsafe_finalize!, obj)
-    end
+function CuArray{T,N}(storage::S, dims::Dims{N};
+                      maxsize::Int=prod(dims) * sizeof(T), offset::Int=0) where {T,N,S}
+    Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
+    return CuArray{T,N,S}(storage, maxsize, offset, dims)
+end

-    function CuArray{T,N}(storage::ArrayStorage, dims::Dims{N};
-                          maxsize::Int=prod(dims) * sizeof(T), offset::Int=0) where {T,N}
-        Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
-        return new{T,N}(storage, maxsize, offset, dims,)
+function CuArray{T,N}(::UndefInitializer, dims::Dims{N}; unified::Bool=false) where {T,N}
+    Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
+    maxsize = prod(dims) * sizeof(T)
+    bufsize = if Base.isbitsunion(T)
+        # type tag array past the data
+        maxsize + prod(dims)
+    else
+        maxsize
     end
+    buf = alloc(bufsize; unified)
+    storage = ArrayStorage(buf, context(), 1)
+    obj = CuArray{T,N,typeof(storage)}(storage, maxsize, 0, dims)
+    finalizer(unsafe_finalize!, obj)
 end
```
This makes a lot of sense to me. I'll continue my multi-GPU work based on these changes and see if I run into any major drawbacks. Would it make sense to add things like the UnifiedCuArray adaptor as part of this PR?
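The UnifiedCuArray adaptor mentioned above doesn't exist in this PR; one possible shape for it, mocked without the real Adapt.jl package (`UnifiedAdaptor` and the tagged return value are purely illustrative):

```julia
# Mock of the Adapt.jl-style adaptor pattern; a real implementation would
# define Adapt.adapt_storage and allocate a unified buffer instead of tagging.
struct UnifiedAdaptor end  # hypothetical adaptor type, not part of this PR

# Convert host arrays into (pretend) unified-memory arrays; here we only tag
# the data so the dispatch structure is visible.
adapt_storage(::UnifiedAdaptor, x::Array) = (:unified, x)
adapt(to, x) = adapt_storage(to, x)

tagged = adapt(UnifiedAdaptor(), Float32[1, 2, 3])
```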
Wasn't happy with the
@akashkgarg Please have a look to see if this works for your use cases. Both
@vchuravy @christophernhill If needed, this PR can be extended to differentiate between regular device buffers and async device buffers (by introducing a
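As a sketch of how such an extension could look (`AsyncDeviceStorage` and the buffer stand-in are hypothetical names, not from this PR), a new storage subtype slots into the hierarchy without changing existing dispatch:

```julia
abstract type AbstractStorage end

struct DeviceBuffer end       # stand-in for CUDA.jl's regular device buffer type
struct AsyncDeviceBuffer end  # hypothetical stream-ordered buffer type

struct DeviceStorage <: AbstractStorage
    buffer::DeviceBuffer
    refcount::Threads.Atomic{Int}
end

# Hypothetical: a separate storage type for stream-ordered allocations lets
# code dispatch on CuArray{T,N,AsyncDeviceStorage} without touching the rest.
struct AsyncDeviceStorage <: AbstractStorage
    buffer::AsyncDeviceBuffer
    refcount::Threads.Atomic{Int}
end

ArrayStorage(buf::DeviceBuffer, rc::Int) = DeviceStorage(buf, Threads.Atomic{Int}(rc))
ArrayStorage(buf::AsyncDeviceBuffer, rc::Int) = AsyncDeviceStorage(buf, Threads.Atomic{Int}(rc))

s = ArrayStorage(AsyncDeviceBuffer(), 1)
```

Existing `ArrayStorage` methods for device and unified buffers would be unaffected; only the new method and struct are added.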