Add support for unified arrays. #1023

Merged: maleadt merged 4 commits into master from tb/unified on Jul 29, 2021

Conversation

@maleadt (Member) commented Jun 30, 2021

No description provided.

@maleadt maleadt added labels: enhancement (New feature or request), cuda array (Stuff about CuArray) on Jun 30, 2021
@maleadt maleadt mentioned this pull request Jun 30, 2021
@maleadt maleadt force-pushed the tb/unified branch 3 times, most recently from 4feaea3 to 144d0b4, on June 30, 2021 14:48
codecov bot commented Jun 30, 2021

Codecov Report

Merging #1023 (748e49d) into master (a8b1f48) will increase coverage by 0.05%.
The diff coverage is 86.25%.


@@            Coverage Diff             @@
##           master    #1023      +/-   ##
==========================================
+ Coverage   79.98%   80.04%   +0.05%     
==========================================
  Files         118      118              
  Lines        7640     7657      +17     
==========================================
+ Hits         6111     6129      +18     
+ Misses       1529     1528       -1     
Impacted Files                Coverage            Δ
lib/cusparse/array.jl         65.96% <50.00%>     (ø)
src/pool.jl                   77.13% <82.22%>     (+1.07%) ⬆️
src/array.jl                  92.47% <93.75%>     (+0.47%) ⬆️
src/broadcast.jl              85.71% <100.00%>    (ø)
lib/cusparse/generic.jl       98.26% <0.00%>      (+0.03%) ⬆️
lib/cusparse/conversions.jl   63.68% <0.00%>      (+0.18%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@maleadt maleadt marked this pull request as draft July 1, 2021 07:11
@maleadt (Member, Author) commented Jul 1, 2021

Continuing from #946, it is a fair point that calls like similar or adapt that take an array type won't be able to choose whether the array has to reside in unified memory or not. That'd be a good argument to put that property in the type, but adding a typevar (or worse, a separate type) is so invasive that I really don't want to. It's simple enough to add, but it creates tons of additional ambiguities, abstract types where we used to have concrete ones, signatures that now need to be adapted to <:CuArray when the type was being used as a type parameter, etc.

diff --git a/src/array.jl b/src/array.jl
index cbdd4f2a..85f21fa8 100644
--- a/src/array.jl
+++ b/src/array.jl
@@ -3,54 +3,71 @@ export CuArray, CuVector, CuMatrix, CuVecOrMat, cu
 
 ## array storage
 
-# array storage is shared by arrays that refer to the same data, while keeping track of
-# the number of outstanding references
+# storage is shared by arrays that refer to the same data, while keeping track of the number
+# of outstanding references and other storage-specific properties (like the owning context).
+#
+# NOTE: if we're ever able to fully reuse array wrappers (like SubArray), the refcount can
+# go and we can just store the underlying buffer directly in each CuArray.
 
-struct ArrayStorage
-  buffer::Union{Mem.DeviceBuffer, Mem.UnifiedBuffer}
+# the refcount also encodes the state of the array:
+# < 0: unmanaged
+# = 0: freed
+# > 0: referenced
 
+abstract type AbstractStorage end
+
+struct DeviceStorage <: AbstractStorage
+  buffer::Mem.DeviceBuffer
   ctx::CuContext
+  refcount::Threads.Atomic{Int}
+
+  DeviceStorage(buffer::Mem.DeviceBuffer, ctx::CuContext, refcount::Int) =
+    new(buffer, ctx, Threads.Atomic{Int}(refcount))
+end
 
-  # the refcount also encodes the state of the array:
-  # < 0: unmanaged
-  # = 0: freed
-  # > 0: referenced
+struct UnifiedStorage <: AbstractStorage
+  buffer::Mem.UnifiedBuffer
+  ctx::CuContext
   refcount::Threads.Atomic{Int}
+
+  UnifiedStorage(buffer::Mem.UnifiedBuffer, ctx::CuContext, refcount::Int) =
+    new(buffer, ctx, Threads.Atomic{Int}(refcount))
 end
 
-ArrayStorage(buf, ctx, state::Int) = ArrayStorage(buf, ctx, Threads.Atomic{Int}(state))
+ArrayStorage(buf::Mem.DeviceBuffer, args...) = DeviceStorage(buf, args...)
+ArrayStorage(buf::Mem.UnifiedBuffer, args...) = UnifiedStorage(buf, args...)
 
 
 ## array type
 
-mutable struct CuArray{T,N} <: AbstractGPUArray{T,N}
-  storage::Union{Nothing,ArrayStorage}
+mutable struct CuArray{T,N,S} <: AbstractGPUArray{T,N}
+  storage::Union{Nothing,S}
 
   maxsize::Int  # maximum data size; excluding any selector bytes
   offset::Int
 
   dims::Dims{N}
+end
 
-  function CuArray{T,N}(::UndefInitializer, dims::Dims{N}; unified::Bool=false) where {T,N}
-    Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
-    maxsize = prod(dims) * sizeof(T)
-    bufsize = if Base.isbitsunion(T)
-      # type tag array past the data
-      maxsize + prod(dims)
-    else
-      maxsize
-    end
-    buf = alloc(bufsize; unified)
-    storage = ArrayStorage(buf, context(), 1)
-    obj = new{T,N}(storage, maxsize, 0, dims)
-    finalizer(unsafe_finalize!, obj)
-  end
+function CuArray{T,N}(storage::S, dims::Dims{N};
+                      maxsize::Int=prod(dims) * sizeof(T), offset::Int=0) where {T,N,S}
+  Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
+  return CuArray{T,N,S}(storage, maxsize, offset, dims)
+end
 
-  function CuArray{T,N}(storage::ArrayStorage, dims::Dims{N};
-                        maxsize::Int=prod(dims) * sizeof(T), offset::Int=0) where {T,N}
-    Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
-    return new{T,N}(storage, maxsize, offset, dims,)
+function CuArray{T,N}(::UndefInitializer, dims::Dims{N}; unified::Bool=false) where {T,N}
+  Base.allocatedinline(T) || error("CuArray only supports element types that are stored inline")
+  maxsize = prod(dims) * sizeof(T)
+  bufsize = if Base.isbitsunion(T)
+    # type tag array past the data
+    maxsize + prod(dims)
+  else
+    maxsize
   end
+  buf = alloc(bufsize; unified)
+  storage = ArrayStorage(buf, context(), 1)
+  obj = CuArray{T,N,typeof(storage)}(storage, maxsize, 0, dims)
+  finalizer(unsafe_finalize!, obj)
 end
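
For illustration, a rough sketch of how the storage parameter could then be dispatched on. UnifiedCuArray and UnifiedAdaptor are made-up names here, and the constructors used are assumptions on top of the DeviceStorage/UnifiedStorage types from the diff above, not part of this PR:

using Adapt

# alias for arrays backed by unified memory
const UnifiedCuArray{T,N} = CuArray{T,N,UnifiedStorage}

# illustrative adaptor: adapt(UnifiedAdaptor(), x) would copy x into unified memory,
# assuming a constructor that builds a CuArray{T,N,UnifiedStorage} from host data
struct UnifiedAdaptor end
Adapt.adapt_storage(::UnifiedAdaptor, xs::AbstractArray{T,N}) where {T,N} =
    CuArray{T,N,UnifiedStorage}(xs)

# similar could preserve the storage kind by forwarding the type parameter,
# assuming an undef constructor that accepts the storage type as a parameter
Base.similar(a::CuArray{T,N,S}, ::Type{T2}, dims::Dims{D}) where {T,N,S,T2,D} =
    CuArray{T2,D,S}(undef, dims)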

@akashkgarg commented

This makes a lot of sense to me. I'll continue my multi-GPU work based on these changes and see if I run into any major drawbacks. Would it make sense to add things like the UnifiedCuArray adaptor as part of this PR?

@maleadt (Member, Author) commented Jul 23, 2021

Wasn't happy with the unified keyword, so gave it a go encoding it as a type parameter instead. It doesn't seem to break too many things (but that's because we recently added another type parameter, so code had already been adapted).

@maleadt maleadt marked this pull request as ready for review July 23, 2021 11:45
@maleadt (Member, Author) commented Jul 23, 2021

@akashkgarg Please have a look at whether this works for your use cases. Both cu(; unified=true) and similar etc. should now work as expected.
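
A quick usage sketch of what that means (assuming the API in this PR; the array contents are arbitrary):

using CUDA

# allocate in unified memory via the keyword mentioned above
A = cu(rand(Float32, 4, 4); unified=true)

# derived arrays keep the same storage kind, so B should also be unified
B = similar(A)
B .= A .+ 1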

@maleadt (Member, Author) commented Jul 23, 2021

@vchuravy @christophernhill If needed, this PR can be extended to differentiate between regular device buffers, and async device buffers (by introducing a Mem.AsyncDeviceBuffer, which makes sense since they need to be managed with different APIs anyway). That would make it possible to, e.g., have both kinds of buffers in a single application without having to switch globally.
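
A purely hypothetical sketch of what that split could look like (Mem.AsyncDeviceBuffer and AsyncDeviceStorage do not exist; the names and fields simply mirror the storage types sketched earlier in this thread):

# hypothetical: a separate buffer type for stream-ordered allocations, so both
# kinds of device memory can coexist in one application
struct AsyncDeviceStorage <: AbstractStorage
  buffer::Mem.AsyncDeviceBuffer   # would need to be introduced in CUDA.Mem
  ctx::CuContext
  refcount::Threads.Atomic{Int}

  AsyncDeviceStorage(buffer::Mem.AsyncDeviceBuffer, ctx::CuContext, refcount::Int) =
    new(buffer, ctx, Threads.Atomic{Int}(refcount))
end

# dispatch then picks the matching allocation/free APIs per storage kind
ArrayStorage(buf::Mem.AsyncDeviceBuffer, args...) = AsyncDeviceStorage(buf, args...)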

@maleadt maleadt merged commit db639be into master Jul 29, 2021
@maleadt maleadt deleted the tb/unified branch July 29, 2021 06:09