Support for JLD2 #1833

denizyuret · 2023-03-25T07:49:42Z

Here is what I do to be able to save/load CuArrays with JLD2 files:

using CUDA
import JLD2, FileIO
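# Host-side wrapper that JLD2 can store as an ordinary struct.
struct JLD2CuArray{T,N}; array::Array{T,N}; end
# Write every CuArray as a JLD2CuArray: copy to the host on save, back to the GPU on load.
JLD2.writeas(::Type{CuArray{T,N,D}}) where {T,N,D} = JLD2CuArray{T,N}
JLD2.wconvert(::Type{JLD2CuArray{T,N}}, x::CuArray{T,N,D}) where {T,N,D} = JLD2CuArray(Array(x))
JLD2.rconvert(::Type{CuArray{T,N,D}}, x::JLD2CuArray{T,N}) where {T,N,D} = CuArray(x.array)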

This used to work with CuArray{T,N} but no longer works with CuArray{T,N,D}. Here is the error I get:

julia> a = CUDA.rand(3,5)
julia> FileIO.save("foo.jld2", "a", a)
julia> d = FileIO.load("foo.jld2")
Dict{String, Any} with 1 entry:Error showing value of type Dict{String, Any}:                                                                
ERROR: CUDA error: invalid argument (code 1, ERROR_INVALID_VALUE)                                                                            
Stacktrace:                                                                                                                                  
  [1] throw_api_error(res::CUDA.cudaError_enum)                                                                                              
    @ CUDA /userfiles/dyuret/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:89                                                              
  [2] macro expansion                                                                                                                        
    @ /userfiles/dyuret/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:97 [inlined]                                                         
  [3] cuMemcpyDtoHAsync_v2                                                                                                                   
    @ /userfiles/dyuret/.julia/packages/CUDA/BbliS/lib/utils/call.jl:26 [inlined]                                                            
  [4] #unsafe_copyto!#8                                                                                                                      
    @ /userfiles/dyuret/.julia/packages/CUDA/BbliS/lib/cudadrv/memory.jl:397 [inlined]                                                       
  [5] (::CUDA.var"#189#190"{Float32, Matrix{Float32}, Int64, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Int64, Int64})()                    
    @ CUDA /userfiles/dyuret/.julia/packages/CUDA/BbliS/src/array.jl:413                                                                     
  [6] #context!#63                                                                                                                           
    @ /userfiles/dyuret/.julia/packages/CUDA/BbliS/lib/cudadrv/state.jl:164 [inlined]                                                        
  [7] context!                                                                                                                               
    @ /userfiles/dyuret/.julia/packages/CUDA/BbliS/lib/cudadrv/state.jl:159 [inlined]                                                        
  [8] unsafe_copyto!(dest::Matrix{Float32}, doffs::Int64, src::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, soffs::Int64, n::Int64)           
    @ CUDA /userfiles/dyuret/.julia/packages/CUDA/BbliS/src/array.jl:406                                                                     
  [9] copyto!                                                                                                                                
    @ /userfiles/dyuret/.julia/packages/CUDA/BbliS/src/array.jl:360 [inlined]                                                                
 [10] copyto!                                                                                                                                
    @ /userfiles/dyuret/.julia/packages/CUDA/BbliS/src/array.jl:364 [inlined]                                                                
 [11] copyto_axcheck!(dest::Matrix{Float32}, src::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})                                                
    @ Base ./abstractarray.jl:1127                                                                                                           
 [12] Array                                                                                                                                  
    @ ./array.jl:626 [inlined]                                                                                                               
 [13] Array                                                                                                                                  
    @ ./boot.jl:483 [inlined]                                                                                                                
 [14] convert                                                                                                                                
    @ ./array.jl:617 [inlined]                                                                                                               
 [15] adapt_storage                                                                                                                          
    @ /userfiles/dyuret/.julia/packages/GPUArrays/XR4WO/src/host/abstractarray.jl:23 [inlined]                                               
 [16] adapt_structure                                                                                                                        
    @ /userfiles/dyuret/.julia/packages/Adapt/xviDc/src/Adapt.jl:57 [inlined]                                                                
 [17] adapt                                                                                                                                  
    @ /userfiles/dyuret/.julia/packages/Adapt/xviDc/src/Adapt.jl:40 [inlined]                                                                
 [18] _show_nonempty                                                                                                                         
    @ /userfiles/dyuret/.julia/packages/GPUArrays/XR4WO/src/host/abstractarray.jl:30 [inlined]                                               
 [19] show(io::IOContext{IOBuffer}, X::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})                                                           
    @ Base ./arrayshow.jl:489                                                                                                                
 [20] sprint(f::Function, args::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}; context::IOContext{Base.TTY}, sizehint::Int64)                   
    @ Base ./strings/io.jl:112                                                                                                               
 [21] show(io::IOContext{Base.TTY}, #unused#::MIME{Symbol("text/plain")}, t::Dict{String, Any})                                              
    @ Base ./show.jl:112

When I compare the original array with the loaded version, they look the same except for the buffer pointer and the refcount:

julia> dump(a)                                                                                                                               
CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}                                                                                                   
  storage: CUDA.ArrayStorage{CUDA.Mem.DeviceBuffer}                                                                                          
    buffer: CUDA.Mem.DeviceBuffer                                                                                                            
      ctx: CuContext                                                                                                                         
        handle: Ptr{Nothing} @0x0000000002bbbe80                                                                                             
        valid: Bool true                                                                                                                     
      ptr: CuPtr{Nothing} CuPtr{Nothing}(0x0000000200e00000)                                                                                 
      bytesize: Int64 60                                                                                                                     
      async: Bool true                                                                                                                       
    refcount: Base.Threads.Atomic{Int64}                                                                                                     
      value: Int64 1                                                                                                                         
  maxsize: Int64 60                                                                                                                          
  offset: Int64 0                                                                                                                            
  dims: Tuple{Int64, Int64}                                                                                                                  
    1: Int64 3                                                                                                                               
    2: Int64 5                                                                                                                               
julia> dump(d["a"])                                                                                                                          
CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}                                                                                                   
  storage: CUDA.ArrayStorage{CUDA.Mem.DeviceBuffer}                                                                                          
    buffer: CUDA.Mem.DeviceBuffer                                                                                                            
      ctx: CuContext                                                                                                                         
        handle: Ptr{Nothing} @0x0000000002bbbe80                                                                                             
        valid: Bool true                                                                                                                     
      ptr: CuPtr{Nothing} CuPtr{Nothing}(0x0000000200e00200)                                                                                 
      bytesize: Int64 60                                                                                                                     
      async: Bool true                                                                                                                       
    refcount: Base.Threads.Atomic{Int64}                                                                                                     
      value: Int64 0                                                                                                                         
  maxsize: Int64 60                                                                                                                          
  offset: Int64 0                                                                                                                            
  dims: Tuple{Int64, Int64}                                                                                                                  
    1: Int64 3                                                                                                                               
    2: Int64 5

Finally, if I assign the value read to a global variable in rconvert it works without any errors:

julia> JLD2.rconvert(::Type{CuArray{T,N,D}}, x::JLD2CuArray{T,N}) where {T,N,D} = (y=CuArray(x.array); global dbg=y; y)
julia> d = FileIO.load("foo.jld2")
julia> d["a"] # works with no problems

maleadt · 2023-03-25T10:32:22Z

JLD2 has never really been supported. I guess the fact it worked was just sheer luck? In any case, I'm not familiar with JLD2, so I'll defer to anybody who is to take a look 🙂

JonasIsensee · 2023-03-30T12:37:18Z

Hi @denizyuret,

From the perspective of JLD2, your code looks absolutely OK.
Which versions are you on? I can't reproduce the problem.

denizyuret · 2023-03-30T14:14:04Z

[052768ef] CUDA v3.13.1 # (haven't upgraded to 4.x yet, but if it solves the JLD2 issue I will)
[5789e2e9] FileIO v1.16.0
[033835bb] JLD2 v0.4.31
julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 470.57.2, for CUDA 11.4
CUDA driver 11.7

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+470.57.2
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

denizyuret · 2023-03-30T14:28:27Z

Alas, my hope was short-lived :( I get the same error with CUDA v4.1.2.

JonasIsensee · 2023-03-30T15:47:29Z

I still can't reproduce your error. (I tried Julia 1.8.5 and 1.9.0-rc1 with CUDA 3.13.1 and JLD2 v0.4.31.)

denizyuret · 2023-03-30T16:26:06Z

Can you send your CUDA.versioninfo() output so I can see what the difference may be? (Library/driver version, GPU type, etc. could be a factor.)

JonasIsensee · 2023-03-30T16:28:29Z

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 515.86.1, for CUDA 11.7
CUDA driver 11.7

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+515.86.1
  Downloaded artifact: CUDNN
- CUDNN: 8.30.2 (for CUDA 11.5.0)
  Downloaded artifact: CUTENSOR
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

noyongkyoon · 2023-04-04T09:52:00Z

I tried JLD2.writeas(), JLD2.wconvert(), and JLD2.rconvert() as you suggested. Now I get the following error message:

AssertionError: refcount != 0

Stacktrace:
 [1] _derived_array
   @ ~/.julia/packages/CUDA/BbliS/src/array.jl:729 [inlined]
 [2] reshape(a::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, dims::Tuple{Int64})
   @ CUDA ~/.julia/packages/CUDA/BbliS/src/array.jl:723
 [3] reshape
   @ ./reshapedarray.jl:117 [inlined]
 [4] vec(a::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer})
   @ Base ./abstractarraymath.jl:41
 [5] (::RNN)(x::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}; batchSizes::Nothing)
   @ Knet.Ops20 ~/.julia/packages/Knet/YIFWC/src/ops20/rnn.jl:332
 [6] (::RNN)(x::CuArray{Float32, 3, CUDA.Mem.DeviceBuffer})
   @ Knet.Ops20 ~/.julia/packages/Knet/YIFWC/src/ops20/rnn.jl:329
 [7] (::Chain)(x::Matrix{UInt16})
   @ Main ./In[5]:6
 [8] tag(tagger::Chain, s::String)
   @ Main ./In[29]:6
 [9] top-level scope
   @ In[30]:1

What is "refcount"? What purpose does it serve? How can one alter its value, if altering it is necessary?
You do say above: "they seem similar except for the refcount." Can you elaborate on it?

JonasIsensee · 2023-04-04T10:05:00Z

> Finally, if I assign the value read to a global variable in rconvert it works without any errors:
> julia> JLD2.rconvert(::Type{CuArray{T,N,D}}, x::JLD2CuArray{T,N}) where {T,N,D} = (y=CuArray(x.array); global dbg=y; y)
> julia> d = FileIO.load("foo.jld2")
> julia> d["a"] # works with no problems

This (together with the refcount) makes me think the problem lies in memory management when the CuArray is created. JLD2 allocates the underlying array, passes it to the CuArray(data) constructor, and then stops keeping track of it (leading to refcount = 0).
That would explain why the global-variable workaround fixes it.
@denizyuret Could you try a few functions of this type?

function f()
    data = rand(10,10)  # host array, local to f
    CuArray(data)       # copy to the GPU and return the CuArray
end

denizyuret · 2023-04-05T07:56:05Z

> @denizyuret Could you try a few functions of this type?

The f() function you suggested works without problems. The refcount of the resulting array is 1.

> JLD2 allocates the underlying array and passes it to the CuArray(data) constructor and then ceases to keep track of it. (leading to refcount = 0).

CuArray copies the contents of data (stored in RAM) to GPU memory, and once the GPU array is constructed I don't think it cares what happens to the RAM array. But I am not sure what refcount is for or how it is set, so I may be talking nonsense. For example, if I manually set refcount to 0, nothing breaks.

@maleadt Any idea how refcount=0 may arise and whether it could be the source of our problems?

maleadt · 2023-04-05T10:32:34Z

> But I am not sure what refcount is for and how it is set, so I may be talking nonsense.

The refcount field is to keep track of the underlying buffer, so that multiple CuArrays can share the same memory (e.g., when you take a view, or reinterpret an array, or reshape it).

refcount=0 may happen when you're serializing a freed array.
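
To make the idea concrete, here is a toy, CPU-only model of a reference-counted buffer. It is a sketch for illustration and not CUDA.jl's implementation: ToyStorage, ToyArray, derive and release! are hypothetical names, and the only detail borrowed from this thread is the `refcount != 0` check that appears in the `_derived_array` stack trace above.

```julia
# Toy CPU-only model of a reference-counted buffer. This is NOT CUDA.jl's
# implementation; ToyStorage, ToyArray, derive and release! are hypothetical.
mutable struct ToyStorage
    data::Vector{Float32}
    refcount::Threads.Atomic{Int}
end
ToyStorage(n::Integer) = ToyStorage(zeros(Float32, n), Threads.Atomic{Int}(1))

struct ToyArray
    storage::ToyStorage
    dims::Dims
end

# A derived array (think reshape or view) shares the storage and takes a reference.
function derive(a::ToyArray, dims::Dims)
    @assert a.storage.refcount[] != 0 "storage was already freed"
    Threads.atomic_add!(a.storage.refcount, 1)
    return ToyArray(a.storage, dims)
end

# Dropping an array gives its reference back; the last one frees the data.
function release!(a::ToyArray)
    old = Threads.atomic_sub!(a.storage.refcount, 1)  # returns the previous value
    old == 1 && empty!(a.storage.data)
    return a.storage.refcount[]
end

a = ToyArray(ToyStorage(15), (3, 5))
b = derive(a, (15,))   # refcount is now 2
release!(b)
release!(a)            # refcount is now 0 and the data is gone
```

In this model, calling derive(a, (15,)) after the last release! trips the assertion, which is the same kind of failure as the "AssertionError: refcount != 0" reported earlier in the thread.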

JonasIsensee · 2023-04-05T11:08:27Z

> The refcount field is to keep track of the underlying buffer, so that multiple CuArrays can share the same memory (e.g., when you take a view, or reinterpret an array, or reshape it).
>
> refcount=0 may happen when you're serializing a freed array.

Thank you for this info.
It is a bit odd, though. The problem here most certainly occurs during deserialization (otherwise the workaround above couldn't work).

maleadt · 2023-04-05T11:21:06Z

Hmm, I misunderstood how JLD2 serializes objects. If we're really just calling Array(...) and CuArray(...) (i.e., not serializing CuArray objects directly), I fail to see how we would ever run into refcount=0. FWIW, I also can't reproduce this issue.

JonasIsensee · 2023-04-05T11:36:04Z

Yeah, that's the curious bit. Let me summarize it quickly:

- Default: JLD2 attempts to serialize structs by going through their fields. This fails for CuArrays, since they don't actually contain the data.
- Custom serialization: This is what @denizyuret attempted here:

using CUDA
import JLD2, FileIO
struct JLD2CuArray{T,N}; array::Array{T,N}; end                                                                                              
JLD2.writeas(::Type{CuArray{T,N,D}}) where {T,N,D} = JLD2CuArray{T,N}                                                                        
JLD2.wconvert(::Type{JLD2CuArray{T,N}}, x::CuArray{T,N,D}) where {T,N,D} = JLD2CuArray(Array(x))                                             
JLD2.rconvert(::Type{CuArray{T,N,D}}, x::JLD2CuArray{T,N}) where {T,N,D} = CuArray(x.array)

We define a struct JLD2CuArray that contains data JLD2 can safely store, along with conversion methods for both directions (wconvert and rconvert; Base.convert also works, but that is risky with invalidations).

When you give JLD2 any object, it always asks JLD2.writeas what type to store it as (the default is writeas(T::Type) = T) and then calls the conversion methods as necessary.

Therefore, with this code, we store the data in Array form and the full CuArray{T,N,D} type signature (not shown), so that the correct rconvert method is called upon loading.
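
The same round trip can be tried without a GPU. The sketch below assumes a JLD2 version that provides wconvert and rconvert (as v0.4.31 in this thread does) and uses two hypothetical types, Wrapped and WrappedOnDisk, in place of CuArray and JLD2CuArray. The note field only exists to show that rconvert really ran on load.

```julia
using JLD2

# Hypothetical in-memory type standing in for CuArray.
struct Wrapped{T}
    data::Vector{T}
    note::String
end

# Hypothetical on-disk representation, standing in for JLD2CuArray.
struct WrappedOnDisk{T}
    data::Vector{T}
end

# Tell JLD2 what to store and how to convert in each direction.
JLD2.writeas(::Type{Wrapped{T}}) where {T} = WrappedOnDisk{T}
JLD2.wconvert(::Type{WrappedOnDisk{T}}, x::Wrapped{T}) where {T} = WrappedOnDisk{T}(x.data)
JLD2.rconvert(::Type{Wrapped{T}}, x::WrappedOnDisk{T}) where {T} = Wrapped{T}(x.data, "restored")

path = joinpath(mktempdir(), "demo.jld2")

# Saving goes through writeas and wconvert.
jldopen(path, "w") do f
    f["w"] = Wrapped([1.0, 2.0, 3.0], "original")
end

# Loading goes through rconvert, selected by the stored Wrapped{Float64} signature.
w = jldopen(path, "r") do f
    f["w"]
end
```

If the mechanism works as described, w is a Wrapped{Float64} again, and its note field reflects the value set in rconvert instead of the one that was saved.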

maleadt · 2023-04-05T11:49:58Z

The fact that the deserialized object contains a different buffer pointer indicates that the rconvert function has run. This seems to point to a GC-related issue, but if JLD2 is just storing the deserialized object in a regular dictionary the finalizer shouldn't ever run.

@denizyuret since only you seem to be able to reproduce this, I'd add some logging to the CuArray finalizer that decrements the refcount, to see when and from where it gets run (e.g. by adding sprint(Base.show_backtrace, backtrace()) or so to your log messages).
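
As a rough illustration of that kind of logging on a plain Julia object (not on CuArray itself), the sketch below attaches a finalizer that records a backtrace. Tracked, tracked and FINALIZER_LOG are hypothetical names; the messages are collected in a vector because finalizers must not switch tasks, which ordinary printing can do.

```julia
# Hypothetical stand-in for an object whose finalizer we want to trace.
mutable struct Tracked
    id::Int
end

# Finalizers must not switch tasks, so collect messages instead of printing.
const FINALIZER_LOG = String[]

function tracked(id::Int)
    obj = Tracked(id)
    finalizer(obj) do o
        # Record where the finalizer was triggered from.
        trace = sprint(Base.show_backtrace, backtrace())
        push!(FINALIZER_LOG, "finalizing Tracked($(o.id))" * trace)
    end
    return obj
end

t = tracked(7)
finalize(t)                      # run the finalizer now instead of waiting for the GC
foreach(println, FINALIZER_LOG)  # inspect the collected messages afterwards
```

Here finalize is called explicitly so that the example is deterministic; in the situation discussed in this thread, the interesting case would be an entry showing up in the log without any such explicit call.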

denizyuret added the "bug" label Mar 25, 2023

maleadt changed the title ~~Saving and loading CuArray to JLD2 no longer works~~ Support for JLD2 Mar 25, 2023

maleadt added the "enhancement" and "help wanted" labels and removed the "bug" label Mar 25, 2023

denizyuret mentioned this issue Mar 25, 2023

CuArray support JuliaIO/JLD2.jl#464

Closed

JonasIsensee mentioned this issue Mar 30, 2023

CUDA.versioninfo() triggers download of lazy artifacts #1844

Closed

denizyuret mentioned this issue Apr 4, 2023

load/save problem with CuArrays in tutorial/60.rnn.ipynb denizyuret/Knet.jl#668

Open

maleadt added the "bug" label and removed the "enhancement" label Apr 5, 2023

maleadt closed this as completed Nov 6, 2023
