
Conv is not working for Complex when using CUDA #1655

Open
foldfelis opened this issue Jul 8, 2021 · 5 comments

@foldfelis (Contributor) commented Jul 8, 2021:

using Flux
using CUDA

CUDA.allowscalar(false)

# T = Float32
T = ComplexF32

m = Chain(
    Conv((3, ), 1=>2, pad=1),
) |> gpu

# 10 points 1 channel with batchsize=2
x = reshape(rand(T, 10, 2), (10, 1, 2)) |> gpu

m(x)

The code above works if T is changed to Float32. On the CPU, both ComplexF32 and Float32 work.

The model also works if scalar indexing is allowed.
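For prototyping only, one temporary workaround (a sketch, slow by design) is to permit scalar indexing for just this call with CUDA.@allowscalar, which lets the CPU-style fallback loops run against the GPU array:

```julia
using Flux, CUDA

# Workaround sketch: allow scalar indexing for this one call only.
# The fallback iterates element-by-element from the host, so this is
# very slow and only suitable for prototyping, not for training.
y = CUDA.@allowscalar m(x)
```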

The error message when T = ComplexF32 on CUDA:


ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:53
  [3] getindex(::CuArray{ComplexF32, 5}, ::Int64, ::Int64, ::Int64, ::Int64, ::Vararg{Int64, N} where N)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:86
  [4] conv_direct!(y::CuArray{ComplexF32, 5}, x::CuArray{ComplexF32, 5}, w::CuArray{Float32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}; alpha::ComplexF32, beta::Bool)
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_direct.jl:91
  [5] conv_direct!
    @ ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_direct.jl:51 [inlined]
  [6] conv!(y::CuArray{ComplexF32, 5}, in1::CuArray{ComplexF32, 5}, in2::CuArray{Float32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:208
  [7] conv!(y::CuArray{ComplexF32, 5}, in1::CuArray{ComplexF32, 5}, in2::CuArray{Float32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:206
  [8] conv!(y::CuArray{ComplexF32, 3}, x::CuArray{ComplexF32, 3}, w::CuArray{Float32, 3}, cdims::DenseConvDims{1, (3,), 1, 2, (1,), (1, 1), (1,), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:148
  [9] conv!
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:148 [inlined]
 [10] #conv#87
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:91 [inlined]
 [11] conv(x::CuArray{ComplexF32, 3}, w::CuArray{Float32, 3}, cdims::DenseConvDims{1, (3,), 1, 2, (1,), (1, 1), (1,), false})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:89
 [12] (::Conv{1, 2, typeof(identity), CuArray{Float32, 3}, CuArray{Float32, 1}})(x::CuArray{ComplexF32, 3})
    @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/conv.jl:157
 [13] applychain
    @ ~/.julia/packages/Flux/0c9kI/src/layers/basic.jl:36 [inlined]
 [14] (::Chain{Tuple{Conv{1, 2, typeof(identity), CuArray{Float32, 3}, CuArray{Float32, 1}}}})(x::CuArray{ComplexF32, 3})
    @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/basic.jl:38
 [15] top-level scope
    @ ~/Documents/GitHub/SqState.jl/src/dummy_model.jl:16
 [16] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [17] top-level scope
    @ REPL[1]:1
 [18] top-level scope
    @ ~/.julia/packages/CUDA/fRSUT/src/initialization.jl:52
in expression starting at /home/admin/Documents/GitHub/SqState.jl/src/dummy_model.jl:16
@DhairyaLGandhi (Member) commented:

Seems like the weights are still real. What happens if we convert those to complex? CUDA should be able to work with that.

@foldfelis (Contributor, Author) commented:

Hi @DhairyaLGandhi ,

I have tried

c_glorot_uniform(dims...) = Flux.glorot_uniform(dims...) + Flux.glorot_uniform(dims...) * im

m = Chain(
    Conv((3, ), 1=>2, pad=1, init=c_glorot_uniform),
) |> gpu

and I got essentially the same error: the scalar-indexing failure now surfaces from conv_im2col! inside a threaded task:

ERROR: LoadError: TaskFailedException
Stacktrace:
  [1] wait
    @ ./task.jl:322 [inlined]
  [2] threading_run(func::Function)
    @ Base.Threads ./threadingconstructs.jl:34
  [3] macro expansion
    @ ./threadingconstructs.jl:93 [inlined]
  [4] conv_im2col!(y::CuArray{ComplexF32, 5}, x::CuArray{ComplexF32, 5}, w::CuArray{ComplexF32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}; col::CuArray{ComplexF32, 3}, alpha::ComplexF32, beta::ComplexF32)
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_im2col.jl:49
  [5] conv_im2col!
    @ ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_im2col.jl:30 [inlined]
  [6] #conv!#149
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:191 [inlined]
  [7] conv!(out::CuArray{ComplexF32, 5}, in1::CuArray{ComplexF32, 5}, in2::CuArray{ComplexF32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:191
  [8] conv!(y::CuArray{ComplexF32, 3}, x::CuArray{ComplexF32, 3}, w::CuArray{ComplexF32, 3}, cdims::DenseConvDims{1, (3,), 1, 2, (1,), (1, 1), (1,), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:148
  [9] conv!
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:148 [inlined]
 [10] #conv#87
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:91 [inlined]
 [11] conv(x::CuArray{ComplexF32, 3}, w::CuArray{ComplexF32, 3}, cdims::DenseConvDims{1, (3,), 1, 2, (1,), (1, 1), (1,), false})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:89
 [12] (::Conv{1, 2, typeof(identity), CuArray{ComplexF32, 3}, CuArray{ComplexF32, 1}})(x::CuArray{ComplexF32, 3})
    @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/conv.jl:157
 [13] applychain
    @ ~/.julia/packages/Flux/0c9kI/src/layers/basic.jl:36 [inlined]
 [14] (::Chain{Tuple{Conv{1, 2, typeof(identity), CuArray{ComplexF32, 3}, CuArray{ComplexF32, 1}}}})(x::CuArray{ComplexF32, 3})
    @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/basic.jl:38
 [15] top-level scope
    @ ~/Documents/GitHub/SqState.jl/script/test_cuda.jl:18
 [16] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [17] top-level scope
    @ REPL[1]:1

    nested task error: Scalar indexing is disallowed.
    Invocation of getindex resulted in scalar indexing of a GPU array.
    This is typically caused by calling an iterating implementation of a method.
    Such implementations *do not* execute on the GPU, but very slowly on the CPU,
    and therefore are only permitted from the REPL for prototyping purposes.
    If you did intend to index this array, annotate the caller with @allowscalar.
    Stacktrace:
     [1] error(s::String)
       @ Base ./error.jl:33
     [2] assertscalar(op::String)
       @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:53
     [3] getindex(::CuArray{ComplexF32, 4}, ::Int64, ::Int64, ::Int64, ::Int64)
       @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:86
     [4] im2col!(col::CuArray{ComplexF32, 2}, x::CuArray{ComplexF32, 4}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false})
       @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_im2col.jl:230
     [5] macro expansion
       @ ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_im2col.jl:53 [inlined]
     [6] (::NNlib.var"#727#threadsfor_fun#366"{CuArray{ComplexF32, 3}, ComplexF32, ComplexF32, CuArray{ComplexF32, 5}, CuArray{ComplexF32, 5}, CuArray{ComplexF32, 5}, DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}, Int64, Int64, Int64, UnitRange{Int64}})(onethread::Bool)
       @ NNlib ./threadingconstructs.jl:81
     [7] (::NNlib.var"#727#threadsfor_fun#366"{CuArray{ComplexF32, 3}, ComplexF32, ComplexF32, CuArray{ComplexF32, 5}, CuArray{ComplexF32, 5}, CuArray{ComplexF32, 5}, DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}, Int64, Int64, Int64, UnitRange{Int64}})()
       @ NNlib ./threadingconstructs.jl:48
in expression starting at /home/admin/Documents/GitHub/SqState.jl/script/test_cuda.jl:18

@ToucheSir (Member) commented:

https://github.com/FluxML/NNlib.jl/blob/v0.7.33/src/impl/conv_im2col.jl#L230 is the culprit, so unless we get a CUDA-compatible (conv_)im2col in NNlib this will not work.

@DhairyaLGandhi (Member) commented:

Seems like it would if we can use a sufficiently general rule. Does conv_direct solve this?

@ToucheSir (Member) commented:

conv_direct is even worse because it makes pervasive use of scalar indexing.
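A possible user-side workaround in the meantime (a sketch, not NNlib's API, assuming the real-typed conv path works on the backend in use): convolution is bilinear, so a complex conv can be expanded into four real convs, each of which can take the CUDA-accelerated path. The helper name complex_conv is hypothetical:

```julia
using NNlib

# Sketch: complex convolution via four real convolutions.
# For x = xr + im*xi and w = wr + im*wi:
#   conv(x, w) = (conv(xr, wr) - conv(xi, wi)) + im*(conv(xr, wi) + conv(xi, wr))
function complex_conv(x::AbstractArray{<:Complex}, w::AbstractArray{<:Complex}; kw...)
    xr, xi = real.(x), imag.(x)
    wr, wi = real.(w), imag.(w)
    return (conv(xr, wr; kw...) .- conv(xi, wi; kw...)) .+
           im .* (conv(xr, wi; kw...) .+ conv(xi, wr; kw...))
end
```

This trades one complex conv for four real ones, but avoids the scalar-indexing fallback entirely.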
