
Conv is not working for Complex when using CUDA #1655

Open
foldfelis opened this issue Jul 8, 2021 · 5 comments

@foldfelis (Contributor) commented Jul 8, 2021:

using Flux
using CUDA

CUDA.allowscalar(false)

# T = Float32
T = ComplexF32

m = Chain(
    Conv((3, ), 1=>2, pad=1),
) |> gpu

# 10 points 1 channel with batchsize=2
x = reshape(rand(T, 10, 2), (10, 1, 2)) |> gpu

m(x)

The code above works if T is changed to Float32. On the CPU, both ComplexF32 and Float32 work.

The model also works if scalar indexing is allowed.
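For prototyping only, one temporary workaround (a sketch, slow by design) is to permit scalar indexing for just this call with CUDA.@allowscalar, which lets the CPU-style fallback loops run against the GPU array:

```julia
using Flux, CUDA

# Workaround sketch: allow scalar indexing for this one call only.
# The fallback iterates element-by-element from the host, so this is
# very slow and only suitable for prototyping, not for training.
y = CUDA.@allowscalar m(x)
```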

The error message when T = ComplexF32 on CUDA:


ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:53
  [3] getindex(::CuArray{ComplexF32, 5}, ::Int64, ::Int64, ::Int64, ::Int64, ::Vararg{Int64, N} where N)
    @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:86
  [4] conv_direct!(y::CuArray{ComplexF32, 5}, x::CuArray{ComplexF32, 5}, w::CuArray{Float32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}; alpha::ComplexF32, beta::Bool)
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_direct.jl:91
  [5] conv_direct!
    @ ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_direct.jl:51 [inlined]
  [6] conv!(y::CuArray{ComplexF32, 5}, in1::CuArray{ComplexF32, 5}, in2::CuArray{Float32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:208
  [7] conv!(y::CuArray{ComplexF32, 5}, in1::CuArray{ComplexF32, 5}, in2::CuArray{Float32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:206
  [8] conv!(y::CuArray{ComplexF32, 3}, x::CuArray{ComplexF32, 3}, w::CuArray{Float32, 3}, cdims::DenseConvDims{1, (3,), 1, 2, (1,), (1, 1), (1,), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:148
  [9] conv!
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:148 [inlined]
 [10] #conv#87
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:91 [inlined]
 [11] conv(x::CuArray{ComplexF32, 3}, w::CuArray{Float32, 3}, cdims::DenseConvDims{1, (3,), 1, 2, (1,), (1, 1), (1,), false})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:89
 [12] (::Conv{1, 2, typeof(identity), CuArray{Float32, 3}, CuArray{Float32, 1}})(x::CuArray{ComplexF32, 3})
    @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/conv.jl:157
 [13] applychain
    @ ~/.julia/packages/Flux/0c9kI/src/layers/basic.jl:36 [inlined]
 [14] (::Chain{Tuple{Conv{1, 2, typeof(identity), CuArray{Float32, 3}, CuArray{Float32, 1}}}})(x::CuArray{ComplexF32, 3})
    @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/basic.jl:38
 [15] top-level scope
    @ ~/Documents/GitHub/SqState.jl/src/dummy_model.jl:16
 [16] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [17] top-level scope
    @ REPL[1]:1
 [18] top-level scope
    @ ~/.julia/packages/CUDA/fRSUT/src/initialization.jl:52
in expression starting at /home/admin/Documents/GitHub/SqState.jl/src/dummy_model.jl:16
@DhairyaLGandhi (Member) commented:

Seems like the weights are still real. What happens if we convert those to complex? CUDA should be able to work with that.

@foldfelis (Contributor, Author) commented:

Hi @DhairyaLGandhi ,

I have tried

c_glorot_uniform(dims...) = Flux.glorot_uniform(dims...) + Flux.glorot_uniform(dims...) * im

m = Chain(
    Conv((3, ), 1=>2, pad=1, init=c_glorot_uniform),
) |> gpu

and I got essentially the same error: the scalar-indexing failure now surfaces from conv_im2col! inside a threaded task:

ERROR: LoadError: TaskFailedException
Stacktrace:
  [1] wait
    @ ./task.jl:322 [inlined]
  [2] threading_run(func::Function)
    @ Base.Threads ./threadingconstructs.jl:34
  [3] macro expansion
    @ ./threadingconstructs.jl:93 [inlined]
  [4] conv_im2col!(y::CuArray{ComplexF32, 5}, x::CuArray{ComplexF32, 5}, w::CuArray{ComplexF32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}; col::CuArray{ComplexF32, 3}, alpha::ComplexF32, beta::ComplexF32)
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_im2col.jl:49
  [5] conv_im2col!
    @ ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_im2col.jl:30 [inlined]
  [6] #conv!#149
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:191 [inlined]
  [7] conv!(out::CuArray{ComplexF32, 5}, in1::CuArray{ComplexF32, 5}, in2::CuArray{ComplexF32, 5}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:191
  [8] conv!(y::CuArray{ComplexF32, 3}, x::CuArray{ComplexF32, 3}, w::CuArray{ComplexF32, 3}, cdims::DenseConvDims{1, (3,), 1, 2, (1,), (1, 1), (1,), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:148
  [9] conv!
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:148 [inlined]
 [10] #conv#87
    @ ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:91 [inlined]
 [11] conv(x::CuArray{ComplexF32, 3}, w::CuArray{ComplexF32, 3}, cdims::DenseConvDims{1, (3,), 1, 2, (1,), (1, 1), (1,), false})
    @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/conv.jl:89
 [12] (::Conv{1, 2, typeof(identity), CuArray{ComplexF32, 3}, CuArray{ComplexF32, 1}})(x::CuArray{ComplexF32, 3})
    @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/conv.jl:157
 [13] applychain
    @ ~/.julia/packages/Flux/0c9kI/src/layers/basic.jl:36 [inlined]
 [14] (::Chain{Tuple{Conv{1, 2, typeof(identity), CuArray{ComplexF32, 3}, CuArray{ComplexF32, 1}}}})(x::CuArray{ComplexF32, 3})
    @ Flux ~/.julia/packages/Flux/0c9kI/src/layers/basic.jl:38
 [15] top-level scope
    @ ~/Documents/GitHub/SqState.jl/script/test_cuda.jl:18
 [16] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [17] top-level scope
    @ REPL[1]:1

    nested task error: Scalar indexing is disallowed.
    Invocation of getindex resulted in scalar indexing of a GPU array.
    This is typically caused by calling an iterating implementation of a method.
    Such implementations *do not* execute on the GPU, but very slowly on the CPU,
    and therefore are only permitted from the REPL for prototyping purposes.
    If you did intend to index this array, annotate the caller with @allowscalar.
    Stacktrace:
     [1] error(s::String)
       @ Base ./error.jl:33
     [2] assertscalar(op::String)
       @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:53
     [3] getindex(::CuArray{ComplexF32, 4}, ::Int64, ::Int64, ::Int64, ::Int64)
       @ GPUArrays ~/.julia/packages/GPUArrays/8dzSJ/src/host/indexing.jl:86
     [4] im2col!(col::CuArray{ComplexF32, 2}, x::CuArray{ComplexF32, 4}, cdims::DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false})
       @ NNlib ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_im2col.jl:230
     [5] macro expansion
       @ ~/.julia/packages/NNlib/zo8Ev/src/impl/conv_im2col.jl:53 [inlined]
     [6] (::NNlib.var"#727#threadsfor_fun#366"{CuArray{ComplexF32, 3}, ComplexF32, ComplexF32, CuArray{ComplexF32, 5}, CuArray{ComplexF32, 5}, CuArray{ComplexF32, 5}, DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}, Int64, Int64, Int64, UnitRange{Int64}})(onethread::Bool)
       @ NNlib ./threadingconstructs.jl:81
     [7] (::NNlib.var"#727#threadsfor_fun#366"{CuArray{ComplexF32, 3}, ComplexF32, ComplexF32, CuArray{ComplexF32, 5}, CuArray{ComplexF32, 5}, CuArray{ComplexF32, 5}, DenseConvDims{3, (3, 1, 1), 1, 2, (1, 1, 1), (1, 1, 0, 0, 0, 0), (1, 1, 1), false}, Int64, Int64, Int64, UnitRange{Int64}})()
       @ NNlib ./threadingconstructs.jl:48
in expression starting at /home/admin/Documents/GitHub/SqState.jl/script/test_cuda.jl:18

@ToucheSir (Member) commented:

https://github.com/FluxML/NNlib.jl/blob/v0.7.33/src/impl/conv_im2col.jl#L230 is the culprit, so unless we get a CUDA-compatible (conv_)im2col in NNlib this will not work.

@DhairyaLGandhi (Member) commented:

Seems like it would if we can use a sufficiently general rule. Does conv_direct solve this?

@ToucheSir (Member) commented:

conv_direct is even worse because it makes pervasive use of scalar indexing.
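A possible user-side workaround in the meantime (a sketch, not NNlib's API, assuming the real-typed conv path works on the backend in use): convolution is bilinear, so a complex conv can be expanded into four real convs, each of which can take the CUDA-accelerated path. The helper name complex_conv is hypothetical:

```julia
using NNlib

# Sketch: complex convolution via four real convolutions.
# For x = xr + im*xi and w = wr + im*wi:
#   conv(x, w) = (conv(xr, wr) - conv(xi, wi)) + im*(conv(xr, wi) + conv(xi, wr))
function complex_conv(x::AbstractArray{<:Complex}, w::AbstractArray{<:Complex}; kw...)
    xr, xi = real.(x), imag.(x)
    wr, wi = real.(w), imag.(w)
    return (conv(xr, wr; kw...) .- conv(xi, wi; kw...)) .+
           im .* (conv(xr, wi; kw...) .+ conv(xi, wr; kw...))
end
```

This trades one complex conv for four real ones, but avoids the scalar-indexing fallback entirely.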
