Errors on small array inputs #52

Closed
GiggleLiu opened this issue Nov 4, 2020 · 1 comment · Fixed by #130

Comments

@GiggleLiu

The BLAS.gemmEx! function errors on array sizes < 128. I want to experiment with this functionality on small array sizes like 16×16.
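Something like the following triggers it (a minimal sketch; the transpose flags and alpha/beta values are just an example, the argument types match the stack trace below):

    using CUDA, GemmKernels

    A = CuArray(rand(Float16, 16, 16))
    B = CuArray(rand(Float16, 16, 16))
    C = CuArray(zeros(Float32, 16, 16))

    # 16 is far below the (M = 128, N = 128, K = 64) threadblock tile, so the
    # computed grid presumably ends up with zero blocks and the launch fails with
    # "ArgumentError: Grid dimensions should be non-null".
    GemmKernels.BLAS.gemmEx!('N', 'N', 1, A, B, 0, C)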

Error message

ERROR: LoadError: ArgumentError: Grid dimensions should be non-null
Stacktrace:
 [1] launch(::CuFunction, ::CuDeviceArray{Float16,2,1}, ::CuDeviceArray{Float16,2,1}, ::CuDeviceArray{Float32,2,1}, ::CuDeviceArray{Float32,2,1}, ::GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#1#3"{Int64,Int64}}, ::GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#2#4"{Int64}}; blocks::Tuple{Int64,Int64}, threads::Int64, cooperative::Bool, shmem::Int64, stream::CuStream) at /home/leo/.julia/dev/CUDA/lib/cudadrv/execution.jl:57
 [2] #599 at /home/leo/.julia/dev/CUDA/lib/cudadrv/execution.jl:138 [inlined]
 [3] macro expansion at /home/leo/.julia/dev/CUDA/lib/cudadrv/execution.jl:97 [inlined]
 [4] convert_arguments at /home/leo/.julia/dev/CUDA/lib/cudadrv/execution.jl:79 [inlined]
 [5] #cudacall#598 at /home/leo/.julia/dev/CUDA/lib/cudadrv/execution.jl:137 [inlined]
 [6] #cudacall#790 at /home/leo/.julia/dev/CUDA/src/compiler/execution.jl:219 [inlined]
 [7] macro expansion at /home/leo/.julia/dev/CUDA/src/compiler/execution.jl:200 [inlined]
 [8] call(::CUDA.HostKernel{GemmKernels.Kernel.matmul_pipelined,Tuple{CuDeviceArray{Float16,2,1},CuDeviceArray{Float16,2,1},CuDeviceArray{Float32,2,1},CuDeviceArray{Float32,2,1},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#1#3"{Int64,Int64}},GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#2#4"{Int64}},GemmKernels.Epilogue.Default,Type{GemmKernels.Config{(M = 64, N = 64, K = 64),(M = 128, N = 128, K = 64),8,(M = 128, K = 2),(M = 8, K = 1),(K = 64, N = 4),(K = 8, N = 1),(M = 128, N = 1),(M = 4, N = 1),(M = 32, N = 64, K = 16),(M = 16, N = 16, K = 16),GemmKernels.Layout.AlignedColMajor{Float16},GemmKernels.Layout.AlignedColMajor{Float16},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.Padded{GemmKernels.Layout.AlignedColMajor{Float16},8},GemmKernels.Layout.Padded{GemmKernels.Layout.AlignedColMajor{Float16},8},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.AlignedColMajor{Float32},WMMAOp{16,16,16},true,true}}}}, ::CuDeviceArray{Float16,2,1}, ::CuDeviceArray{Float16,2,1}, ::CuDeviceArray{Float32,2,1}, ::CuDeviceArray{Float32,2,1}, ::GemmKernels.Transform.Elementwise{typeof(identity)}, ::GemmKernels.Transform.Elementwise{typeof(identity)}, ::GemmKernels.Transform.Elementwise{typeof(identity)}, ::GemmKernels.Transform.Elementwise{typeof(identity)}, ::GemmKernels.Transform.Elementwise{typeof(identity)}, ::GemmKernels.Transform.Elementwise{typeof(identity)}, ::GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#1#3"{Int64,Int64}}, ::GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#2#4"{Int64}}, ::GemmKernels.Epilogue.Default, ::Type{GemmKernels.Config{(M = 64, N = 64, K = 64),(M = 128, N = 128, K = 64),8,(M = 128, K = 2),(M = 8, K = 1),(K = 64, N = 4),(K = 8, N = 1),(M = 128, N = 1),(M = 4, N = 1),(M = 32, N = 64, K = 16),(M = 16, N = 16, K = 16),GemmKernels.Layout.AlignedColMajor{Float16},GemmKernels.Layout.AlignedColMajor{Float16},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.Padded{GemmKernels.Layout.AlignedColMajor{Float16},8},GemmKernels.Layout.Padded{GemmKernels.Layout.AlignedColMajor{Float16},8},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.AlignedColMajor{Float32},WMMAOp{16,16,16},true,true}}; call_kwargs::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:threads, :blocks, :shmem),Tuple{Int64,Tuple{Int64,Int64},Int64}}}) at /home/leo/.julia/dev/CUDA/src/compiler/execution.jl:171
 [9] (::CUDA.HostKernel{GemmKernels.Kernel.matmul_pipelined,Tuple{CuDeviceArray{Float16,2,1},CuDeviceArray{Float16,2,1},CuDeviceArray{Float32,2,1},CuDeviceArray{Float32,2,1},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{typeof(identity)},GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#1#3"{Int64,Int64}},GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#2#4"{Int64}},GemmKernels.Epilogue.Default,Type{GemmKernels.Config{(M = 64, N = 64, K = 64),(M = 128, N = 128, K = 64),8,(M = 128, K = 2),(M = 8, K = 1),(K = 64, N = 4),(K = 8, N = 1),(M = 128, N = 1),(M = 4, N = 1),(M = 32, N = 64, K = 16),(M = 16, N = 16, K = 16),GemmKernels.Layout.AlignedColMajor{Float16},GemmKernels.Layout.AlignedColMajor{Float16},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.Padded{GemmKernels.Layout.AlignedColMajor{Float16},8},GemmKernels.Layout.Padded{GemmKernels.Layout.AlignedColMajor{Float16},8},GemmKernels.Layout.AlignedColMajor{Float32},GemmKernels.Layout.AlignedColMajor{Float32},WMMAOp{16,16,16},true,true}}}})(::CuDeviceArray{Float16,2,1}, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:threads, :blocks, :shmem),Tuple{Int64,Tuple{Int64,Int64},Int64}}}) at /home/leo/.julia/dev/CUDA/src/compiler/execution.jl:353
 [10] matmul(::CuArray{Float16,2}, ::CuArray{Float16,2}, ::CuArray{Float32,2}, ::CuArray{Float32,2}, ::Type{T} where T; transform_global_to_shared_a::GemmKernels.Transform.Elementwise{typeof(identity)}, transform_global_to_shared_b::GemmKernels.Transform.Elementwise{typeof(identity)}, transform_global_to_shared_c::GemmKernels.Transform.Elementwise{typeof(identity)}, transform_shared_to_global_d::GemmKernels.Transform.Elementwise{typeof(identity)}, transform_shared_to_regs_a::GemmKernels.Transform.Elementwise{typeof(identity)}, transform_shared_to_regs_b::GemmKernels.Transform.Elementwise{typeof(identity)}, transform_shared_to_regs_c::GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#1#3"{Int64,Int64}}, transform_regs_to_shared_d::GemmKernels.Transform.Elementwise{GemmKernels.BLAS.var"#2#4"{Int64}}, epilogue::GemmKernels.Epilogue.Default, kernel::typeof(GemmKernels.Kernel.matmul_pipelined)) at /home/leo/.julia/dev/GemmKernels/src/launch.jl:26
 [11] gemmEx!(::Char, ::Char, ::Int64, ::CuArray{Float16,2}, ::CuArray{Float16,2}, ::Int64, ::CuArray{Float32,2}) at /home/leo/.julia/dev/GemmKernels/src/blas.jl:64
 [12] top-level scope at /home/leo/.julia/dev/GemmKernels/matmul.jl:42
 [13] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091
 [14] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./essentials.jl:710
 [15] invokelatest(::Any, ::Any, ::Vararg{Any,N} where N) at ./essentials.jl:709
 [16] inlineeval(::Module, ::String, ::Int64, ::Int64, ::String; softscope::Bool) at /home/leo/.vscode/extensions/julialang.language-julia-1.0.8/scripts/packages/VSCodeServer/src/eval.jl:83
 [17] (::VSCodeServer.var"#43#45"{VSCodeServer.ReplRunCodeRequestParams,String,Int64,Int64,String,Module,Bool})() at /home/leo/.vscode/extensions/julialang.language-julia-1.0.8/scripts/packages/VSCodeServer/src/eval.jl:45
 [18] withpath(::VSCodeServer.var"#43#45"{VSCodeServer.ReplRunCodeRequestParams,String,Int64,Int64,String,Module,Bool}, ::String) at /home/leo/.vscode/extensions/julialang.language-julia-1.0.8/scripts/packages/VSCodeServer/src/repl.jl:118
 [19] (::VSCodeServer.var"#42#44"{VSCodeServer.ReplRunCodeRequestParams,String,Int64,Int64,String,Module,Bool,Bool})() at /home/leo/.vscode/extensions/julialang.language-julia-1.0.8/scripts/packages/VSCodeServer/src/eval.jl:43
 [20] hideprompt(::VSCodeServer.var"#42#44"{VSCodeServer.ReplRunCodeRequestParams,String,Int64,Int64,String,Module,Bool,Bool}) at /home/leo/.vscode/extensions/julialang.language-julia-1.0.8/scripts/packages/VSCodeServer/src/repl.jl:36
 [21] repl_runcode_request(::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint,Base.PipeEndpoint}, ::VSCodeServer.ReplRunCodeRequestParams) at /home/leo/.vscode/extensions/julialang.language-julia-1.0.8/scripts/packages/VSCodeServer/src/eval.jl:23
 [22] dispatch_msg(::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint,Base.PipeEndpoint}, ::VSCodeServer.JSONRPC.MsgDispatcher, ::Dict{String,Any}) at /home/leo/.vscode/extensions/julialang.language-julia-1.0.8/scripts/packages/JSONRPC/src/typed.jl:66
 [23] macro expansion at /home/leo/.vscode/extensions/julialang.language-julia-1.0.8/scripts/packages/VSCodeServer/src/VSCodeServer.jl:95 [inlined]
 [24] (::VSCodeServer.var"#61#63"{Bool,String})() at ./task.jl:356
in expression starting at /home/leo/.julia/dev/GemmKernels/matmul.jl:42

Also, the computed value is not correct if the array size is not an exponent of

@thomasfaingnaert
Member

This is a limitation of the current implementation: only arrays whose size is a multiple of the threadblock size (e.g. (M = 128, N = 128, K = 64) for WMMA mixed-precision) are supported at the moment.
One way to support arbitrary matrix dimensions would be to predicate the loads from global memory to only access elements inside the bounds of the global matrix.
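A rough sketch of that predication idea, in plain Julia rather than the actual GemmKernels layout/kernel code (the function name and signature here are hypothetical):

    function load_tile_predicated!(tile, A, row0, col0, M, K)
        # Copy a tile starting at offset (row0, col0) from the global M x K matrix A.
        # Each load is guarded by a bounds check; out-of-bounds elements read as zero,
        # so the tile size no longer has to divide the matrix dimensions evenly.
        for j in axes(tile, 2), i in axes(tile, 1)
            gi, gj = row0 + i, col0 + j
            tile[i, j] = (gi <= M && gj <= K) ? A[gi, gj] : zero(eltype(tile))
        end
        return tile
    end

Stores of the result tile back to global memory would need a similar guard so that only in-bounds elements are written.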
