add depthwise_conv* overloads for CUDA #22

DhairyaLGandhi · 2021-07-17T04:48:01Z

No description provided.

CarloLucibello · 2021-07-17T05:56:20Z

src/cudnn/conv.jl

+function depthwise_conv!(y::DenseCuArray{T}, x::DenseCuArray{T}, w::DenseCuArray{T}, cdims::DepthwiseConvDims;
+               alpha = 1, beta = 0, algo = -1) where T <: CUDNNFloat
+    conv!(y, x, w, cdims; alpha, beta, algo)
+end
+
+function ∇depthwise_conv_filter!(dw::DenseCuArray{T}, x::DenseCuArray{T}, dy::DenseCuArray{T},
+                       cdims::ConvDims; alpha = 1, beta = 0, algo = -1) where T <: CUDNNFloat
+  ∇conv_filter!(dw, x, dy, cdims; alpha, beta, algo)
+end
+
+function ∇depthwise_conv_data!(dx::DenseCuArray{T}, dy::DenseCuArray{T}, w::DenseCuArray{T},
+                     cdims::ConvDims; alpha = 1, beta = 0, algo = -1) where T <: CUDNNFloat
+    ∇conv_data!(dx, dy, w, cdims; alpha, beta, algo)
+end


These don't have to be CUDA-specific. We can add them to NNlib and remove the specialized implementations (after a performance comparison).

Add what to NNlib, sorry? This package is specific to GPU functionality.

Exactly these methods, but with AbstractArray arguments, i.e. falling back on conv.
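The fallback idea rests on depthwise convolution being a grouped convolution with one group per input channel. Below is a minimal pure-Julia sketch of that equivalence in 1-D. All function names are hypothetical, and the weight layouts (`(k, cin ÷ groups, cout)` for grouped, `(k, multiplier, cin)` for depthwise) and the use of cross-correlation without kernel flipping are assumptions made for illustration, not NNlib's actual conventions.

```julia
# Naive 1-D grouped cross-correlation (stride 1, no padding).
# Assumed layouts: x is (width, cin, batch), w is (k, cin ÷ groups, cout).
function naive_grouped_conv(x::AbstractArray{T,3}, w::AbstractArray{T,3}; groups::Int = 1) where T
    width, cin, batch = size(x)
    k, cin_per_group, cout = size(w)
    @assert cin == cin_per_group * groups
    @assert cout % groups == 0
    cout_per_group = cout ÷ groups
    y = zeros(T, width - k + 1, cout, batch)
    for n in 1:batch, co in 1:cout
        g = (co - 1) ÷ cout_per_group   # zero-based group of this output channel
        for ci in 1:cin_per_group, i in 1:size(y, 1), j in 1:k
            y[i, co, n] += x[i + j - 1, g * cin_per_group + ci, n] * w[j, ci, co]
        end
    end
    return y
end

# Naive 1-D depthwise cross-correlation.
# Assumed layout: w is (k, multiplier, cin); each input channel has its own filters.
function naive_depthwise_conv(x::AbstractArray{T,3}, w::AbstractArray{T,3}) where T
    width, cin, batch = size(x)
    k, mult, cin_w = size(w)
    @assert cin == cin_w
    y = zeros(T, width - k + 1, cin * mult, batch)
    for n in 1:batch, c in 1:cin, m in 1:mult
        co = (c - 1) * mult + m
        for i in 1:size(y, 1), j in 1:k
            y[i, co, n] += x[i + j - 1, c, n] * w[j, m, c]
        end
    end
    return y
end

# Hypothetical generic fallback: depthwise expressed as grouped conv with groups == cin.
depthwise_via_grouped(x, w) =
    naive_grouped_conv(x, reshape(w, size(w, 1), 1, :); groups = size(x, 2))

x = rand(Float32, 10, 4, 2)   # width 10, 4 channels, batch of 2
w = rand(Float32, 3, 2, 4)    # kernel width 3, multiplier 2, 4 channels
y_depthwise = naive_depthwise_conv(x, w)
y_fallback  = depthwise_via_grouped(x, w)
```

Under these assumptions the two results agree, which is why a generic `AbstractArray` method could in principle forward to the grouped path. Whether that is fast enough on the CPU is a separate question.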

Umm, we probably want to retain the CPU kernels anyway. Unless Julia is explicitly launched with multiple threads, the runtime of grouped convolutions would scale with the number of groups.

This would be true for any implementation, specialized or not.
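To illustrate the scaling concern, here is a toy sketch in which each group is an independent piece of work that either runs in a serial loop or is spread across Julia threads. The function `grouped_pointwise!` and its data layout are hypothetical and only stand in for a grouped kernel. The thread does not show how NNlib actually schedules groups.

```julia
using Base.Threads: @threads

# Toy grouped pointwise (1x1) convolution: one independent matrix product per group.
# Assumed layouts: x is (n, cin), w is (cin ÷ groups, cout ÷ groups, groups), y is (n, cout).
function grouped_pointwise!(y, x, w; threaded::Bool = false)
    cin_g, cout_g, groups = size(w)
    work(g) = begin
        xin  = view(x, :, ((g - 1) * cin_g + 1):(g * cin_g))
        yout = view(y, :, ((g - 1) * cout_g + 1):(g * cout_g))
        yout .= xin * view(w, :, :, g)   # each group writes to disjoint columns of y
    end
    if threaded
        # With `julia -t1` this still runs the groups one after another.
        @threads for g in 1:groups
            work(g)
        end
    else
        for g in 1:groups
            work(g)
        end
    end
    return y
end

x = rand(Float32, 64, 8)     # 8 input channels
w = rand(Float32, 2, 3, 4)   # 4 groups, 2 inputs and 3 outputs per group
y_serial   = grouped_pointwise!(zeros(Float32, 64, 12), x, w)
y_threaded = grouped_pointwise!(zeros(Float32, 64, 12), x, w; threaded = true)
```

Both paths compute the same result. The difference is only in wall-clock time, and only when Julia was started with more than one thread.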

julia> x′ = rand(Float32, 28, 28, 4, 2);

julia> w′ = rand(Float32, 3, 3, 4, 30);

julia> cdims = DenseConvDims(x′, w′, groups = 4)

julia> @btime conv($x′, $w′, $cdims);
  362.792 μs (86 allocations: 736.36 KiB) # -t1
  236.368 μs (94 allocations: 831.89 KiB) # -t2
  232.137 μs (94 allocations: 831.89 KiB) # -t4

julia> @btime depthwiseconv($x′, $(permutedims(w′, (1,2,4,3))));
  348.914 μs (42 allocations: 731.03 KiB) # -t1
  156.558 μs (47 allocations: 826.53 KiB) # -t2
  161.059 μs (47 allocations: 826.53 KiB) # -t4

This is with https://github.com/DhairyaLGandhi/NNlib.jl#dg/g2 which has a couple of fixes pending a PR.

ToucheSir

Looks reasonable to me, just needs a couple tests in https://github.com/FluxML/NNlibCUDA.jl/blob/master/test/conv.jl (I know the implementation is technically covered indirectly now, but there's no guarantee these methods will forward to the conv ones forever).

add depthwise_conv* overloads for CUDA

5343eca

CarloLucibello reviewed Jul 17, 2021

Merge branch 'master' into dg/depth

c1c6154

DhairyaLGandhi mentioned this pull request Jul 22, 2021

deprecate DepthwiseConv once we have groups in standard conv FluxML/Flux.jl#1667

Closed

ToucheSir reviewed Nov 12, 2021

DhairyaLGandhi commented Jul 17, 2021

CarloLucibello Jul 17, 2021

DhairyaLGandhi Jul 17, 2021 (edited)

CarloLucibello Jul 17, 2021

DhairyaLGandhi Jul 17, 2021

CarloLucibello Jul 17, 2021

DhairyaLGandhi Jul 21, 2021

ToucheSir left a comment
