
Fixes deform_conv issue with large input/output #4351

Merged (6 commits) on Sep 6, 2021

Conversation

@vfdev-5 (Collaborator) commented Sep 2, 2021

Fixes #4269

Description:

  • Fixed indexing to use int32 or int64 depending on the size of the input/output and related tensors

!!! Currently, no tests are provided, since the reproducing example allocates very large tensors. Any suggestions on that?

@vfdev-5 vfdev-5 force-pushed the fix-4269-deform-conv-index-overflow branch from 30a2324 to 3498de6 Compare September 2, 2021 15:13
Comment on lines +246 to +299
if (use_64bits_indexing) {
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      input.scalar_type(), "deformable_im2col", ([&] {
        deformable_im2col_kernel<scalar_t, int64_t><<<blocks, threads>>>(
            num_kernels,
            input.data_ptr<scalar_t>(),
            data_offset.data_ptr<scalar_t>(),
            data_mask.data_ptr<scalar_t>(),
            height,
            width,
            weight_h,
            weight_w,
            pad_h,
            pad_w,
            stride_h,
            stride_w,
            dilation_h,
            dilation_w,
            parallel_imgs,
            n_in_channels,
            deformable_group,
            out_h,
            out_w,
            use_mask,
            data_col.data_ptr<scalar_t>());
      }));
} else {
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      input.scalar_type(), "deformable_im2col", ([&] {
        deformable_im2col_kernel<scalar_t, int><<<blocks, threads>>>(
            num_kernels,
            input.data_ptr<scalar_t>(),
            data_offset.data_ptr<scalar_t>(),
            data_mask.data_ptr<scalar_t>(),
            height,
            width,
            weight_h,
            weight_w,
            pad_h,
            pad_w,
            stride_h,
            stride_w,
            dilation_h,
            dilation_w,
            parallel_imgs,
            n_in_channels,
            deformable_group,
            out_h,
            out_w,
            use_mask,
            data_col.data_ptr<scalar_t>());
      }));
}
vfdev-5 (Collaborator, Author) commented:

@fmassa any clever ideas on how to reduce code duplication?

fmassa (Member) commented:

I think this is fine. You could write a macro here so this code is written only once, like what PyTorch does here for example, which is then dispatched here.
If you do this, don't forget to #undef the macros after they are used.

@vfdev-5 vfdev-5 requested a review from fmassa September 2, 2021 18:47
@fmassa fmassa (Member) left a comment

Changes LGTM, thanks @vfdev-5!

If you want to send a follow-up PR adding the macros, I'm fine with it, but let's get this merged now


@fmassa fmassa merged commit 6ce278b into pytorch:main Sep 6, 2021
facebook-github-bot pushed a commit that referenced this pull request Sep 9, 2021
Summary:
* WIP on fixing index overflow issue

* Fixed backward pass for large num_kernels

* Fixed clang formatting

* Fixed GET_BLOCKS int/int64_t types issue

Reviewed By: fmassa

Differential Revision: D30793320

fbshipit-source-id: ce99a6c2c0f859b32d2c565da451640331f935f8

Co-authored-by: vfdev-5 <vfdev-5@gmail.com>
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>

Successfully merging this pull request may close these issues.

deform_conv2d, CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
3 participants