
Add modulation input for DeformConv2D #2791

Merged

Conversation

@Licht-T (Contributor) commented Oct 11, 2020

This is the implementation of Modulated Deformable Convolution, a.k.a. DCNv2. This closes #1788.
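A minimal usage sketch of the functional API with the new mask (modulation) input; the shapes follow the deform_conv2d docstring (a 10x10 input with a 3x3 kernel, stride 1 and no padding gives an 8x8 output), and the random tensors are illustrative only:

import torch
from torchvision.ops import deform_conv2d

input = torch.rand(4, 3, 10, 10)
kh, kw = 3, 3
weight = torch.rand(5, 3, kh, kw)
# offset and mask must match the spatial size of the convolution output (8x8 here)
offset = torch.rand(4, 2 * kh * kw, 8, 8)
mask = torch.rand(4, kh * kw, 8, 8)
out = deform_conv2d(input, offset, weight, mask=mask)
print(out.shape)  # torch.Size([4, 5, 8, 8])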

@fmassa (Member) left a comment:

Thanks a lot for the PR!

I haven't yet gone into detail in the C++ / CUDA parts, but I have a few comments on the Python part that could help fix the TorchScript tests.

torchvision/ops/deform_conv.py:
>>> print(out.shape)
>>> # returns
>>> torch.Size([4, 5, 8, 8])
"""

_assert_has_ops()
out_channels = weight.shape[0]

use_mask = mask is not None
fmassa (Member) commented:

Maybe a better, TorchScript-compliant way of doing this is to remove use_mask and instead do

if mask is not None:
    mask = ...

assert mask is not None

This way we don't need to change anything below.

Thoughts?
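A rough sketch of the TorchScript-friendly handling being suggested (illustrative only; the placeholder-mask shape below is an assumption, not the PR's actual code):

import torch
from torch import Tensor
from typing import Optional

def _prepare_mask(input: Tensor, mask: Optional[Tensor]) -> Tensor:
    # Substitute an empty placeholder when no mask is given, so the code below
    # can always treat `mask` as a plain Tensor.
    if mask is None:
        mask = torch.zeros((input.shape[0], 0), device=input.device, dtype=input.dtype)
    # The assert refines Optional[Tensor] to Tensor for TorchScript.
    assert mask is not None
    return mask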

@Licht-T (Contributor, Author) replied:

As mentioned above, I removed the shape assertion in the Python code. I also found that it is better to pass the use_mask flag into the C++ code for the shape assertions, so I did that.

@Licht-T (Contributor, Author) commented Oct 13, 2020

@fmassa Thanks for your review! I changed my code, and TorchScript now works well.

Also, I have changed some of the assertion logic (use_mask related) so that it correctly determines whether modulation is used or not. I've tested the CPU forward/backward. Unfortunately, since my GPU environment broke last night, I have not yet tested this on GPU. I'll fix up my environment in a couple of days and test on GPU.

@Licht-T commented Oct 13, 2020

Hmmm..., the GPU CI test on Windows works fine, but the Linux one doesn't. That's weird...

@Licht-T commented Oct 13, 2020

Anyway, I'll check as soon as my GPU environment is fixed!

@fmassa (Member) commented Oct 13, 2020

Sounds good.

The tests are randomized, so it might be that in some cases they pass and in others they fail. But the gradchecks that are erroring on Linux GPU don't seem to be failing due to numerical error, so there might be a case that has been missed somewhere.

I'll rerun the tests to see if the CI status changes.
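For context, the gradchecks being referred to look roughly like this (an illustrative sketch; the real tests live in test/test_ops.py, use different sizes, and also run on CUDA):

import torch
from torch.autograd import gradcheck
from torchvision.ops import deform_conv2d

kh, kw = 3, 3
x = torch.rand(1, 3, 6, 6, dtype=torch.float64, requires_grad=True)
weight = torch.rand(5, 3, kh, kw, dtype=torch.float64, requires_grad=True)
offset = torch.rand(1, 2 * kh * kw, 4, 4, dtype=torch.float64, requires_grad=True)
mask = torch.rand(1, kh * kw, 4, 4, dtype=torch.float64, requires_grad=True)

# Checks analytical gradients of all four inputs against numerical ones.
gradcheck(lambda x, w, o, m: deform_conv2d(x, o, w, mask=m), (x, weight, offset, mask))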

@Licht-T commented Oct 16, 2020

@fmassa Thanks for your CI re-run, but the situation didn't change.

My Linux GPU environment is finally back, and my PR passes the GPU tests locally! This is strange, and I am still digging into it.

(base) rito@LAPTOP-097573RD:~/vision$ pytest test/test_ops.py::DeformConvTester
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.7.7, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/rito/vision
plugins: forked-1.3.0, typeguard-2.9.1, extra-durations-0.1.3, xdist-1.34.0
collected 8 items

test/test_ops.py ........                                                                                                                                                        [100%]

============================================================================== sum of all tests durations ==============================================================================
1357.47s
============================================================================ 8 passed in 1358.45s (0:22:38) ============================================================================

@Licht-T commented Oct 17, 2020

I ran the test_ops.py tests on an AWS g4dn.xlarge instance, and no test failures occurred.

(base) ubuntu@ip-172-31-29-152:~/vision$ uname -a
Linux ip-172-31-29-152 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
(base) ubuntu@ip-172-31-29-152:~/vision$ nvidia-smi
Sat Oct 17 13:22:19 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   26C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(base) ubuntu@ip-172-31-29-152:~/vision$ pytest test/test_ops.py
================================= test session starts ==================================
platform linux -- Python 3.8.3, pytest-6.1.1, py-1.9.0, pluggy-0.13.1
rootdir: /home/ubuntu/vision
plugins: cov-2.10.1
collected 62 items

test/test_ops.py ..............................................................  [100%]

============================ 62 passed in 321.95s (0:05:21) ============================
(base) ubuntu@ip-172-31-29-152:~/vision$

@Licht-T commented Oct 17, 2020

I found insufficient-resources errors in the CI logs. This test failure only happens in environments with small GPU resources, such as CI.
https://app.circleci.com/pipelines/github/pytorch/vision/4464/workflows/4d7daf2d-7466-460c-a1f7-76ae3495653c/jobs/266953/steps

error in compute_grad_offset_and_mask: too many resources requested for launch

@Licht-T commented Oct 17, 2020

@fmassa The Linux GPU CI runs on a small instance, while the Windows one runs on a medium instance. This caused the difference in test results between Linux and Windows. We could reduce the degree of parallelization, but that would hurt performance. I recommend that we use the medium instance for the Linux GPU CI.

resource_class: gpu.small

resource_class: windows.gpu.nvidia.medium

@Licht-T Licht-T force-pushed the add-modulation-for-deformable-convolution branch from 192b955 to 0707337 Compare October 18, 2020 15:42
@Licht-T commented Oct 18, 2020

@fmassa I tried running the GPU CI on a gpu.medium instance, but it did not help. I then found that the Linux environment has CUDA Compute Capability (CC) 5.2, while the Windows one has CC 7.5. The Linux instance has an old GPU even though it is "medium" 🤔.

According to the official occupancy calculator, devices with CC < 6 have fewer registers per block than those with CC >= 6 and cannot launch a kernel that uses a large number of registers. So I decided to select the thread block size based on whether the CC is 6 or higher.
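Roughly, the selection amounts to something like the following (a Python sketch for illustration only; the actual logic lives in the C++/CUDA extension, and the concrete thread counts here are placeholders rather than the values used in this PR):

import torch

def pick_threads_per_block(device: torch.device) -> int:
    major, _minor = torch.cuda.get_device_capability(device)
    # Devices with CC < 6 expose fewer registers per block, so use a smaller
    # block size to avoid "too many resources requested for launch".
    return 1024 if major >= 6 else 512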

And now the PR passes all tests. All green, except Travis CI!

@facebook-github-bot commented:
Hi @Licht-T!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@fmassa (Member) left a comment:

Thanks a lot for the PR, and sorry for the delay in getting back to you; the release was taking most of our time over the past couple of weeks.

This PR looks good to merge, but there are some merge conflicts now. Could you look into fixing those?

@Licht-T Licht-T force-pushed the add-modulation-for-deformable-convolution branch from 292c4f6 to 2285693 Compare November 8, 2020 09:55
@Licht-T Licht-T force-pushed the add-modulation-for-deformable-convolution branch from 3bc4dfd to 5aa7d87 Compare November 8, 2020 11:29
@codecov bot commented Nov 8, 2020

Codecov Report

Merging #2791 (5aa7d87) into master (052edce) will decrease coverage by 1.06%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #2791      +/-   ##
==========================================
- Coverage   73.42%   72.35%   -1.07%     
==========================================
  Files          99       99              
  Lines        8817     8820       +3     
  Branches     1389     1390       +1     
==========================================
- Hits         6474     6382      -92     
- Misses       1917     1999      +82     
- Partials      426      439      +13     
Impacted Files Coverage Δ
torchvision/ops/deform_conv.py 72.30% <100.00%> (+1.33%) ⬆️
torchvision/ops/_register_onnx_ops.py 48.78% <0.00%> (-36.59%) ⬇️
torchvision/models/detection/transform.py 78.33% <0.00%> (-17.23%) ⬇️
torchvision/ops/poolers.py 86.53% <0.00%> (-11.54%) ⬇️
torchvision/ops/boxes.py 87.35% <0.00%> (-8.05%) ⬇️
torchvision/models/detection/roi_heads.py 77.23% <0.00%> (-5.11%) ⬇️
torchvision/models/detection/rpn.py 90.18% <0.00%> (-3.69%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Licht-T commented Nov 8, 2020

@fmassa No problem! I have now fixed the conflicts, and CI is all green except Codecov!

Also, I opened #2973 since the GPU CI was not working well.

@fmassa (Member) left a comment:

Thanks a lot for the PR!

I'm accepting the config.yml changes for now to get this PR merged, but there might be better ways of handling the caching and CI.

I'll create an issue to track those down.

- restore_cache:
{% raw %}
keys:
- env-v2-linux-{{ arch }}-py<< parameters.python_version >>-{{ checksum ".circleci/unittest/linux/scripts/environment.yml" }}-{{ checksum ".circleci-weekly" }}
fmassa (Member) commented:

I believe the recommended way to handle invalid caches on CI is to change the cache key name: instead of having it be env-v2, make it env-v3, for example.

@@ -455,21 +455,17 @@ jobs:
resource_class: gpu.small
environment:
image_name: "pytorch/manylinux-cuda101"
PYTHON_VERSION: << parameters.python_version >>
fmassa (Member) commented:

Those changes are also present in #2973, and there is some discussion going on with @mthrok on the best way to solve this.

In order to move forward with this PR, I'll be merging this as is, but let's revisit this implementation in a follow-up PR based on feedback from @mthrok and @seemethere.

@@ -33,6 +34,9 @@ def deform_conv2d(
padding (int or Tuple[int, int]): height/width of padding of zeroes around
each image. Default: 0
dilation (int or Tuple[int, int]): the spacing between kernel elements. Default: 1
mask (Tensor[batch_size, offset_groups * kernel_height * kernel_width,
out_height, out_width]): masks to be applied for each position in the
convolution kernel.
fmassa (Member) commented:

For a follow-up PR, it might be good to mention that DeformConv2d now implements modulation from https://arxiv.org/abs/1811.11168 as well (also known as DeformConv v2).
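For reference, a minimal sketch of the module-level usage with modulation (shapes mirror the functional example above; producing the mask with a sigmoid is an illustrative choice, not something this PR mandates):

import torch
from torchvision.ops import DeformConv2d

conv = DeformConv2d(in_channels=3, out_channels=5, kernel_size=3)
x = torch.rand(4, 3, 10, 10)
offset = torch.rand(4, 2 * 3 * 3, 8, 8)            # 2 * kh * kw offset channels
mask = torch.sigmoid(torch.rand(4, 3 * 3, 8, 8))   # kh * kw modulation channels
out = conv(x, offset, mask)
print(out.shape)  # torch.Size([4, 5, 8, 8])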

@fmassa fmassa merged commit 5a4bb19 into pytorch:master Nov 9, 2020
bryant1410 pushed a commit to bryant1410/vision-1 that referenced this pull request Nov 22, 2020
* Add modulation input for DeformConv2D

* lint

* Patch for GPU CI

* Remove bad cache on CI
vfdev-5 pushed a commit to Quansight/vision that referenced this pull request Dec 4, 2020
* Add modulation input for DeformConv2D

* lint

* Patch for GPU CI

* Remove bad cache on CI
facebook-github-bot pushed a commit that referenced this pull request May 27, 2021
Summary:
`test_backward_cuda_contiguous` and `test_backward_cuda_non_contiguous` have been failing on fbcode for a while with the following error, `too many resources requested for launch`, which suggests that too many threads per block are requested.

This issue was already causing problems in the original PR #2791 (comment), where the author decided that CC >= 6 was a good threshold because with CC >= 6 GPUs have more registers. (CC = Compute Capability)

However, I'm not certain that this is actually true: if we look at https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications, it's clear that 6.2 has fewer registers per thread block than 6.0. So I'm not sure this threshold completely makes sense.

Moreover, let's note that the current tests (as on `master`):

- **pass** on OSS linux CI which relies on a P4 GPU (up to last week), i.e. **CC = 6.1**
- **pass** on OSS windows CI which relies on a T4 GPU, i.e. **CC = 7.5**
- **fail** on the AWS cluster which relies on a V100 GPU, i.e. **CC = 7.0**

It is quite unclear to me what kind of resource is "enough" for the tests to pass on both 6.1 and 7.5 but not on 7.0. As a result, I think it's safer to just reduce the number of threads per block, irrespective of the CC.

ngimel,  fmassa suggested that I tag you here since you could have some valuable insight for us. Thanks!

Reviewed By: fmassa

Differential Revision: D28641626

fbshipit-source-id: 2618c366c5d18bbb7ebafc33032e7ac6c0404d0b
NicolasHug added a commit to NicolasHug/vision that referenced this pull request Jun 1, 2021