Register nms and roi_align Autocast policy for PyTorch Intel GPU backend #8541

Merged
merged 4 commits into from
Aug 6, 2024


fengyuan14
Contributor

@fengyuan14 fengyuan14 commented Jul 23, 2024

An Intel GPU backend has been introduced to PyTorch (see https://github.com/pytorch/pytorch/blob/main/third_party/xpu.txt). Some Torchvision models fail on torchvision::nms and torchvision::roi_align when AMP execution is enabled. This PR adds an Autocast policy for the Intel GPU backend that aligns with the existing CUDA and CPU policies: both operators compute in float.
For details about the PyTorch Intel GPU backend, see [RFC] Intel GPU Upstreaming · Issue #114723 · pytorch/pytorch (github.com).
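
To illustrate what "calculating on float" means for these two operators, here is a toy, PyTorch-free sketch of an fp32 autocast policy. Every name in it (`ToyTensor`, `fp32_policy`, `toy_nms`) is hypothetical and is not the code in this PR. The toy kernel's float32-only restriction is an assumption made for illustration, not a statement about the real XPU kernels.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Dtype labels treated as "low precision" by this toy policy (assumption).
LOW_PRECISION = {"float16", "bfloat16"}


@dataclass(frozen=True)
class ToyTensor:
    """Hypothetical stand-in for a tensor: raw values plus a dtype label."""
    values: Tuple
    dtype: str


def to_float32(t: ToyTensor) -> ToyTensor:
    """Relabel low-precision inputs as float32; leave other dtypes untouched."""
    return ToyTensor(t.values, "float32") if t.dtype in LOW_PRECISION else t


def fp32_policy(kernel: Callable) -> Callable:
    """Wrap a kernel so that every ToyTensor argument is upcast before the call."""
    def wrapper(*args):
        upcast = [to_float32(a) if isinstance(a, ToyTensor) else a for a in args]
        return kernel(*upcast)
    return wrapper


def _iou(a, b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def toy_nms(boxes: ToyTensor, scores: ToyTensor, iou_threshold: float) -> List[int]:
    """Greedy NMS that, by assumption, is only implemented for float32 inputs."""
    if boxes.dtype != "float32" or scores.dtype != "float32":
        raise TypeError("toy kernel is only implemented for float32")
    order = sorted(range(len(scores.values)), key=lambda i: scores.values[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(_iou(boxes.values[i], boxes.values[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# The "policy-registered" version of the operator.
autocast_nms = fp32_policy(toy_nms)

boxes = ToyTensor(((0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)), "float16")
scores = ToyTensor((0.9, 0.8, 0.7), "float16")
print(autocast_nms(boxes, scores, 0.5))  # [0, 2]
```

Calling `toy_nms` directly with the float16 inputs raises `TypeError`, while the wrapped `autocast_nms` succeeds because the policy upcasts the inputs first.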


pytorch-bot bot commented Jul 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8541

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 1 Unrelated Failure

As of commit 9ef9377 with merge base 61bd547 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@EikanWang

Please refine the PR title and description.

…gn for XPU

Signed-off-by: Feng Yuan <feng1.yuan@intel.com>
@fengyuan14
Contributor Author

The lint error is unrelated to this PR (screenshot attached).

@fengyuan14
Contributor Author

> Please refine the PR title and description.

Updated.

@fengyuan14 fengyuan14 changed the title Register Autocast policy (Align with CUDA and CPU) of nms and roi_ali… Register nms and roi_align Autocast policy for XPU Jul 24, 2024
@fengyuan14 fengyuan14 changed the title Register nms and roi_align Autocast policy for XPU Register nms and roi_align Autocast policy for PyTorch Intel GPU backend Jul 24, 2024
@atalman
Contributor

atalman commented Jul 24, 2024

Hi @fengyuan14, it looks like there are a couple of CI failures: `RuntimeError: operator torchvision::nms does not exist`. Please fix them. These do not look like pre-existing errors; the current state of torchvision is green: https://hud2.pytorch.org/hud/pytorch/vision/main

@atalman atalman requested a review from NicolasHug July 24, 2024 13:54
@fengyuan14
Contributor Author

fengyuan14 commented Jul 25, 2024

> Hi @fengyuan14, it looks like there are a couple of CI failures: `RuntimeError: operator torchvision::nms does not exist`. Please fix them. These do not look like pre-existing errors; the current state of torchvision is green: https://hud2.pytorch.org/hud/pytorch/vision/main

Sure. I am checking what differs here, since it works in my local environment.

@fengyuan14
Contributor Author

fengyuan14 commented Jul 25, 2024

> Hi @fengyuan14, it looks like there are a couple of CI failures: `RuntimeError: operator torchvision::nms does not exist`. Please fix them. These do not look like pre-existing errors; the current state of torchvision is green: https://hud2.pytorch.org/hud/pytorch/vision/main

I checked the failure. It occurs in the "Install testing utilities" step, and only on macOS + Python 3.11. That step runs before Torchvision is built with my changes. In addition, the other macOS/Linux + Python 3.xx jobs passed this step.

  Installing collected packages: urllib3, pluggy, iniconfig, idna, expecttest, coverage, charset-normalizer, certifi, requests, pytest, pytest-mock, pytest-cov
  Successfully installed certifi-2024.7.4 charset-normalizer-3.3.2 coverage-7.6.0 expecttest-0.2.1 idna-3.7 iniconfig-2.0.0 pluggy-1.5.0 pytest-7.4.4 pytest-cov-5.0.0 pytest-mock-3.14.0 requests-2.32.3 urllib3-2.2.2
Traceback (most recent call last):
  File "/Users/ec2-user/runner/_work/vision/vision/pytorch/vision/test/smoke_test.py", line 7, in <module>
    import torchvision
  File "/Users/ec2-user/.local/lib/python3.11/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/Users/ec2-user/.local/lib/python3.11/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
    @torch._custom_ops.impl_abstract("torchvision::nms")
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ec2-user/runner/_work/_temp/miniconda/envs/ci/lib/python3.11/site-packages/torch/library.py", line 744, in register
    use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
  File "/Users/ec2-user/runner/_work/_temp/miniconda/envs/ci/lib/python3.11/site-packages/torch/library.py", line 183, in _register_fake
    handle = entry.fake_impl.register(func_to_register, source)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ec2-user/runner/_work/_temp/miniconda/envs/ci/lib/python3.11/site-packages/torch/_library/fake_impl.py", line 31, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator torchvision::nms does not exist
ERROR conda.cli.main_run:execute(47): `conda run ./.github/scripts/unittest.sh` failed. (See above for error)
Error: Process completed with exit code 1.

Could you help rerun the failed job?
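
One way to sanity-check this reading of the traceback is to compare the directories that `torchvision` and `torch` were imported from. The helper below is a hypothetical sketch (the names `frame_paths` and `install_roots` are illustrative and not part of any CI tooling); the sample text reuses four frames from the log above. Differing roots would only suggest, not prove, that the imported Torchvision was not the one built from this PR.

```python
import re
from typing import Dict, List, Set

# Matches the path in lines such as:   File "/some/path.py", line 7, in <module>
FRAME_RE = re.compile(r'File "([^"]+)", line \d+')

SAMPLE_TRACEBACK = '''
  File "/Users/ec2-user/runner/_work/vision/vision/pytorch/vision/test/smoke_test.py", line 7, in <module>
  File "/Users/ec2-user/.local/lib/python3.11/site-packages/torchvision/__init__.py", line 6, in <module>
  File "/Users/ec2-user/.local/lib/python3.11/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
  File "/Users/ec2-user/runner/_work/_temp/miniconda/envs/ci/lib/python3.11/site-packages/torch/library.py", line 744, in register
'''


def frame_paths(traceback_text: str) -> List[str]:
    """Return the file path of every frame found in the traceback text."""
    return FRAME_RE.findall(traceback_text)


def install_roots(traceback_text: str, packages: List[str]) -> Dict[str, Set[str]]:
    """Map each package name to the set of directories it was imported from."""
    roots: Dict[str, Set[str]] = {pkg: set() for pkg in packages}
    for path in frame_paths(traceback_text):
        for pkg in packages:
            # A frame belongs to a package if "/<pkg>/" appears in its path.
            idx = path.find("/" + pkg + "/")
            if idx != -1:
                roots[pkg].add(path[:idx])
    return roots


roots = install_roots(SAMPLE_TRACEBACK, ["torchvision", "torch"])
for pkg, dirs in roots.items():
    print(pkg, sorted(dirs))
```

On the frames above, `torchvision` resolves to the user-level `.local/lib/python3.11/site-packages` directory while `torch` resolves to the `ci` conda environment, which is consistent with the failure happening before the PR build is involved.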

@EikanWang

@fengyuan14, you can rebase this PR to trigger the CI again.

@EikanWang

@pytorchbot rebase -b main


pytorch-bot bot commented Jul 25, 2024

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@fengyuan14
Contributor Author

@NicolasHug Could you help review the PR?

Member

@NicolasHug NicolasHug left a comment


Thank you

@NicolasHug NicolasHug merged commit c8c496d into pytorch:main Aug 6, 2024
57 of 65 checks passed
facebook-github-bot pushed a commit that referenced this pull request Aug 7, 2024
… GPU backend (#8541)

Summary: Signed-off-by: Feng Yuan <feng1.yuan@intel.com>

Differential Revision: D60903715

fbshipit-source-id: c0919c6b0473432b7ef7a30599c3dc143e3da1cf

Co-authored-by: Nicolas Hug <nh.nicolas.hug@gmail.com>