
Change CI cuda versions to 10.2 #3869

Merged: 6 commits, merged May 21, 2021


@datumbox (Contributor) commented May 21, 2021

Updates the CI to use CUDA 10.2 instead of 10.1 so that we can start using fresh PyTorch core nightlies.

@NicolasHug (Member) commented

Thanks for the PR!

Let's see what the CI says. We might also need to change things like the use of `image_name: "pytorch/manylinux-cuda101"` in the YAML file, as well as other references to `101` in `regenerate.py`, in particular:

```python
if device_type == 'gpu':
    if python_version != "3.8":
        job['filters'] = gen_filter_branch_tree('master', 'nightly')
    job['cu_version'] = 'cu101'
```
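For illustration only, here is a minimal sketch of how this part of the change could look if the CUDA version lived in a single constant. `CU_VERSION`, `cuda_docker_image`, and `apply_gpu_settings` are hypothetical names, and the `gen_filter_branch_tree` stand-in is an assumption about what the real helper returns. This is not the actual code from `regenerate.py`.

```python
# Hypothetical sketch, not torchvision's real regenerate.py.
CU_VERSION = "cu102"  # previously "cu101"


def gen_filter_branch_tree(*branches):
    # Stand-in (assumption): restrict a CircleCI job to the given branches.
    return {"branches": {"only": list(branches)}}


def cuda_docker_image(cu_version):
    # Assumes image names follow the "pytorch/manylinux-cuda<NNN>" pattern,
    # e.g. "cu101" -> "pytorch/manylinux-cuda101".
    return "pytorch/manylinux-cuda" + cu_version[len("cu"):]


def apply_gpu_settings(job, device_type, python_version):
    # Mirrors the quoted snippet, but reads the CUDA version from one constant.
    if device_type == "gpu":
        if python_version != "3.8":
            job["filters"] = gen_filter_branch_tree("master", "nightly")
        job["cu_version"] = CU_VERSION
    return job


job = apply_gpu_settings({}, "gpu", "3.6")
print(job["cu_version"], cuda_docker_image(job["cu_version"]))
```

Keeping the version in one place means a future bump edits a single constant instead of hunting for scattered `101` literals in both the generator and the YAML image names.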

@datumbox (Contributor, Author) commented

Thanks, I'm still looking for more places I have to change. If you see others, please let me know.

@datumbox force-pushed the cuda10.1_to_cuda10.2 branch from 062afef to 6a22a73 on May 21, 2021 09:03
@datumbox force-pushed the cuda10.1_to_cuda10.2 branch from 6a22a73 to 0020b68 on May 21, 2021 09:04
@datumbox requested a review from @fmassa on May 21, 2021 09:08
@datumbox changed the title from "Change cuda versions to 10.2" to "Change CI cuda versions to 10.2" on May 21, 2021
@datumbox (Contributor, Author) commented May 21, 2021

It seems that the latest PyTorch core on Linux has a new restriction on indexing a tensor with indices that live on a different device:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument indices in method wrapper_Tensor_index_Tensor)

The failure appears only on the Linux GPU jobs, not on Windows:
https://app.circleci.com/pipelines/github/pytorch/vision/8241/workflows/6af4ce74-e157-4ca6-9eca-64ed7b9989ee/jobs/590609/tests#failed-test-0

I propose merging this now and fixing the issues on master as soon as possible in a separate PR.

@fmassa (Member) left a comment

Accepting to unblock

@datumbox merged commit 91d9797 into pytorch:master on May 21, 2021
@datumbox deleted the cuda10.1_to_cuda10.2 branch on May 21, 2021 09:47
@ngimel commented May 22, 2021

To close the loop on this: indexing from Python is not affected, because `python_variable_indexing` copies all the indices to `self`'s device before dispatching to `index`. However, calling `index` from C++ with indices on a different device has been broken by pytorch/pytorch#56872. cc @wenleix, @ezyang
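The difference between the two call paths can be illustrated with a toy pure-Python model that needs no PyTorch. `FakeTensor`, `aten_index`, and `python_index` are hypothetical stand-ins: `aten_index` plays the role of the C++ `index` op with the same-device check that, per the comment above, arrived with pytorch/pytorch#56872, and `python_index` plays the role of the Python indexing path that first copies the indices to `self`'s device. The real code paths are far more involved.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeTensor:
    # Toy tensor: a flat tuple of values plus a device tag such as "cpu" or "cuda:0".
    data: tuple
    device: str

    def to(self, device):
        # Simulated device copy: same values, new device tag.
        return replace(self, device=device)


def aten_index(self, indices):
    # Toy C++ path after the change: all tensor arguments must share one device.
    if self.device != indices.device:
        raise RuntimeError(
            "Expected all tensors to be on the same device, but found at least "
            f"two devices, {self.device} and {indices.device}!"
        )
    return FakeTensor(tuple(self.data[i] for i in indices.data), self.device)


def python_index(self, indices):
    # Toy Python path: copy the indices to self's device, then dispatch.
    return aten_index(self, indices.to(self.device))


x = FakeTensor((10, 20, 30), device="cuda:0")
idx = FakeTensor((2, 0), device="cpu")
print(python_index(x, idx))  # indices are moved to cuda:0 first, so this succeeds
```

Calling `aten_index(x, idx)` directly with the same arguments raises the `RuntimeError`, which is the situation C++ callers such as torchvision's ops ran into.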

@wenleix commented May 22, 2021

@ngimel Thanks for reporting this. Will figure out a fix for this.

Update: `index` leverages TensorIterator, so it should opt out of the automatic device check:

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorAdvancedIndexing.cpp#L296

make_index_iterator:

https://github.com/pytorch/pytorch/blob/dc8bc6ba4bbd61d21de4ffbccf5b79d22ff31a23/aten/src/ATen/native/TensorAdvancedIndexing.cpp#L266-L277
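The sketch below is a toy pure-Python model of what "opting out" means. It assumes that the check added by pytorch/pytorch#56872 runs in a generated wrapper before the kernel (as the `method wrapper_Tensor_index_Tensor` text in the error message suggests) and that an op can be marked to skip it because TensorIterator does its own device handling. `DeviceCheck`, `make_wrapper`, and `index_kernel` are hypothetical. The `ExactSame`/`NoCheck` names are borrowed, as an assumption, from the `device_check` field in PyTorch's `native_functions.yaml`.

```python
from enum import Enum


class DeviceCheck(Enum):
    # Names mirror (as an assumption) PyTorch's device_check options.
    EXACT_SAME = "ExactSame"
    NO_CHECK = "NoCheck"


def make_wrapper(kernel, device_check):
    # Toy "generated wrapper": optionally verify that every tensor argument is
    # on the same device before calling the kernel.
    def wrapper(*devices):
        if device_check is DeviceCheck.EXACT_SAME and len(set(devices)) > 1:
            raise RuntimeError("Expected all tensors to be on the same device")
        return kernel(*devices)

    return wrapper


def index_kernel(self_device, *index_devices):
    # Stand-in for a TensorIterator-based op that (in this toy) handles devices
    # itself; it simply reports the device of the result.
    return self_device


strict_index = make_wrapper(index_kernel, DeviceCheck.EXACT_SAME)
opted_out_index = make_wrapper(index_kernel, DeviceCheck.NO_CHECK)

print(opted_out_index("cuda:0", "cpu"))  # mixed devices are passed through
```

With the automatic check enabled (`strict_index`), the same mixed-device call raises, which matches the failure seen in CI. The opt-out leaves device validation to the kernel.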

facebook-github-bot pushed a commit that referenced this pull request May 25, 2021
Summary:
* Change cuda versions.

* changing cu_version

* patching regenerate.py

* more changes.

Reviewed By: vincentqb, cpuhrsch

Differential Revision: D28677174

fbshipit-source-id: a32861bd62e3f5a3d5b19106e4f1773128ba1006