Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GPU CI so that it runs on Python 3.8 #2973

Closed

Conversation

Licht-T
Copy link
Contributor

@Licht-T Licht-T commented Nov 8, 2020

This fixes #2972.

@Licht-T
Copy link
Contributor Author

Licht-T commented Nov 8, 2020

After merged, we might have to flush some CI cache. I recommend not to utilize CI cache functions.
Licht-T@5aa7d87

@codecov
Copy link

codecov bot commented Nov 8, 2020

Codecov Report

Merging #2973 (679cb03) into master (052edce) will decrease coverage by 1.07%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2973      +/-   ##
==========================================
- Coverage   73.42%   72.34%   -1.08%     
==========================================
  Files          99       99              
  Lines        8817     8817              
  Branches     1389     1389              
==========================================
- Hits         6474     6379      -95     
- Misses       1917     1999      +82     
- Partials      426      439      +13     
Impacted Files Coverage Δ
torchvision/ops/_register_onnx_ops.py 48.78% <0.00%> (-36.59%) ⬇️
torchvision/models/detection/transform.py 78.33% <0.00%> (-17.23%) ⬇️
torchvision/ops/poolers.py 86.53% <0.00%> (-11.54%) ⬇️
torchvision/ops/boxes.py 87.35% <0.00%> (-8.05%) ⬇️
torchvision/models/detection/roi_heads.py 77.23% <0.00%> (-5.11%) ⬇️
torchvision/models/detection/rpn.py 90.18% <0.00%> (-3.69%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 052edce...679cb03. Read the comment docs.

@@ -455,6 +455,7 @@ jobs:
resource_class: gpu.small
environment:
image_name: "pytorch/manylinux-cuda101"
PYTHON_VERSION: << parameters.python_version >>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was working on the same issue on torchaudio, and although I understand that this solves the issue, I wonder if there is a better way.
The problem here is environment: image_name: "..." overwrites the environment defined in binary_common. It seems to me that the ideal implementation is that image_name should be appended to the set of environment key-value pairs defined in binary_common, so that the downstream jobs do not have to re-define them. But with quick searching on YAML syntax, I cannot find such a way.
On the other hand, binary_common was defined for binary build process (probably by @seemethere ) and I just re-used it in unit test jobs without knowing much about it in the initial implementation of CCI job configurations (in torchaudio, which was later ported to torchvision).
Environment values like UPLOAD_CHANNEL and PYTORCH_VERSION have become an essential part of unit test scripts in the previous release work, #2846, so ideally we want environment variables to be defined in one place and we want them to be automagically propagated.
The use of Docker adds extra layer to this as it seems inevitable to manually pass environment values to the container which runs the test.

@mthrok
Copy link
Contributor

mthrok commented Nov 9, 2020

@Licht-T

Could you add an anchor to the original environment definition and alias that in GPU CI job definitions? That way, at least new definition to environment will be propagated to the VM. (still manually have to pass these environment variables to docker container though)
Something like https://github.com/pytorch/audio/pull/1014/files

@NicolasHug
Copy link
Member

Thanks for the PR @Licht-T ,

looking at some more recent CI jobs, it looks like CircleCI is properly able to run GPU tests on 3.8: https://app.circleci.com/pipelines/github/pytorch/vision/9150/workflows/5e911ca3-f0ef-409e-8437-d7b8649398c1/jobs/669225.

I'll close the PR, thanks again!

@NicolasHug NicolasHug closed this Jun 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GPU CI fails since Python 3.9 installed
4 participants