Use CUDA 12.1.1 patch version in CI #107295

Closed
wants to merge 2 commits into from

atalman (Contributor) commented Aug 16, 2023

Update CI to CUDA 12.1.1.

Follows the builder updates:
Nightly Linux - pytorch/builder#1476
Nightly Windows - pytorch/builder#1485

pytorch-bot (bot) commented Aug 16, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107295

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8240624 with merge base 5b9b816:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot (bot) added the "topic: not user facing" label Aug 16, 2023
atalman added the "ciflow/trunk" label Aug 17, 2023
atalman (Contributor, Author) commented Aug 18, 2023

@pytorchbot merge -f "all required check passing"

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

ZainRizvi (Contributor)

ptrblck (Collaborator) commented Aug 21, 2023

@atalman I'm seeing:

2023-08-18T16:17:37.2570266Z E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/Packages.gz  File has unexpected size (1127072 != 1126689). Mirror sync in progress? [IP: 152.195.19.142 443]
2023-08-18T16:17:37.2570814Z    Hashes of expected file:
2023-08-18T16:17:37.2571427Z     - Filesize:1126689 [weak]
2023-08-18T16:17:37.2571815Z     - SHA256:7cee2584ca6d97b2f07018ba4f9c3c473fb4e299ed170968a1f8c99c090cc59f
2023-08-18T16:17:37.2572222Z     - SHA1:4ee24fac5518a3fcc3702590a0dab32c95484c54 [weak]
2023-08-18T16:17:37.2572548Z     - MD5Sum:593faff511765d11055c9919bf2e3bf8 [weak]
2023-08-18T16:17:37.2572827Z    Release file created at: Thu, 17 Aug 2023 19:03:05 +0000
2023-08-18T16:17:37.2573156Z E: Some index files failed to download. They have been ignored, or old ones used instead.
2023-08-18T16:17:37.6625070Z The command '/bin/sh -c bash ./install_base.sh && rm install_base.sh' returned a non-zero code: 100

which seems to fail at the apt-get update step (log line: 2023-08-18T16:17:33.7908846Z + apt-get update). I'll try to see what exactly is being downloaded.
EDIT:
It seems this file is failing with an unexpected size:
https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/Packages.gz
However, locally it matches:

ll | grep Pack
-rw-rw-r--  1 pbialecki pbialecki    1126689 Aug 21 09:09 Packages.gz
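The local check above compares only the file size. A minimal sketch of also checking the SHA256 from the apt error output follows; the helper name `check_index` and the default path `Packages.gz` are hypothetical, and the expected values are copied from the log quoted earlier in this comment. It is an illustration of the comparison, not something that was run in this thread.

```python
import hashlib
import os

# Expected values copied from the apt error output quoted above.
EXPECTED_SIZE = 1126689
EXPECTED_SHA256 = "7cee2584ca6d97b2f07018ba4f9c3c473fb4e299ed170968a1f8c99c090cc59f"


def check_index(path, expected_size=EXPECTED_SIZE, expected_sha256=EXPECTED_SHA256):
    """Return (size_ok, sha256_ok) for the file at `path` (hypothetical helper)."""
    size_ok = os.path.getsize(path) == expected_size

    # Hash the file in 1 MiB chunks so large index files are not loaded at once.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    return size_ok, digest.hexdigest() == expected_sha256
```

Calling `check_index("Packages.gz")` on a freshly downloaded copy would show whether both the size and the hash agree with the Release metadata. If the size matches locally but the CI download reported 1127072 bytes, that would be consistent with the "Mirror sync in progress?" hint in the error, though the thread does not confirm the cause.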

Also:

2023-08-19T00:24:25.3897070Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor.py', '--shard-id=0', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '--sc=inductor/test_torchinductor_0', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2023-08-19 00:24:25.389282]
2023-08-19T00:37:15.4858612Z ##[error]The action has timed out.

Do you know how torchinductor is related to the CUDA update?

atalman (Contributor, Author) commented Aug 21, 2023

@ZainRizvi Rerunning periodic tests; this looks like a flaky issue.

Labels: ciflow/trunk, Merged, topic: not user facing, with-ssh
4 participants