Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI/CD: Add CUDA version to docker image tags #13831

Merged
merged 46 commits into from
Aug 10, 2022
Merged

Conversation

akihironitta
Copy link
Contributor

@akihironitta akihironitta commented Jul 25, 2022

What does this PR do?

  1. Changes docker image tags that we currently have on the hub. For example,
- pytorchlightning/pytorch_lightning:base-conda-py3.8-torch1.9
+ pytorchlightning/pytorch_lightning:base-conda-py3.8-torch1.9-cuda11.3.1
  1. Updates combinations in release-docker.yml and their corresponding base docker images.
  2. (Note: Doesn't touch any of conda-related CI/CD jobs until Rethink conda tests in CI #14059 concludes.)

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • [n/a] Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • [n/a] Did you list all the breaking changes introduced by this pull request?
  • [n/a] Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @carmocca @akihironitta @Borda

@akihironitta akihironitta added the ci Continuous Integration label Jul 25, 2022
@carmocca
Copy link
Contributor

These new images will need to be pushed to the hub before merging this PR, or you will block master until the nightly job runs

@akihironitta akihironitta changed the title [wip] CI/CD: Append CUDA version to docker image tags CD: Append CUDA version to docker image tags Jul 25, 2022
@akihironitta akihironitta changed the title CD: Append CUDA version to docker image tags [wip] CD: Append CUDA version to docker image tags Jul 25, 2022
@akihironitta akihironitta changed the title [wip] CD: Append CUDA version to docker image tags CD: Append CUDA version to docker image tags Jul 25, 2022
@mergify mergify bot added the ready PRs ready to be merged label Jul 25, 2022
@mergify mergify bot requested a review from a team July 25, 2022 19:47
@akihironitta akihironitta enabled auto-merge (squash) July 25, 2022 19:49
@akihironitta akihironitta changed the title CD: Append CUDA version to docker image tags [wip] CD: Append CUDA version to docker image tags Jul 25, 2022
@akihironitta akihironitta mentioned this pull request Aug 9, 2022
10 tasks
@mergify mergify bot removed the has conflicts label Aug 10, 2022
@akihironitta akihironitta changed the title [wip] CI/CD: Add CUDA version to docker image tags CI/CD: Add CUDA version to docker image tags Aug 10, 2022
@mergify mergify bot added the ready PRs ready to be merged label Aug 10, 2022
@akihironitta
Copy link
Contributor Author

akihironitta commented Aug 10, 2022

Some of the jobs building some docker images are failing but are completely irrelevant to the change here. The failures will be resolved in the followings:

@akihironitta akihironitta enabled auto-merge (squash) August 10, 2022 10:15
@akihironitta akihironitta merged commit d5f35ec into master Aug 10, 2022
@akihironitta akihironitta deleted the ci/rename-docker-tags branch August 10, 2022 10:37
awaelchli pushed a commit that referenced this pull request Aug 15, 2022
* append cuda version to tags

* revertme: push to hub

* Update docker readme

* Build base-conda-py3.9-torch1.12-cuda11.3.1

* Use new images in conda tests

* revertme: push to hub

* Revert "revertme: push to hub"

This reverts commit 0f7d534.

* Revert "revertme: push to hub"

This reverts commit 46a05fc.

* Run conda if workflow edited

* Run gpu testing if workflow edited

* Use new tags in release/Dockerfile

* Build base-cuda and PL release images with all combinations

* Update release docker

* Update conda from py3.9-torch1.12 to py3.10-torch.1.12

* Fix ubuntu version

* Revert conda

* revertme: push to hub

* Don't build Python 3.10 for now...

* Fix pl release builder

* updating version contribute to the error? docker/buildx#456

* Update actions' versions

* Update slack user to notify

* Don't use 11.6.0 to avoid bagua incompatibility

* Don't use 11.1, and use 11.1.1

* Update .github/workflows/ci-pytorch_test-conda.yml

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>

* Update trigger

* Ignore artfacts from tutorials

* Trim docker images to distribute

* Add an image for tutorials

* Update conda image 3.8x1.10

* Try different conda variants

* No need to set cuda for conda jobs

* Update who to notify ipu failure

* Don't push

* update filenaem

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>
jessecambon pushed a commit to jessecambon/lightning that referenced this pull request Aug 16, 2022
* append cuda version to tags

* revertme: push to hub

* Update docker readme

* Build base-conda-py3.9-torch1.12-cuda11.3.1

* Use new images in conda tests

* revertme: push to hub

* Revert "revertme: push to hub"

This reverts commit 0f7d534.

* Revert "revertme: push to hub"

This reverts commit 46a05fc.

* Run conda if workflow edited

* Run gpu testing if workflow edited

* Use new tags in release/Dockerfile

* Build base-cuda and PL release images with all combinations

* Update release docker

* Update conda from py3.9-torch1.12 to py3.10-torch.1.12

* Fix ubuntu version

* Revert conda

* revertme: push to hub

* Don't build Python 3.10 for now...

* Fix pl release builder

* updating version contribute to the error? docker/buildx#456

* Update actions' versions

* Update slack user to notify

* Don't use 11.6.0 to avoid bagua incompatibility

* Don't use 11.1, and use 11.1.1

* Update .github/workflows/ci-pytorch_test-conda.yml

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>

* Update trigger

* Ignore artfacts from tutorials

* Trim docker images to distribute

* Add an image for tutorials

* Update conda image 3.8x1.10

* Try different conda variants

* No need to set cuda for conda jobs

* Update who to notify ipu failure

* Don't push

* update filenaem

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>
lexierule pushed a commit that referenced this pull request Aug 17, 2022
* update version and changelog for 1.7.2 release

* Reset all results on epoch end (#14061)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Skip ddp fork tests on windows (#14121)

* Fix device placement when `.cuda()` called without specifying index (#14128)

* Convert subprocess test to standalone test (#14101)

* Fix entry point test for Python 3.10 (#14154)

* Fix flaky test caused by weak reference (#14157)

* Fix saving hyperparameters in a composition where parent is not a LM or LDM (#14151)



Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Remove DeepSpeed version restriction from Lite (#13967)

* Configure the check-group app (#14165)

Co-authored-by: Jirka <jirka.borovec@seznam.cz>

* Update onnxruntime requirement from <=1.12.0 to <1.13.0 in /requirements (#14083)

Updates the requirements on [onnxruntime](https://github.com/microsoft/onnxruntime) to permit the latest version.
- [Release notes](https://github.com/microsoft/onnxruntime/releases)
- [Changelog](https://github.com/microsoft/onnxruntime/blob/master/docs/ReleaseManagement.md)
- [Commits](microsoft/onnxruntime@v0.1.4...v1.12.1)

---
updated-dependencies:
- dependency-name: onnxruntime
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update gcsfs requirement from <2022.6.0,>=2021.5.0 to >=2021.5.0,<2022.8.0 in /requirements (#14079)

Update gcsfs requirement in /requirements

Updates the requirements on [gcsfs](https://github.com/fsspec/gcsfs) to permit the latest version.
- [Release notes](https://github.com/fsspec/gcsfs/releases)
- [Commits](fsspec/gcsfs@2021.05.0...2022.7.1)

---
updated-dependencies:
- dependency-name: gcsfs
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix a bug that caused spurious `AttributeError` when multiple `DataLoader` classes are imported (#14117)


fix

* CI: Replace `_` of in GHA workflow filenames with `-` (#13917)

* Rename workflow files

* Update docs

* Fix azure badges

* Update the main readme

* bad rebase

* Update doc

* CI: Update Windows version from 2019 to 2022 (#14129)

Update windows

* CI/CD: Add CUDA version to docker image tags (#13831)

* append cuda version to tags

* revertme: push to hub

* Update docker readme

* Build base-conda-py3.9-torch1.12-cuda11.3.1

* Use new images in conda tests

* revertme: push to hub

* Revert "revertme: push to hub"

This reverts commit 0f7d534.

* Revert "revertme: push to hub"

This reverts commit 46a05fc.

* Run conda if workflow edited

* Run gpu testing if workflow edited

* Use new tags in release/Dockerfile

* Build base-cuda and PL release images with all combinations

* Update release docker

* Update conda from py3.9-torch1.12 to py3.10-torch.1.12

* Fix ubuntu version

* Revert conda

* revertme: push to hub

* Don't build Python 3.10 for now...

* Fix pl release builder

* updating version contribute to the error? docker/buildx#456

* Update actions' versions

* Update slack user to notify

* Don't use 11.6.0 to avoid bagua incompatibility

* Don't use 11.1, and use 11.1.1

* Update .github/workflows/ci-pytorch_test-conda.yml

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>

* Update trigger

* Ignore artfacts from tutorials

* Trim docker images to distribute

* Add an image for tutorials

* Update conda image 3.8x1.10

* Try different conda variants

* No need to set cuda for conda jobs

* Update who to notify ipu failure

* Don't push

* update filenaem

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>

* Avoid entry_points deprecation warning (#14052)

Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* Configure the check-group app (#14165)

Co-authored-by: Jirka <jirka.borovec@seznam.cz>

* Profile batch transfer and gradient clipping hooks (#14069)

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Avoid false positive warning about using `sync_dist` when using torchmetrics (#14143)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Avoid raising the sampler warning if num_replicas=1 (#14097)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>

* Remove skipping logic in favor of path filtering (#14170)

* Support checkpoint save and load with Stochastic Weight Averaging (#9938)

Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Use fsdp module to initialize precision scalar for fsdp native (#14092)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* add more issues types (#14174)

* add more issues types

* Update .github/ISSUE_TEMPLATE/config.yml

Co-authored-by: Mansy <ahmed.mansy156@gmail.com>

* typo

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Mansy <ahmed.mansy156@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>

* CI: clean building docs (#14216)

* CI: clean building docs

* group

* .

* CI: docker focus on PL only (#14246)

* CI: docker focus on PL only

* group

* Allowed setting attributes on `DataLoader` and `BatchSampler` when instantiated inside `*_dataloader` hooks (#14212)


Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>

* Revert "Remove skipping logic in favor of path filtering (#14170)" (#14244)

* Update defaults for WandbLogger's run name and project name (#14145)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
Co-authored-by: otaj <6065855+otaj@users.noreply.github.com>
Co-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Mansy <ahmed.mansy156@gmail.com>
Borda pushed a commit that referenced this pull request Aug 22, 2022
* append cuda version to tags

* revertme: push to hub

* Update docker readme

* Build base-conda-py3.9-torch1.12-cuda11.3.1

* Use new images in conda tests

* revertme: push to hub

* Revert "revertme: push to hub"

This reverts commit 0f7d534.

* Revert "revertme: push to hub"

This reverts commit 46a05fc.

* Run conda if workflow edited

* Run gpu testing if workflow edited

* Use new tags in release/Dockerfile

* Build base-cuda and PL release images with all combinations

* Update release docker

* Update conda from py3.9-torch1.12 to py3.10-torch.1.12

* Fix ubuntu version

* Revert conda

* revertme: push to hub

* Don't build Python 3.10 for now...

* Fix pl release builder

* updating version contribute to the error? docker/buildx#456

* Update actions' versions

* Update slack user to notify

* Don't use 11.6.0 to avoid bagua incompatibility

* Don't use 11.1, and use 11.1.1

* Update .github/workflows/ci-pytorch_test-conda.yml

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>

* Update trigger

* Ignore artfacts from tutorials

* Trim docker images to distribute

* Add an image for tutorials

* Update conda image 3.8x1.10

* Try different conda variants

* No need to set cuda for conda jobs

* Update who to notify ipu failure

* Don't push

* update filenaem

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>

(cherry picked from commit d5f35ec)
lantiga pushed a commit that referenced this pull request Aug 22, 2022
* append cuda version to tags

* revertme: push to hub

* Update docker readme

* Build base-conda-py3.9-torch1.12-cuda11.3.1

* Use new images in conda tests

* revertme: push to hub

* Revert "revertme: push to hub"

This reverts commit 0f7d534.

* Revert "revertme: push to hub"

This reverts commit 46a05fc.

* Run conda if workflow edited

* Run gpu testing if workflow edited

* Use new tags in release/Dockerfile

* Build base-cuda and PL release images with all combinations

* Update release docker

* Update conda from py3.9-torch1.12 to py3.10-torch.1.12

* Fix ubuntu version

* Revert conda

* revertme: push to hub

* Don't build Python 3.10 for now...

* Fix pl release builder

* updating version contribute to the error? docker/buildx#456

* Update actions' versions

* Update slack user to notify

* Don't use 11.6.0 to avoid bagua incompatibility

* Don't use 11.1, and use 11.1.1

* Update .github/workflows/ci-pytorch_test-conda.yml

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>

* Update trigger

* Ignore artfacts from tutorials

* Trim docker images to distribute

* Add an image for tutorials

* Update conda image 3.8x1.10

* Try different conda variants

* No need to set cuda for conda jobs

* Update who to notify ipu failure

* Don't push

* update filenaem

Co-authored-by: Luca Medeiros <67411094+luca-medeiros@users.noreply.github.com>

(cherry picked from commit d5f35ec)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
admin Requires admin privileges to merge ci Continuous Integration ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants