CI/CD: Add CUDA version to docker image tags #13831

Merged: 46 commits, Aug 10, 2022
Changes from 28 commits
Commits
73c793a
append cuda version to tags
akihironitta Jul 25, 2022
46a05fc
revertme: push to hub
akihironitta Jul 25, 2022
1f0e5a4
Update docker readme
akihironitta Jul 25, 2022
6b23771
Build base-conda-py3.9-torch1.12-cuda11.3.1
akihironitta Jul 25, 2022
8a531ad
Use new images in conda tests
akihironitta Jul 25, 2022
0f7d534
revertme: push to hub
akihironitta Jul 25, 2022
62c3d3d
Revert "revertme: push to hub"
akihironitta Jul 25, 2022
e0b4fb8
Revert "revertme: push to hub"
akihironitta Jul 25, 2022
e08f694
Run conda if workflow edited
akihironitta Jul 25, 2022
72a8492
Run gpu testing if workflow edited
akihironitta Jul 25, 2022
8c77bbf
Merge branch 'master' into ci/rename-docker-tags
akihironitta Jul 25, 2022
3c35bef
Use new tags in release/Dockerfile
akihironitta Jul 25, 2022
cfd45f7
Build base-cuda and PL release images with all combinations
akihironitta Jul 25, 2022
69de92f
Update release docker
akihironitta Jul 25, 2022
1112450
Update conda from py3.9-torch1.12 to py3.10-torch.1.12
akihironitta Jul 25, 2022
e1901f0
Fix ubuntu version
akihironitta Jul 25, 2022
07a335c
Revert conda
akihironitta Jul 25, 2022
58fe926
revertme: push to hub
akihironitta Jul 25, 2022
139e9ea
Don't build Python 3.10 for now...
akihironitta Jul 25, 2022
7f3385e
Fix pl release builder
akihironitta Jul 26, 2022
109bc2f
updating version contribute to the error? https://github.com/docker/b…
akihironitta Jul 26, 2022
01f0d06
Update actions' versions
akihironitta Jul 26, 2022
d9a2a4c
Update slack user to notify
akihironitta Jul 29, 2022
6c58530
Merge branch 'master' into ci/rename-docker-tags
akihironitta Aug 1, 2022
6db670e
Don't use 11.6.0 to avoid bagua incompatibility
akihironitta Aug 2, 2022
b43d9e0
Merge branch 'master' into ci/rename-docker-tags
akihironitta Aug 3, 2022
b5643af
Don't use 11.1, and use 11.1.1
akihironitta Aug 3, 2022
b0a66db
Merge branch 'ci/update-slack-user' into ci/rename-docker-tags
akihironitta Aug 3, 2022
72d9b9e
Update .github/workflows/ci-pytorch_test-conda.yml
akihironitta Aug 5, 2022
a7a1fac
Merge branch 'master' into ci/rename-docker-tags
akihironitta Aug 5, 2022
49dc2fe
Update trigger
akihironitta Aug 5, 2022
7e1372e
Ignore artfacts from tutorials
akihironitta Aug 5, 2022
cd4a107
Trim docker images to distribute
akihironitta Aug 5, 2022
4b814e7
Add an image for tutorials
akihironitta Aug 5, 2022
d134b69
Update conda image 3.8x1.10
akihironitta Aug 6, 2022
ea2ce9a
Merge branch 'master' into ci/rename-docker-tags
akihironitta Aug 6, 2022
3a10724
Try different conda variants
akihironitta Aug 6, 2022
0c2af7d
Merge branch 'ci/rename-docker-tags' of github.com:Lightning-AI/light…
akihironitta Aug 6, 2022
da6cecf
Merge branch 'master' into ci/rename-docker-tags
akihironitta Aug 9, 2022
69e6d95
Merge branch 'master' into ci/rename-docker-tags
akihironitta Aug 10, 2022
832fe94
Merge branch 'master' into ci/rename-docker-tags
akihironitta Aug 10, 2022
0da9d97
Merge branch 'master' into ci/rename-docker-tags
akihironitta Aug 10, 2022
b7188dd
No need to set cuda for conda jobs
akihironitta Aug 10, 2022
7f61fcd
Update who to notify ipu failure
akihironitta Aug 10, 2022
6431d0e
Don't push
akihironitta Aug 10, 2022
5a91844
update filenaem
akihironitta Aug 10, 2022
2 changes: 1 addition & 1 deletion .azure/gpu-benchmark.yml
@@ -28,7 +28,7 @@ jobs:
cancelTimeoutInMinutes: "2"
pool: azure-jirka-spot
container:
- image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12"
+ image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.3.1"
options: "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all --shm-size=32g"
workspace:
clean: all
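
The image tag in this hunk follows the naming pattern used throughout the PR: the CUDA version is appended after the PyTorch version. A minimal shell sketch of that pattern follows; the function name `image_tag` is hypothetical and does not exist in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): compose an image reference using
# the pattern <flavor>-py<python>-torch<pytorch>-cuda<cuda>.
# Arguments: $1 = flavor, $2 = Python version, $3 = PyTorch version, $4 = CUDA version.
image_tag() {
  printf 'pytorchlightning/pytorch_lightning:%s-py%s-torch%s-cuda%s\n' \
    "$1" "$2" "$3" "$4"
}

# Values taken from the updated image line in the hunk above.
image_tag base-cuda 3.9 1.12 11.3.1
```

Carrying the CUDA version in the tag lets images for the same Python and PyTorch pair coexist when they are built against different CUDA releases.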
4 changes: 2 additions & 2 deletions .azure/gpu-tests.yml
@@ -26,7 +26,7 @@ jobs:
strategy:
matrix:
'PyTorch - stable':
- image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12"
+ image: "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.3.1"
# how long to run the job before automatically cancelling
timeoutInMinutes: "80"
# how much time to give 'run always even if cancelled tasks' before stopping them
@@ -44,7 +44,7 @@ jobs:

- bash: |
CHANGED_FILES=$(git diff --name-status origin/master -- . | awk '{print $2}')
- FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.azure/*'
+ FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.azure/gpu-*.yml'
echo $CHANGED_FILES > changed_files.txt
MATCHES=$(cat changed_files.txt | grep -E $FILTER)
echo $MATCHES
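
Because the script passes `FILTER` to `grep -E`, each alternative is an extended regular expression, not a shell glob: `*` repeats the preceding character and an unescaped `.` matches any single character. Read that way, `.azure/gpu-*.yml` means `gpu`, then zero or more hyphens, then one arbitrary character, then `yml`, so a path such as `.azure/gpu-tests.yml` does not match it. The sketch below compares the updated `FILTER` with a stricter pattern. `STRICT_FILTER`, the `matches` helper, and the sample paths are illustrative assumptions, not part of the PR.

```shell
#!/usr/bin/env bash
# FILTER is copied from the updated line in the hunk above.
FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.azure/gpu-*.yml'
# Hypothetical alternative (not in the PR): escape literal dots and use '.*'
# where the glob-style '*' was probably intended.
STRICT_FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_|\.azure/gpu-.*\.yml'

# Return 0 when path $2 matches the extended regular expression $1.
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Sample paths chosen for illustration only.
for path in src/pytorch_lightning/trainer.py .azure/gpu-tests.yml docs/index.rst; do
  if matches "$FILTER" "$path"; then original_result=match; else original_result=skip; fi
  if matches "$STRICT_FILTER" "$path"; then strict_result=match; else strict_result=skip; fi
  printf '%-36s original=%s strict=%s\n' "$path" "$original_result" "$strict_result"
done
```

Whether the stricter pattern is what the workflow needs depends on which Azure pipeline files should trigger the GPU tests, so treat it as a starting point rather than a drop-in fix.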
12 changes: 6 additions & 6 deletions .github/workflows/ci-pytorch_test-conda.yml
@@ -18,16 +18,16 @@ defaults:
jobs:
conda:
runs-on: ubuntu-20.04
- container: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python-version }}-torch${{ matrix.pytorch-version }}
+ container: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python-version }}-torch${{ matrix.pytorch-version }}-cuda${{ matrix.cuda-version }}
strategy:
fail-fast: false
matrix:
# nightly: add when there's a release candidate
include:
- - {python-version: "3.8", pytorch-version: "1.9"}
- - {python-version: "3.8", pytorch-version: "1.10"}
- - {python-version: "3.9", pytorch-version: "1.11"}
- - {python-version: "3.9", pytorch-version: "1.12"}
+ - {python-version: "3.8", pytorch-version: "1.9", cuda-version: "11.1"}
+ - {python-version: "3.8", pytorch-version: "1.10", cuda-version: "11.1"}
+ - {python-version: "3.9", pytorch-version: "1.11", cuda-version: "11.3.1"}
+ - {python-version: "3.9", pytorch-version: "1.12", cuda-version: "11.3.1"}

timeout-minutes: 30

@@ -45,7 +45,7 @@ jobs:
id: skip
shell: bash -l {0}
run: |
- FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*'
+ FILTER='src/pytorch_lightning|requirements/pytorch|tests/tests_pytorch|examples/pl_*|.github/workflows/ci-pytorch_test-conda.yml'
echo "${{ steps.changed-files.outputs.all_changed_files }}" | tr " " "\n" > changed_files.txt
MATCHES=$(cat changed_files.txt | grep -E $FILTER)
echo $MATCHES
99 changes: 61 additions & 38 deletions .github/workflows/cicd-pytorch_dockers.yml
@@ -21,25 +21,39 @@ concurrency:
cancel-in-progress: ${{ ! (github.ref == 'refs/heads/master' || startsWith(github.ref, 'refs/heads/release/')) }}

env:
- PUSH_TO_HUB: ${{ github.event_name == 'schedule' }}
+ PUSH_TO_HUB: true

jobs:
build-pl:
runs-on: ubuntu-20.04
strategy:
fail-fast: false
matrix:
- # the config used in '.azure-pipelines/gpu-tests.yml' since the Dockerfile uses the cuda image
- python_version: ["3.9"]
- pytorch_version: ["1.12"]
+ include:
+ # Include all Python and PyTorch versions that PL supports.
+ - {python_version: "3.7", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.7", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.7", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.7", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.8", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ # - {python_version: "3.10", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ # - {python_version: "3.10", pytorch_version: "1.12", cuda_version: "11.3.1"}
steps:
- - uses: actions/checkout@v2
+ - uses: actions/checkout@v3
  - uses: docker/setup-buildx-action@v2
- - uses: docker/build-push-action@v2
+ - uses: docker/build-push-action@v3
with:
build-args: |
PYTHON_VERSION=${{ matrix.python_version }}
PYTORCH_VERSION=${{ matrix.pytorch_version }}
+ CUDA_VERSION=${{ matrix.cuda_version }}
file: dockers/release/Dockerfile
push: false # pushed in release-docker.yml only when PL is released
timeout-minutes: 50
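
The `build-pl` matrix in this hunk pairs each PyTorch version with exactly one CUDA version: 11.1.1 for PyTorch 1.9 and 11.3.1 for 1.10 through 1.12. The shell sketch below expresses that pairing as a lookup and expands the twelve active combinations. The function name `cuda_for_pytorch` is hypothetical, and other jobs in the same workflow use different pairings, so it applies to this matrix only.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): CUDA version paired with each
# PyTorch version in the build-pl matrix of the hunk above.
cuda_for_pytorch() {
  case "$1" in
    1.9) echo "11.1.1" ;;
    1.10|1.11|1.12) echo "11.3.1" ;;
    *) echo "unsupported PyTorch version: $1" >&2; return 1 ;;
  esac
}

# Expand the active (non-commented) combinations of the matrix.
for python in 3.7 3.8 3.9; do
  for pytorch in 1.9 1.10 1.11 1.12; do
    echo "py${python} torch${pytorch} cuda$(cuda_for_pytorch "$pytorch")"
  done
done
```

Listing the entries explicitly under `include:`, as the workflow does, keeps each pairing visible in the file at the cost of some repetition.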
@@ -53,14 +67,14 @@ jobs:
python_version: ["3.7"]
xla_version: ["1.11"]
steps:
- - uses: actions/checkout@v2
+ - uses: actions/checkout@v3
  - uses: docker/setup-buildx-action@v2
- - uses: docker/login-action@v1
+ - uses: docker/login-action@v2
if: env.PUSH_TO_HUB == 'true'
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- - uses: docker/build-push-action@v2
+ - uses: docker/build-push-action@v3
with:
build-args: |
PYTHON_VERSION=${{ matrix.python_version }}
@@ -85,30 +99,39 @@ jobs:
fail-fast: false
matrix:
include:
- # the config used in '.azure-pipelines/gpu-tests.yml'
- - {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1", ubuntu_version: "20.04"}
- # latest (used in Tutorials)
- - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1", ubuntu_version: "20.04"}
- - {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.1", ubuntu_version: "20.04"}
- - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1", ubuntu_version: "20.04"}
+ # These are the base images for PL release docker image distributions,
+ # so include all Python and PyTorch versions that PL supports.
+ - {python_version: "3.7", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.7", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.7", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.7", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.8", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ # - {python_version: "3.10", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ # - {python_version: "3.10", pytorch_version: "1.12", cuda_version: "11.3.1"}
steps:
- - uses: actions/checkout@v2
+ - uses: actions/checkout@v3
  - uses: docker/setup-buildx-action@v2
- - uses: docker/login-action@v1
+ - uses: docker/login-action@v2
if: env.PUSH_TO_HUB == 'true'
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- - uses: docker/build-push-action@v2
+ - uses: docker/build-push-action@v3
with:
build-args: |
PYTHON_VERSION=${{ matrix.python_version }}
PYTORCH_VERSION=${{ matrix.pytorch_version }}
CUDA_VERSION=${{ matrix.cuda_version }}
- UBUNTU_VERSION=${{ matrix.ubuntu_version }}
file: dockers/base-cuda/Dockerfile
push: ${{ env.PUSH_TO_HUB }}
- tags: pytorchlightning/pytorch_lightning:base-cuda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
+ tags: pytorchlightning/pytorch_lightning:base-cuda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
timeout-minutes: 95
- uses: ravsamhq/notify-slack-action@v1
if: failure() && env.PUSH_TO_HUB == 'true'
@@ -126,28 +149,28 @@ jobs:
fail-fast: false
matrix:
include:
- - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1"}
- - {python_version: "3.8", pytorch_version: "1.10", cuda_version: "11.1"}
+ - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.8", pytorch_version: "1.10", cuda_version: "11.1.1"}
  - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
- # nightly: add when there's a release candidate
- # - {python_version: "3.9", pytorch_version: "1.12"}
+ - {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ # - {python_version: "3.10", pytorch_version: "1.12", cuda_version: "11.6.0"}
steps:
- - uses: actions/checkout@v2
+ - uses: actions/checkout@v3
  - uses: docker/setup-buildx-action@v2
- - uses: docker/login-action@v1
+ - uses: docker/login-action@v2
if: env.PUSH_TO_HUB == 'true'
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- - uses: docker/build-push-action@v2
+ - uses: docker/build-push-action@v3
with:
build-args: |
PYTHON_VERSION=${{ matrix.python_version }}
PYTORCH_VERSION=${{ matrix.pytorch_version }}
CUDA_VERSION=${{ matrix.cuda_version }}
file: dockers/base-conda/Dockerfile
push: ${{ env.PUSH_TO_HUB }}
- tags: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
+ tags: pytorchlightning/pytorch_lightning:base-conda-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
timeout-minutes: 95
- uses: ravsamhq/notify-slack-action@v1
if: failure() && env.PUSH_TO_HUB == 'true'
@@ -168,14 +191,14 @@ jobs:
# the config used in 'dockers/ci-runner-ipu/Dockerfile'
- {python_version: "3.9", pytorch_version: "1.9"}
steps:
- - uses: actions/checkout@v2
+ - uses: actions/checkout@v3
  - uses: docker/setup-buildx-action@v2
- - uses: docker/login-action@v1
+ - uses: docker/login-action@v2
if: env.PUSH_TO_HUB == 'true'
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- - uses: docker/build-push-action@v2
+ - uses: docker/build-push-action@v3
with:
build-args: |
PYTHON_VERSION=${{ matrix.python_version }}
@@ -184,7 +207,7 @@ jobs:
push: ${{ env.PUSH_TO_HUB }}
tags: pytorchlightning/pytorch_lightning:base-ipu-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}
timeout-minutes: 100
- - uses: docker/build-push-action@v2
+ - uses: docker/build-push-action@v3
with:
build-args: |
PYTHON_VERSION=${{ matrix.python_version }}
@@ -199,7 +222,7 @@ jobs:
status: ${{ job.status }}
token: ${{ secrets.GITHUB_TOKEN }}
notification_title: ${{ format('IPU; {0} py{1} for *{2}*', runner.os, matrix.python_version, matrix.pytorch_version) }}
- message_format: '{emoji} *{workflow}* {status_message}, see <{run_url}|detail>, cc: <@U01BULUS2BG>' # SeanNaren
+ message_format: '{emoji} *{workflow}* {status_message}, see <{run_url}|detail>, cc: <@U01A5T7EY9M>' # akihironitta
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

@@ -212,14 +235,14 @@ jobs:
# the config used in 'dockers/ci-runner-hpu/Dockerfile'
- {gaudi_version: "1.5.0", pytorch_version: "1.11.0"}
steps:
- - uses: actions/checkout@v2
+ - uses: actions/checkout@v3
  - uses: docker/setup-buildx-action@v2
- - uses: docker/login-action@v1
+ - uses: docker/login-action@v2
if: env.PUSH_TO_HUB == 'true'
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- - uses: docker/build-push-action@v2
+ - uses: docker/build-push-action@v3
with:
build-args: |
DIST=latest
@@ -243,10 +266,10 @@ jobs:
runs-on: ubuntu-20.04
steps:
- name: Checkout
- uses: actions/checkout@v2
+ uses: actions/checkout@v3
- name: Build Conda Docker
# publish master/release
- uses: docker/build-push-action@v2
+ uses: docker/build-push-action@v3
with:
file: dockers/nvidia/Dockerfile
push: false
41 changes: 32 additions & 9 deletions .github/workflows/release-docker.yml
@@ -1,6 +1,5 @@
name: Docker
- # https://www.docker.com/blog/first-docker-github-action-is-here
- # https://github.com/docker/build-push-action
+
on:
push:
branches: [master, "release/*"]
@@ -15,8 +14,22 @@ jobs:
strategy:
fail-fast: false
matrix:
- python_version: ["3.7", "3.8", "3.9"]
- pytorch_version: ["1.9", "1.10"]
+ include:
+ # Include all Python and PyTorch versions that PL supports.
+ - {python_version: "3.7", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.7", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.7", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.7", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.8", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.8", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.9", cuda_version: "11.1.1"}
+ - {python_version: "3.9", pytorch_version: "1.10", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ - {python_version: "3.9", pytorch_version: "1.12", cuda_version: "11.3.1"}
+ # - {python_version: "3.10", pytorch_version: "1.11", cuda_version: "11.3.1"}
+ # - {python_version: "3.10", pytorch_version: "1.12", cuda_version: "11.3.1"}
steps:
- name: Checkout
uses: actions/checkout@v2
@@ -32,19 +45,29 @@ jobs:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
dockerfile: dockers/release/Dockerfile
- build_args: PYTHON_VERSION=${{ matrix.python_version }},PYTORCH_VERSION=${{ matrix.pytorch_version }},LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
- tags: "${{ steps.get_version.outputs.RELEASE_VERSION }}-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }},latest-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}"
+ build_args: |
+ PYTHON_VERSION=${{ matrix.python_version }}
+ PYTORCH_VERSION=${{ matrix.pytorch_version }}
+ CUDA_VERSION=${{ matrix.cuda_version }}
+ LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
+ tags: |
+ ${{ steps.get_version.outputs.RELEASE_VERSION }}-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
+ latest-py${{ matrix.python_version }}-torch${{ matrix.pytorch_version }}-cuda${{ matrix.cuda_version }}
timeout-minutes: 55

- name: Publish Latest to Docker
uses: docker/build-push-action@v1.1.0
- # only on releases and latest Python and PyTorch
- if: matrix.python_version == '3.9' && matrix.pytorch_version == '1.10'
+ # Only latest Python and PyTorch
+ if: matrix.python_version == '3.9' && matrix.pytorch_version == '1.11'
with:
repository: pytorchlightning/pytorch_lightning
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
dockerfile: dockers/release/Dockerfile
- build_args: PYTHON_VERSION=${{ matrix.python_version }},PYTORCH_VERSION=${{ matrix.pytorch_version }},LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
+ build_args: |
+ PYTHON_VERSION=${{ matrix.python_version }}
+ PYTORCH_VERSION=${{ matrix.pytorch_version }}
+ CUDA_VERSION=${{ matrix.cuda_version }}
+ LIGHTNING_VERSION=${{ steps.get_version.outputs.RELEASE_VERSION }}
tags: "latest"
timeout-minutes: 55
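
Taken together, the two publishing steps in this hunk give every matrix entry a release-specific tag and a `latest-…` tag, both now ending in the CUDA version, while the bare `latest` tag is pushed only for Python 3.9 with PyTorch 1.11. The shell sketch below lists the tags for one matrix entry, assuming both steps run. The function name `release_tags` and the release version `1.7.0` are illustrative and do not come from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): print the tags one matrix entry
# would publish, following the two steps in the hunk above.
# Arguments: $1 = release version, $2 = Python, $3 = PyTorch, $4 = CUDA.
release_tags() {
  release="$1"; python="$2"; pytorch="$3"; cuda="$4"
  suffix="py${python}-torch${pytorch}-cuda${cuda}"
  echo "${release}-${suffix}"
  echo "latest-${suffix}"
  # The "Publish Latest to Docker" step is conditional on these two versions.
  if [ "$python" = "3.9" ] && [ "$pytorch" = "1.11" ]; then
    echo "latest"
  fi
}

# "1.7.0" is an illustrative release version, not a value from the PR.
release_tags 1.7.0 3.9 1.11 11.3.1
```

All tags are relative to the `pytorchlightning/pytorch_lightning` repository named in the step configuration.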