Merge pull request #38 from vkt1414/master
A GitHub Actions workflow to test whether the getting_started Colab notebooks run properly in the latest Colab Docker runtime environment.
fedorov authored Oct 26, 2023
2 parents a415970 + 8ab822f commit d9e8cbc
Showing 9 changed files with 9,036 additions and 0 deletions.
117 changes: 117 additions & 0 deletions .github/workflows/test_colab.yml
@@ -0,0 +1,117 @@
name: Check Commits and Colab Images

on:
  push:
    branches: [ "master" ]
  pull_request:
    branches: [ "master" ]
  workflow_dispatch:
  schedule:
    - cron: 0 12 */1 * *

jobs:
  check_commits_and_images:
    runs-on: ubuntu-latest
    permissions:
      contents: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: 3.x

      - name: Install dependencies
        run: pip install requests pandas google-cloud-bigquery pyarrow nbformat

      - name: Authorize Google Cloud
        uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.SERVICE_ACCOUNT_KEY }}
          create_credentials_file: true
          export_environment_variables: true

      - name: Run check-commits.py and check-colab-images.py, preprocess notebooks scripts
        run: |
          python test/src/check-commits.py
          python test/src/check-colab-images.py
          python test/src/preProcessNotebooks.py
      - name: Set result output
        id: set-result
        run: |
          if [[ -f "check_colab_images_result.txt" ]]; then
            RESULT=$(cat "check_colab_images_result.txt")
            echo "RESULT=$RESULT" >> $GITHUB_ENV
          fi
      - name: Free Disk Space (Ubuntu)
        uses: jlumbroso/free-disk-space@main
        with:
          tool-cache: false
          android: true
          dotnet: true
          haskell: true
          large-packages: true
          docker-images: true
          swap-storage: true

      - name: Docker login
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Pull from GCP and Push Docker image to Docker Hub
        if: env.RESULT == 'true'
        run: |
          docker pull us-docker.pkg.dev/colab-images/public/runtime:latest
          docker tag us-docker.pkg.dev/colab-images/public/runtime:latest imagingdatacommons/idc-testing-colab:latest
          docker push imagingdatacommons/idc-testing-colab:latest
      - name: Pull Docker image from Docker Hub
        if: env.RESULT == 'false'
        run: |
          docker pull imagingdatacommons/idc-testing-colab:latest
      - name: Copy Google Cloud credentials to Docker container
        run: |
          CREDENTIALS_FILE_PATH="${{ env.GOOGLE_APPLICATION_CREDENTIALS }}"
          CREDENTIALS_FILE_NAME=$(basename "$CREDENTIALS_FILE_PATH")
          GOOGLE_APPLICATION_CREDENTIALS="/content/$CREDENTIALS_FILE_NAME"
          echo "GOOGLE_APPLICATION_CREDENTIALS=$GOOGLE_APPLICATION_CREDENTIALS" >> $GITHUB_ENV
      - name: Run notebook with papermill
        run: |
          for nb in part1_prerequisites part2_searching_basics part3_exploring_cohorts; do
            docker run -d --name colab -v "$(pwd):/content" -e GOOGLE_APPLICATION_CREDENTIALS="${{ env.GOOGLE_APPLICATION_CREDENTIALS }}" imagingdatacommons/idc-testing-colab:latest
            docker exec -t colab /bin/bash -c "pip install papermill"
            docker exec -t colab /bin/bash -c "set -o xtrace && set -o errexit && set -o pipefail && set -o nounset && set +o errexit && cd content/ && papermill /content/notebooks/getting_started/${nb}.ipynb /content/test/outputs/${nb}_papermill_output.ipynb && set -o errexit && ls -A"
            #docker exec -t colab /bin/bash -c "jupyter nbconvert --to html --ExtractOutputPreprocessor.enabled=False /content/test/outputs/output_${nb}.ipynb"
            docker stop colab
            docker rm colab
          done
      - name: Commit changes
        if: ${{ github.event_name != 'pull_request' }}
        uses: stefanzweifel/git-auto-commit-action@v4
        with:
          commit_message: 'Check colab env'
          file_pattern: 'test/*.csv test/outputs/*.ipynb'
          branch: 'master'

      #- name: Check output notebooks for errors
      #  run: |
      #    for nb in part1_prerequisites part2_searching_basics part3_exploring_cohorts; do
      #      if grep -q '"name": "stderr"\|"status": "failed"' test/outputs/output_${nb}.ipynb; then
      #        echo "Error messages found in the ${nb} notebook output:"
      #        cat test/outputs/output_${nb}.ipynb
      #        exit 1
      #      else
      #        echo "No errors found in the ${nb} notebook output."
      #      fi
      #    done
      #    exit $EXIT_CODE
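
The commented-out step above greps the output notebooks for error markers. The same check can be sketched in Python, assuming the papermill outputs follow the standard nbformat v4 JSON layout, in which a failing code cell carries an output whose `output_type` is `error`. The function name `find_failed_cells` and the in-memory sample notebook are hypothetical and not part of this commit.

```python
def find_failed_cells(notebook: dict) -> list:
    """Return the indices of code cells whose outputs contain an error record."""
    failed = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        outputs = cell.get("outputs", [])
        if any(out.get("output_type") == "error" for out in outputs):
            failed.append(index)
    return failed


# Hypothetical stand-in for a papermill output notebook (nbformat v4 layout assumed).
sample = {
    "cells": [
        {"cell_type": "code",
         "outputs": [{"output_type": "stream", "name": "stdout", "text": "ok\n"}]},
        {"cell_type": "code",
         "outputs": [{"output_type": "error", "ename": "ValueError",
                      "evalue": "bad input", "traceback": []}]},
        {"cell_type": "markdown", "source": "notes"},
    ]
}

print(find_failed_cells(sample))  # prints [1]
```

To apply this to a real output such as `test/outputs/part1_prerequisites_papermill_output.ipynb`, the file could be loaded with `json.load` and passed to the function. A non-empty result would then be a reason to fail the job.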
58 changes: 58 additions & 0 deletions test/README.md
@@ -0,0 +1,58 @@
# Check Commits and Colab Images

This repository uses a GitHub Actions workflow to check whether the getting-started notebooks in IDC-Tutorials work as expected in the Google Colab environment.

## Status

[![Getting Started Notebooks in the latest Colab environment](https://github.com/ImagingDataCommons/IDC-Tutorials/actions/workflows/test_colab.yml/badge.svg)](https://github.com/ImagingDataCommons/IDC-Tutorials/actions/workflows/test_colab.yml)

## Workflow

1. **Check for Image Changes**:
   - Make an API call to Artifact Registry to check whether new Docker images are available.

     ```shell
     gcloud artifacts docker tags list us-docker.pkg.dev/colab-images/public/runtime --format=json --quiet
     ```
   - Compare the `sha256` digest with that of the previous latest image.

2. **Preprocess Notebooks**:
   - Use an IDC Google Cloud Project ID instead of asking for it interactively.
   - Handle authentication with Application Default Credentials instead of the `auth.authenticate_user()` call typically used in Colab notebooks.
   - The `google-github-actions/auth` action, when used with `export_environment_variables: true`, exposes the path of the Application Default Credentials file through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
   - Some notebooks require the user to enter a query. In such cases, the expected query is inserted automatically.
3. **Docker Image Handling**:
   - If the Colab Docker image has changed, pull it and push it to Docker Hub. Colab image updates happen less often than the image is pulled for testing, so mirroring the image on Docker Hub avoids piling up charges from using Artifact Registry directly.
   - If there are no changes, just pull the image from Docker Hub.
   - Use [`jlumbroso/free-disk-space@main`](https://github.com/jlumbroso/free-disk-space) to free additional disk space on the runner.
4. **Running Notebooks with Papermill**:
   - Mount the repository source directory at the container's `/content` folder.
   - Install the [`papermill`](https://papermill.readthedocs.io/) package in the container to run the notebooks.
   - Capture `papermill` output and handle any errors.
5. **Update Repository**:
   - Automatically commit the output files generated by the Docker container using [`stefanzweifel/git-auto-commit-action@v4`](https://github.com/stefanzweifel/git-auto-commit-action).
   - The committed output notebooks offer a quick way to see at which cell a notebook failed.
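
The authentication part of the preprocessing step can be illustrated with a minimal sketch. It assumes the notebooks are plain nbformat v4 JSON and that the interactive call appears on its own line as `auth.authenticate_user()`. The function name `strip_colab_auth` and the replacement text are hypothetical; the actual logic in `test/src/preProcessNotebooks.py` is not shown in this diff and may differ.

```python
def strip_colab_auth(notebook: dict) -> int:
    """Replace auth.authenticate_user() lines in code cells with a no-op.

    Returns the number of lines replaced. Indentation is preserved so the
    cell stays valid Python even when the call sits inside a block.
    """
    replaced = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # nbformat allows source to be a list of lines or a single string.
        as_list = isinstance(source, list)
        lines = source if as_list else source.splitlines(keepends=True)
        new_lines = []
        for line in lines:
            stripped = line.lstrip()
            if stripped.startswith("auth.authenticate_user()"):
                indent = line[: len(line) - len(stripped)]
                newline = "\n" if line.endswith("\n") else ""
                new_lines.append(
                    indent + "pass  # auth.authenticate_user() skipped; ADC is used" + newline
                )
                replaced += 1
            else:
                new_lines.append(line)
        cell["source"] = new_lines if as_list else "".join(new_lines)
    return replaced


# Hypothetical two-cell notebook used only for illustration.
sample = {
    "cells": [
        {"cell_type": "code",
         "source": ["from google.colab import auth\n",
                    "auth.authenticate_user()\n",
                    "print('ok')"]},
        {"cell_type": "markdown", "source": ["auth.authenticate_user()"]},
    ]
}

changed = strip_colab_auth(sample)
print(changed)  # prints 1
```

Replacing the call with `pass` rather than deleting the line keeps the cell syntactically valid when the call is the only statement in an indented block. With the call removed, Google client libraries in the container can pick up the credentials file named by `GOOGLE_APPLICATION_CREDENTIALS`.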

## Prerequisites

Before using the workflow, make sure to set the required secrets in your repository:

- `SERVICE_ACCOUNT_KEY`: Google Cloud service account key JSON (make sure to convert it to single-line JSON).
  Note: the minimum permission required for the service account is the `BigQuery User` role.
- `DOCKER_USERNAME`: Docker Hub username.
- `DOCKER_PASSWORD`: Docker Hub password or access token.

## Resources

- [Papermill](https://papermill.readthedocs.io/)
- [Application Default Credentials based login](https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login)
- [Google GitHub Actions](https://github.com/google-github-actions)
- [Commits](https://github.com/vkt1414/track-colab-env/commits/main)
- [Space Saving](https://github.com/jlumbroso/free-disk-space)

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
7 changes: 7 additions & 0 deletions test/colab-images-list.csv
@@ -0,0 +1,7 @@
date,tag,sha256,docker_pull_tag,docker_pull_sha256_tag
,latest,sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d,us-docker.pkg.dev/colab-images/public/runtime:latest,us-docker.pkg.dev/colab-images/public/runtime@sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d
20230515,release-colab-20230515-060150-RC00,sha256:3a8fc58f7e81b96dc59a2fb48b7973802f59fdd634fb538569228d830a7e76a9,us-docker.pkg.dev/colab-images/public/runtime:release-colab-20230515-060150-RC00,us-docker.pkg.dev/colab-images/public/runtime@sha256:3a8fc58f7e81b96dc59a2fb48b7973802f59fdd634fb538569228d830a7e76a9
20230622,release-colab-20230622-060123-RC01,sha256:7dac57e02aae4e83aab349563190a71bdd07374e1365f53bd6a50280046c6091,us-docker.pkg.dev/colab-images/public/runtime:release-colab-20230622-060123-RC01,us-docker.pkg.dev/colab-images/public/runtime@sha256:7dac57e02aae4e83aab349563190a71bdd07374e1365f53bd6a50280046c6091
20230711,release-colab-20230711-060203-RC00,sha256:53dc33f450cd162d8a42c5aff02d50ac24eb9fc68be77f0374614ad07247e9cd,us-docker.pkg.dev/colab-images/public/runtime:release-colab-20230711-060203-RC00,us-docker.pkg.dev/colab-images/public/runtime@sha256:53dc33f450cd162d8a42c5aff02d50ac24eb9fc68be77f0374614ad07247e9cd
20230803,release-colab-20230803-060151-RC00,sha256:ae8a5bf22a84c67fb4b35aa4b1f19dac94b01b56a97c5c7bb15db57552e8d38c,us-docker.pkg.dev/colab-images/public/runtime:release-colab-20230803-060151-RC00,us-docker.pkg.dev/colab-images/public/runtime@sha256:ae8a5bf22a84c67fb4b35aa4b1f19dac94b01b56a97c5c7bb15db57552e8d38c
20230921,release-colab_20230921-060057_RC00,sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d,us-docker.pkg.dev/colab-images/public/runtime:release-colab_20230921-060057_RC00,us-docker.pkg.dev/colab-images/public/runtime@sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d
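
In the CSV above, the `latest` row carries the same digest as the `release-colab_20230921-060057_RC00` row. The README's "compare with the previous latest image" check could be sketched against this file as follows, assuming the `latest` row holds the digest seen on the previous run. The function names are hypothetical, the sample is trimmed to the first three columns, and how `test/src/check-colab-images.py` actually performs the comparison is not shown in this diff.

```python
import csv
import io

# Trimmed copy of two rows from the CSV above (first three columns only).
SAMPLE_CSV = "\n".join([
    "date,tag,sha256",
    ",latest,sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d",
    "20230803,release-colab-20230803-060151-RC00,"
    "sha256:ae8a5bf22a84c67fb4b35aa4b1f19dac94b01b56a97c5c7bb15db57552e8d38c",
])


def recorded_latest_digest(csv_text: str):
    """Return the digest stored in the row tagged 'latest', or None if absent."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["tag"] == "latest":
            return row["sha256"]
    return None


def image_changed(csv_text: str, new_digest: str) -> bool:
    """True when the freshly fetched digest differs from the recorded one."""
    return recorded_latest_digest(csv_text) != new_digest


same = "sha256:4a26494c9c92ab4d0515e0715d79dfecbe8cfacb9b86fcfd55bc0274cb89530d"
other = "sha256:ae8a5bf22a84c67fb4b35aa4b1f19dac94b01b56a97c5c7bb15db57552e8d38c"
print(image_changed(SAMPLE_CSV, same))   # prints False
print(image_changed(SAMPLE_CSV, other))  # prints True
```

A result like this maps naturally onto the `true`/`false` value the workflow reads from `check_colab_images_result.txt` to decide whether to mirror the image or pull the cached copy from Docker Hub.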