Add Trial images build to the CI #1457

Merged: 10 commits, Mar 10, 2021

andreyvelich
Member

Blocked by: kubeflow/testing#923.

I added a build of the Katib Trial template images to the CI to make sure that each Trial template example can be built and executed.

I also fixed a few problems with the current images:

  • Fix the "Downloading MNIST dataset with torchvision gives HTTP Error 403" problem (pytorch/vision#1938) in the PyTorch MNIST example.
  • Add a --log-path flag to the PyTorch MNIST example so that it can be used with both the File and StdOut metrics collectors.
  • Use docker.io/kubeflowkatib/pytorch-mnist instead of gcr.io/kubeflow-ci/pytorch-dist-mnist-test:v1.0, since the two images are similar.
  • Modify the PyTorch MNIST Dockerfile to be consistent with the MXNet MNIST one.
  • Use the mxnet/python:latest_cpu_native_py3 base image for MXNet, since it already includes the required MXNet setup.
  • Use tensorflow/tensorflow:1.15.4-py3 and tensorflow/tensorflow:1.15.4-gpu-py3 for the ENAS CNN Trial template. With that setup we need only one requirements.txt for this image, and we can shorten the Dockerfile.
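
For illustration, the first two fixes could look roughly like the sketch below. This is not the code from this PR. The User-Agent opener is a commonly cited workaround for pytorch/vision#1938, and the helper names (`install_user_agent_opener`, `parse_args`, `configure_logging`), the log format, and the `loss=0.1234` metric line are assumptions made for the example.

```python
import argparse
import logging
import sys
import urllib.request


def install_user_agent_opener(user_agent="Mozilla/5.0"):
    """Install a global urllib opener that sends a browser-like User-Agent.

    Assumption: the dataset download goes through urllib, so a server that
    rejects the default Python agent with HTTP 403 may accept this one.
    """
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


def parse_args(argv=None):
    """Parse the command line; --log-path is the only flag sketched here."""
    parser = argparse.ArgumentParser(description="PyTorch MNIST example (sketch)")
    parser.add_argument(
        "--log-path",
        type=str,
        default="",
        help="File to write metrics to. If empty, metrics go to StdOut.",
    )
    return parser.parse_args(argv)


def configure_logging(log_path):
    """Send log records to a file (File collector) or to StdOut (StdOut collector)."""
    if log_path:
        handler = logging.FileHandler(log_path)
    else:
        handler = logging.StreamHandler(sys.stdout)
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(
        format="%(asctime)s %(levelname)s %(message)s",
        level=logging.INFO,
        handlers=[handler],
        force=True,
    )
```

A training script would call `install_user_agent_opener()` before downloading the dataset, then `configure_logging(parse_args().log_path)`, and report metrics with calls such as `logging.info("loss=0.1234")`. The same script then works in both collector modes, depending only on whether `--log-path` is passed.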

After we merge this PR, I will run the release script to push the new images to the kubeflowkatib registry. Then I will update the image tags in the Katib Trial templates.

/assign @gaocegege @johnugeorge

/cc @PatrickXYS

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich


@andreyvelich changed the title from "[WIP] Add Trial images build to the CI" to "Add Trial images build to the CI" on Mar 9, 2021
@andreyvelich
Member Author

@gaocegege @johnugeorge I think the tests are working fine. Could you please take a look?

Member

@gaocegege gaocegege left a comment

/lgtm

@google-oss-robot google-oss-robot merged commit c6c9172 into kubeflow:master Mar 10, 2021
@andreyvelich andreyvelich deleted the fix-training-containers branch September 30, 2021 22:21
Successfully merging this pull request may close these issues.

Downloading MNIST dataset with torchvision gives HTTP Error 403