Fix kubeflowkatib/mxnet-mnist image #1866

Merged
merged 1 commit into from
May 18, 2022

tenzen-y
Member

@tenzen-y tenzen-y commented May 17, 2022

What this PR does / why we need it:

The mxnet/python:latest_cpu_native_py3 image appears to have been deleted from the upstream Docker Hub repository.
As a result, our test suite can no longer run and fails with the error below, so I changed the base image from mxnet/python:latest_cpu_native_py3 to python:3.9.

error building image: GET https://index.docker.io/v2/mxnet/python/manifests/latest_cpu_native_py3: MANIFEST_UNKNOWN: manifest unknown; unknown tag=latest_cpu_native_py3
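
For readers who want to see how the failing request relates to the image reference, here is a minimal sketch. It assumes the simple `repo:tag` form used in this PR (no registry host or port, no digest), and `manifest_url` is a hypothetical helper written for illustration, not part of Katib or any Docker tooling. The registry host is copied from the error message above.

```python
# Sketch: rebuild the Docker Hub manifest URL that the failing build requested.
# `manifest_url` is a hypothetical helper, not part of Katib or Docker tooling.

REGISTRY = "https://index.docker.io/v2"  # host taken from the error message above


def manifest_url(image: str) -> str:
    """Map a plain 'repo:tag' reference to its registry manifest URL."""
    # Split on the last colon so the tag is separated from the repository.
    repo, sep, tag = image.rpartition(":")
    if not sep:
        # No tag given: Docker defaults to "latest".
        repo, tag = image, "latest"
    if "/" not in repo:
        # Official images such as "python" live under the "library/" namespace.
        repo = f"library/{repo}"
    return f"{REGISTRY}/{repo}/manifests/{tag}"


old_url = manifest_url("mxnet/python:latest_cpu_native_py3")
new_url = manifest_url("python:3.9")
print(old_url)
print(new_url)
```

The first URL matches the one in the build error, which is the request that returned MANIFEST_UNKNOWN. The second is the equivalent lookup for the replacement base image.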

Argo Workflow Logs

I also verified that the new image works, using ytenzen/katib-mxnet-mnist:fix-mxnet-image.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

@tenzen-y
Member Author

When I used mxnet/python:2.0.0beta1_cpu_py3 as the base image, I hit the error below.
So it might be better to update the MXNet examples in the future.

Downloading data/train-labels-idx1-ubyte.gz from https://repo.mxnet.io/gluon/dataset/mnist/train-labels-idx1-ubyte.gz...
Downloading data/train-images-idx3-ubyte.gz from https://repo.mxnet.io/gluon/dataset/mnist/train-images-idx3-ubyte.gz...
Downloading data/t10k-labels-idx1-ubyte.gz from https://repo.mxnet.io/gluon/dataset/mnist/t10k-labels-idx1-ubyte.gz...
Downloading data/t10k-images-idx3-ubyte.gz from https://repo.mxnet.io/gluon/dataset/mnist/t10k-images-idx3-ubyte.gz...
[17:14:13] ../src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU
Traceback (most recent call last):
  File "/opt/mxnet-mnist/mnist.py", line 86, in <module>
    fit.fit(args, sym, get_mnist_iter)
  File "/opt/mxnet-mnist/common/fit.py", line 228, in fit
    model = mx.mod.Module(
AttributeError: module 'mxnet' has no attribute 'mod'
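
The traceback shows that the mxnet package in the 2.0.0beta1 image has no `mod` attribute, while `common/fit.py` calls `mx.mod.Module`. One way an example could surface this earlier is a fail-fast check before training starts. The following is only a sketch: `require_module_api` is a hypothetical helper, and the stand-in objects and their version strings are made up so the snippet runs without mxnet installed.

```python
# Sketch: fail fast when the installed mxnet lacks the Module API (`mx.mod`).
# `require_module_api` is a hypothetical helper, not part of the Katib example.
from types import SimpleNamespace


def require_module_api(mx) -> None:
    """Raise a descriptive error if `mx` has no `mod.Module` attribute."""
    mod = getattr(mx, "mod", None)
    if mod is None or not hasattr(mod, "Module"):
        version = getattr(mx, "__version__", "unknown")
        raise RuntimeError(
            f"mxnet {version} has no mx.mod.Module; "
            "this example needs a release that still ships the Module API"
        )


# Stand-ins for the real package (assumed shapes, illustrative version strings).
fake_mx_with_mod = SimpleNamespace(__version__="1.x", mod=SimpleNamespace(Module=object))
fake_mx_without_mod = SimpleNamespace(__version__="2.0.0b1")

require_module_api(fake_mx_with_mod)  # passes silently
try:
    require_module_api(fake_mx_without_mod)
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

In the real script the call would be `require_module_api(mx)` right after `import mxnet as mx`, which turns the late AttributeError into an immediate, self-explanatory failure.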

@coveralls

coveralls commented May 17, 2022

Coverage increased (+0.2%) to 73.978% when pulling 0f281d1 on tenzen-y:fix-mxnet-image into ea23e71 on kubeflow:master.

@tenzen-y
Member Author

Also, this PR blocks #1865 and #1833.

@tenzen-y
Member Author

Deploying Katib seems to fail in the Charmed Katib / Test job.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/charmcraft/charm_builder.py", line 361, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/charmcraft/charm_builder.py", line 357, in main
    builder.build_charm()
  File "/usr/local/lib/python3.8/dist-packages/charmcraft/charm_builder.py", line 93, in build_charm
    self.handle_dependencies()
  File "/usr/local/lib/python3.8/dist-packages/charmcraft/charm_builder.py", line 248, in handle_dependencies
    _process_run(cmd)
  File "/usr/local/lib/python3.8/dist-packages/charmcraft/charm_builder.py", line 299, in _process_run
    raise CommandError(f"Subprocess command {cmd} execution failed with retcode {retcode}")
charmcraft.cmdbase.CommandError: Subprocess command ['/home/runner/work/katib/katib/operators/katib-ui/build/parts/charm/build/staging-venv/bin/pip3', 'install', '--upgrade', '--no-binary', ':all:', '--requirement=requirements.txt'] execution failed with retcode 1
Parts processing error: Failed to run the build script for part 'charm'. (full execution logs in '/tmp/charmcraft-log-xp6383qh')
Error: Error running subcommand charmcraft pack -p ./katib-ui --destructive-mode: exit status: 1
Error: Process completed with exit code 1.
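
The log shows a wrapper that runs the pip3 command as a subprocess and raises once it returns a non-zero code, so the real cause sits in pip's own output rather than in this traceback. A minimal sketch of that pattern follows. It is not charmcraft's implementation: `CommandError` and `run_checked` are hypothetical names, and a child Python process that exits with status 1 stands in for the failing pip3 call.

```python
# Sketch of a subprocess wrapper with the failure mode seen in the log.
# `CommandError` and `run_checked` are hypothetical names, not charmcraft's code.
import subprocess
import sys
from typing import Sequence


class CommandError(Exception):
    """Raised when a subprocess exits with a non-zero return code."""


def run_checked(cmd: Sequence[str]) -> None:
    """Run `cmd` and raise CommandError if it exits non-zero."""
    retcode = subprocess.run(list(cmd)).returncode
    if retcode != 0:
        raise CommandError(
            f"Subprocess command {list(cmd)} execution failed with retcode {retcode}"
        )


# A child process that exits with status 1 stands in for the failing pip3 call.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
try:
    run_checked(failing_cmd)
except CommandError as exc:
    error_text = str(exc)
    print(error_text)
```

Because the wrapper only reports the return code, the full execution log mentioned in the output above is the place to look for the underlying pip error.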

https://github.com/kubeflow/katib/runs/6475581693?check_suite_focus=true#step:6:372

Could you help resolve this problem? @DomFleischmann @knkski @ca-scribner

Member

@gaocegege gaocegege left a comment

/lgtm

@ca-scribner
Contributor

I don't recognise this error, but I'll get someone to take a look. It should be a quick fix, and it is likely related to tooling rather than something broken by this PR.

@ca-scribner
Contributor

Waiting on CI to confirm but this should be fixed by #1867.

@tenzen-y
Member Author

Waiting on CI to confirm but this should be fixed by #1867.

Thank you for letting me know @ca-scribner.

@johnugeorge
Member

Please rebase!

@google-oss-prow google-oss-prow bot removed the lgtm label May 18, 2022
@tenzen-y
Member Author

Please rebase!

@johnugeorge I have rebased this branch.

@johnugeorge
Member

Thanks @tenzen-y

/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge, tenzen-y

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit d385d14 into kubeflow:master May 18, 2022
@tenzen-y tenzen-y deleted the fix-mxnet-image branch May 19, 2022 02:10
@tenzen-y tenzen-y mentioned this pull request May 20, 2022