Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Add ml_dypes dependency for all docker images #15226

Merged
merged 3 commits into from
Jul 21, 2023

Conversation

yzh119
Copy link
Member

@yzh119 yzh119 commented Jul 4, 2023

The PR #15183 adds ml_dtypes as a dependency of TVM, however, some of the CI docker images do not have ml_dtypes installed, the PR fixes the issue.

cc @yongwww , would you mind also helping upload the new images with ml_dtypes installed to docker hub?

@tvm-bot
Copy link
Collaborator

tvm-bot commented Jul 4, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

@github-actions github-actions bot requested a review from yongwww July 4, 2023 14:16
@yzh119
Copy link
Member Author

yzh119 commented Jul 4, 2023

@tvm-bot rerun

Copy link
Contributor

@leandron leandron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. Thanks for the PR @yzh119.

Which images were tested with the proposed change?

I couldn't verify the usage of ubuntu2004_install_python_package.sh in ci_cpu, for example. Perhaps you meant ubuntu_install_python_package.sh instead?

@yzh119
Copy link
Member Author

yzh119 commented Jul 8, 2023

@leandron thank you for your review!

The ci_i386 DockerFile depends on ubuntu2004_install_python_package, and the docker image needs to be updated and uploaded.

Besides ci_i386, other docker images that have to be updated include ci_arm, ci_wasm, ci_riscv, ci_minimal, ci_ hexagon, ci_cpu, ci_gpu, ci_arm (they rely on ubuntu_install_python_package.sh which already has ml_dtypes as the dependency but the docker image was still out-dated).

@yzh119
Copy link
Member Author

yzh119 commented Jul 10, 2023

Hi @leandron do you have any further feedback?

@@ -43,4 +43,5 @@ pip3 install --upgrade \
junitparser==2.4.2 \
six \
tornado \
pytest-lazy-fixture
pytest-lazy-fixture \
git+https://github.com/jax-ml/ml_dtypes.git@v0.2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should favour the usage of this package from Pypi, like other platforms. I checked and the source package is there: https://pypi.org/project/ml-dtypes/0.2.0/#files

Can we just move to a plain ml_dtypes type of dependency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using ml_dtypes from PyPI at first but unfortunately, there is no i686 prebuilt so it will fallout to compile from source (.tar.gz file), and it doesn't work (see this flow: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-docker/detail/PR-15226/2/pipeline).

Copy link
Contributor

@leandron leandron Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I'm fine with the change, but would also like to see another PR with test passing when using the temporary Docker images generated by this job before we proceed merging this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd love to, but the docker images would only be uploaded to tlcpack if the PR gets merged into the mainline.

Copy link
Contributor

@leandron leandron Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great you want to do the test and you are correct wrt tlcpack. I'm happy to say that there is a way: if you check in the CI results box below, you'll see a docker/pr-head, which generated a set of temporary images and uploaded them to AWS ECR.

This is the specific one for this job: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-docker/detail/PR-15226/4/pipeline

Usually people will get those images and submit a new PR to test CI with the temporary images (e.g. 477529581014.dkr.ecr.us-west-2.amazonaws.com/ci_i386:PR-15226-c0454f162-4, by updating https://github.com/apache/tvm/blob/main/ci/jenkins/docker-images.ini with the image names. The test PR can be closed as soon as it finishes successfully, then we merge this one.

Other PRs in past did it and it is safer for us to push for better testing on Docker changes, as they take a long time to troubleshoot when we break them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @leandron I tried following your instructions in #15329 but it seems those temporary images were not found.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both #15183 #15190 are blocking because of this PR.

@leandron
Copy link
Contributor

The temporary images might have been cleaned up. I’d suggest re running CI and throwing the test job as soon as the image rebuild finishes.

Sorry, I have no access to the temp. images configuration.

@yzh119
Copy link
Member Author

yzh119 commented Jul 19, 2023

@tvm-bot rerun

@tqchen
Copy link
Member

tqchen commented Jul 19, 2023

I don't see any point blocking on this given it is just a minor dependency update and unlikely break anything.

Technically, the set of binary images are also still stable and point to the previous build. This is exactly the reason why we used the binary tag in the first place.

  • Aka until we change Jenkinsfile to point to the new binary the main CI won't break.
  • If we switch to the new image because all the code do not yet depend on the new package likely they won't break
  • Even in rare situation that they do break (I cannot think of any), we can quickly revert the change if the binary breaks, and there won't be any blockers on the CI(because all builds still use the old binary image).

Likely immediately move main CI to include the dependency also won't break. Maybe I am missing something here.. but my technical access-ment is that there is no risk in merging it in.

@yzh119
Copy link
Member Author

yzh119 commented Jul 21, 2023

Hi @leandron , I'll upload my locally built CI image to my docker hub account, and use this docker image in #15329 , do you think it is acceptable?

@junrushao
Copy link
Member

Considering the minor nature of this dependency update, there seems to be no reason for us to hold back. The risk of it causing any significant disruptions is quite low. From a technical perspective, the current set of binary images remains stable and is still directed towards the previous build. This is precisely why we initially opted for the binary tag. Until we make the necessary changes to the Jenkinsfile, pointing it to the new binary, our main CI process will remain unaffected. The existing code does not currently rely on the new package, so it is highly unlikely to cause any complications. Even in the rare event that it does result in issues (though I cannot think of any at the moment), we can promptly revert the change if the binary breaks. As a result, there won't be any blockers in the CI pipeline since all builds will continue using the old binary image. Furthermore, it is probable that integrating the new dependency into the main CI process immediately will not lead to any problems. Though I may be overlooking certain aspects, my technical assessment indicates that there is minimal risk associated with incorporating the update.

@junrushao junrushao dismissed leandron’s stale review July 21, 2023 06:24

See my comment above

@junrushao junrushao merged commit 0fadb98 into apache:main Jul 21, 2023
yzh119 added a commit that referenced this pull request Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants