Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Automatic platform ARGs #1935

Merged

Conversation

tenzen-y
Copy link
Member

@tenzen-y tenzen-y commented Aug 20, 2022

What this PR does / why we need it:
To resolve the complex build conditions for each CPU architecture, I refactored all Dockerfiles using Automatic platform ARGs.

Also, I switched builder to docker buildx since Automatic platform ARGs are only available when using the BuildKit as described in the following documentation.

This feature is only available when using the BuildKit backend.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Part-of: #1900

Checklist:

  • Docs included if any changes are user facing

@coveralls
Copy link

coveralls commented Aug 20, 2022

Coverage Status

Coverage decreased (-0.5%) to 73.456% when pulling bd4587c on tenzen-y:intrduce-automatic-platform-args into 1927e73 on kubeflow:master.

@tenzen-y tenzen-y force-pushed the intrduce-automatic-platform-args branch 3 times, most recently from 924beef to c3d0edf Compare August 20, 2022 10:35
@tenzen-y tenzen-y changed the title Introduce Automatic platform ARGs [WIP] Introduce Automatic platform ARGs Aug 20, 2022
@tenzen-y tenzen-y force-pushed the intrduce-automatic-platform-args branch from c3d0edf to c9c5ada Compare August 20, 2022 11:30
@tenzen-y tenzen-y force-pushed the intrduce-automatic-platform-args branch from c9c5ada to 5a541f6 Compare August 20, 2022 11:47
@tenzen-y tenzen-y changed the title [WIP] Introduce Automatic platform ARGs Introduce Automatic platform ARGs Aug 20, 2022
@tenzen-y tenzen-y force-pushed the intrduce-automatic-platform-args branch from 471a8a5 to bd4587c Compare August 20, 2022 15:44
Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaocegege, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@anencore94 anencore94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Looks nice :) BTW, it's hard to test build all images for each cpu architecture now. Do you think we need to test docker bulid on gihtub action runner with arm-architecture ?

@google-oss-prow google-oss-prow bot added the lgtm label Aug 21, 2022
@google-oss-prow google-oss-prow bot merged commit fe4d6e7 into kubeflow:master Aug 21, 2022
@tenzen-y
Copy link
Member Author

tenzen-y commented Aug 21, 2022

/lgtm

Looks nice :) BTW, it's hard to test build all images for each cpu architecture now. Do you think we need to test docker bulid on gihtub action runner with arm-architecture ?

Thanks for your comments. @anencore94
The goal of #1900 is to be able to build a multi-architecture supported image (support arm64 and amd64 architecture with a single image tag) on gh-actions and a local machine.

@tenzen-y tenzen-y deleted the intrduce-automatic-platform-args branch August 21, 2022 03:14
@anencore94
Copy link
Member

Thanks for your comments. @anencore94
The goal of #1900 is to be able to build a multi-architecture supported image (support arm64 and amd64 architecture with a single image tag) on gh-actions and a local machine.

I agree with your comment :), Though what I meant was that I think we may need to check arm-image is built "well".
For example, by contructing k8s cluster in arm-action-runner and setup katib with all arm-katib-image with same tag(supported feature in this PR), and see whether all e2e test passed. @tenzen-y

What do you think of it ?

@tenzen-y
Copy link
Member Author

tenzen-y commented Aug 21, 2022

Thanks for your comments. @anencore94
The goal of #1900 is to be able to build a multi-architecture supported image (support arm64 and amd64 architecture with a single image tag) on gh-actions and a local machine.

I agree with your comment :), Though what I meant was that I think we may need to check arm-image is built "well". For example, by contructing k8s cluster in arm-action-runner and setup katib with all arm-katib-image with same tag(supported feature in this PR), and see whether all e2e test passed. @tenzen-y

What do you think of it ?

Thanks for pointing out that. @anencore94
It would be great to verify katib components on arm64 env by our e2e test.
Although, it seems that minikube qemu2 driver does not support emulating arm64 when the host machine's CPU architecture is amd64.
So, we need to consider other ways such as the below:

  1. Run our e2e tests on Amazon EKS with AWS Graviton2 EC2 instance.
  2. Run our e2e tests on Github Actions with a self-hosted runner using AWS Graviton2 EC2 instance.
  3. And more...

In any case, we need to touch with the AWS team.
Do you have any other ideas?

@anencore94
Copy link
Member

Thanks for your comments. @anencore94
The goal of #1900 is to be able to build a multi-architecture supported image (support arm64 and amd64 architecture with a single image tag) on gh-actions and a local machine.

I agree with your comment :), Though what I meant was that I think we may need to check arm-image is built "well". For example, by contructing k8s cluster in arm-action-runner and setup katib with all arm-katib-image with same tag(supported feature in this PR), and see whether all e2e test passed. @tenzen-y
What do you think of it ?

Thanks for pointing out that. @anencore94 It would be great to verify katib components on arm64 env by our e2e test. Although, it seems that minikube qemu2 driver does not support emulating arm64 when the host machine's CPU architecture is amd64. So, we need to consider other ways such as the below:

  1. Run our e2e tests on Amazon EKS with AWS Graviton2 EC2 instance.
  2. Run our e2e tests on Github Actions with a self-hosted runner using AWS Graviton2 EC2 instance.
  3. And more...

In any case, we need to touch with the AWS team. Do you have any other ideas?

Thanks for your detailed comments 👍 @tenzen-y
For minikube qemu2 driver does not support emulating arm64 when the host machine's CPU architecture is amd64, I agree, so that's why I think we need to setup arm64-github-action runner. (it is second way of your suggestion)

@tenzen-y
Copy link
Member Author

so that's why I think we need to setup arm64-github-action runner. (it is second way of your suggestion)

I think so too.

What do you think? @kubeflow/wg-automl-leads

@johnugeorge
Copy link
Member

My only concern is that our CI tests will take much longer to test both platforms.

@tenzen-y
Copy link
Member Author

My only concern is that our CI tests will take much longer to test both platforms.

It makes sense. @johnugeorge Deploying multiple VMs might be helpful for us.

Also, we need to build a CI system that deploys an ephemeral clean VM for each job, as described in the following document.

GitHub-hosted runners execute code within ephemeral and clean isolated virtual machines, meaning there is no way to persistently compromise this environment, or otherwise gain access to more information than was placed in this environment during the bootstrap process.

Self-hosted runners for GitHub do not have guarantees around running in ephemeral clean virtual machines, and can be persistently compromised by untrusted code in a workflow.

As a result, self-hosted runners should almost never be used for public repositories on GitHub, because any user can open pull requests against the repository and compromise the environment. Similarly, be cautious when using self-hosted runners on private or internal repositories, as anyone who can fork the repository and open a pull request (generally those with read access to the repository) are able to compromise the self-hosted runner environment, including gaining access to secrets and the GITHUB_TOKEN which, depending on its settings, can grant write access to the repository. Although workflows can control access to environment secrets by using environments and required reviews, these workflows are not run in an isolated environment and are still susceptible to the same risks when run on a self-hosted runner.

https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#hardening-for-self-hosted-runners

@johnugeorge
Copy link
Member

johnugeorge commented Aug 22, 2022

@tenzen-y Since we are planning to use AWS credits, we can create a new issue to track various possible solutions with their required CI changes for each. This will help to estimate efforts for each solution

@tenzen-y
Copy link
Member Author

@tenzen-y Since we are planning to use AWS credits, we can create a new issue to track various possible solutions with their required CI changes for each. This will help to estimate efforts for each solution

Sounds good. @johnugeorge
I will create an issue to keep tracking this.

@anencore94
Copy link
Member

@tenzen-y Since we are planning to use AWS credits, we can create a new issue to track various possible solutions with their required CI changes for each. This will help to estimate efforts for each solution

Sounds good. @johnugeorge I will create an issue to keep tracking this.

Thanks @johnugeorge , @tenzen-y . That would be nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants