Add the CI to build multi-platform container images #1956

Merged

tenzen-y
Member

@tenzen-y tenzen-y commented Sep 17, 2022

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>

What this PR does / why we need it:
I added CI to build multi-platform container images using composite actions and reusable workflows.
I also cleaned up the config files for the actions according to the documents below.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1900

Checklist:

  • Docs included if any changes are user facing

@coveralls

coveralls commented Sep 17, 2022

Coverage Status

Coverage decreased (-0.1%) to 73.431% when pulling f150aad on tenzen-y:add-ci-to-build-multi-arch-image into e02eb6e on kubeflow:master.

@tenzen-y tenzen-y force-pushed the add-ci-to-build-multi-arch-image branch from caa7b8b to e0f7eeb Compare September 18, 2022 05:40
@tenzen-y tenzen-y changed the title Add the CI to build multi-platform container images [WIP] Add the CI to build multi-platform container images Sep 18, 2022
@tenzen-y tenzen-y force-pushed the add-ci-to-build-multi-arch-image branch 2 times, most recently from 43dc46c to 3bd4764 Compare September 18, 2022 14:34
dockerfile: examples/v1beta1/trial-images/pytorch-mnist/Dockerfile.cpu
- trial-name: pytorch-mnist-gpu
platforms: linux/amd64
Member Author

@tenzen-y tenzen-y Sep 18, 2022

Because of the error below, we cannot build multi-platform container images with GPU support.
Once we use an AWS self-hosted runner, we will be able to build them.

System.IO.IOException: No space left on device : '/home/runner/runners/2.296.2/_diag/Worker_20220918-065331-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.296.2/_diag/Worker_20220918-065331-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.296.2/_diag/Worker_20220918-065331-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)

https://github.com/kubeflow/katib/actions/runs/3075994787
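
For illustration, a minimal sketch of how a per-image `platforms` matrix entry could keep the GPU image on `linux/amd64` only while other images build for more architectures. The workflow and job names, the platform list for the CPU image, the GPU Dockerfile path, and the image tag are assumptions, not the configuration in this PR.

```yaml
# Hypothetical sketch; names, tags, and the GPU Dockerfile path are assumed.
name: Build trial images (sketch)
on: pull_request

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          - trial-name: pytorch-mnist-cpu
            platforms: linux/amd64,linux/arm64   # assumed platform list
            dockerfile: examples/v1beta1/trial-images/pytorch-mnist/Dockerfile.cpu
          - trial-name: pytorch-mnist-gpu
            platforms: linux/amd64               # single platform to limit disk usage
            dockerfile: examples/v1beta1/trial-images/pytorch-mnist/Dockerfile.gpu  # assumed path
    steps:
      - uses: actions/checkout@v3
      # QEMU lets the amd64 runner emulate other architectures during the build.
      - uses: docker/setup-qemu-action@v2
      - uses: docker/setup-buildx-action@v2
      - uses: docker/build-push-action@v3
        with:
          context: .
          file: ${{ matrix.dockerfile }}
          platforms: ${{ matrix.platforms }}
          push: false                            # build only, do not publish
          tags: example/${{ matrix.trial-name }}:sketch
```

Restricting the GPU entry to one platform is one way to reduce how much the build writes to the runner's disk; whether that is enough to avoid the error above is not something this sketch verifies.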

type: boolean
default: false
description: whether to deploy training-operator or not
default: "false"

inputs:
experiments:
required: true
type: string

inputs:
experiments:
required: true
type: string
description: comma delimited experiment name
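
For illustration, a sketch of how a string input like `experiments` could be declared and consumed, assuming the fragment above belongs to a reusable workflow (`on: workflow_call`). The file name, workflow names, job names, and experiment values are hypothetical.

```yaml
# Callee: .github/workflows/run-experiments.yaml (hypothetical file name)
name: Run experiments (sketch)
on:
  workflow_call:
    inputs:
      experiments:
        required: true
        type: string
        description: comma delimited experiment name

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - name: Print each experiment name
        env:
          EXPERIMENTS: ${{ inputs.experiments }}
        run: |
          # Split the comma-delimited value into one name per line.
          echo "$EXPERIMENTS" | tr ',' '\n'
---
# Caller: a separate workflow file in the same repository (hypothetical)
name: E2E (sketch)
on: pull_request

jobs:
  e2e:
    uses: ./.github/workflows/run-experiments.yaml
    with:
      experiments: exp-a,exp-b   # illustrative values
```

The two YAML documents stand for two separate files. Reusable workflow inputs require a `type`, while composite action inputs do not accept one.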

@tenzen-y tenzen-y changed the title [WIP] Add the CI to build multi-platform container images Add the CI to build multi-platform container images Sep 18, 2022
@tenzen-y tenzen-y force-pushed the add-ci-to-build-multi-arch-image branch from 3bd4764 to 987a447 Compare September 18, 2022 19:46
@johnugeorge
Member

How long does it take in CI now ?

@tenzen-y
Member Author

How long does it take in CI now ?

@johnugeorge Probably over an hour, but I'm not sure of the exact duration. Would you like to see how long it takes by restarting all jobs?

Signed-off-by: tenzen-y <yuki.iwai.tz@gmail.com>
@tenzen-y tenzen-y force-pushed the add-ci-to-build-multi-arch-image branch from 987a447 to f150aad Compare September 22, 2022 13:37
@tenzen-y
Member Author

tenzen-y commented Sep 22, 2022

I have pushed to kick the CI.

@tenzen-y
Member Author

tenzen-y commented Sep 22, 2022

@johnugeorge I checked, and CI takes about 2 hours with this PR.

So it might be better to define the order in which jobs run using needs in another PR so that we can run jobs efficiently.

For example, Lint test -> Unit test -> Build test -> Integration Test.

What do you think?

@johnugeorge
Member

@tenzen-y How does it help?

@tenzen-y
Member Author

tenzen-y commented Sep 22, 2022

@tenzen-y How does it help?

@johnugeorge
Currently we run all jobs, including the long-running integration tests, even for new commits with basic errors such as lint or unit test failures. As a result, we hit the limit on the number of Actions jobs for the katib repo and cannot run jobs in other PRs.

With needs, the integration test jobs would not run in that case, so we can reduce the number of job executions.
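
For illustration, a minimal sketch of the ordering Lint test -> Unit test -> Build test -> Integration Test using `needs`. The workflow name, job names, and commands are placeholders, not the actual katib workflows.

```yaml
# Hypothetical sketch; job names and commands are placeholders.
name: CI ordering (sketch)
on: pull_request

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "run lint here"

  unit-test:
    needs: lint                # starts only after lint succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "run unit tests here"

  build:
    needs: unit-test           # starts only after unit-test succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "build images here"

  integration-test:
    needs: build               # skipped when any earlier job fails
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: echo "run integration tests here"
```

By default a job is skipped when a job it `needs` fails, so a lint failure would stop the chain before the integration tests start. The trade-off is that a fully passing run takes longer end to end because the stages no longer run in parallel.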

@johnugeorge
Member

Sure. Do you want to do it in a separate PR?

@tenzen-y
Member Author

Sure. Do you want to do it in a separate PR?

Yes, I would like to create a separate PR to resolve it since I created this PR to resolve #1900.

@johnugeorge
Member

Ready to merge this?

@tenzen-y
Member Author

Ready to merge this?

Yes. Let's merge this!

@johnugeorge
Member

/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge, tenzen-y


@google-oss-prow google-oss-prow bot merged commit f5e4586 into kubeflow:master Sep 23, 2022
@tenzen-y tenzen-y deleted the add-ci-to-build-multi-arch-image branch September 23, 2022 14:34
Successfully merging this pull request may close these issues.

Support multi CPU architectures (amd64 and arm64) in all container images with one image tag