publish torchrun example via Dockerfile #2018

Merged (1 commit) on Mar 8, 2024

Conversation

PeterWrighten
Contributor

@PeterWrighten PeterWrighten commented Mar 8, 2024

Signed-off-by: PeterWright <peterwrighten@gmail.com>

What this PR does / why we need it:

In order to close #2017

Publish torchrun example via DockerRegistry

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2017

Checklist:

  • Docs included if any changes are user facing

google-cla bot commented Mar 8, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Signed-off-by: PeterWrighten <peterwrighten@gmail.com>

@PeterWrighten: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tenzen-y
Member

tenzen-y commented Mar 8, 2024

@kubeflow/wg-training-leads Could you approve CI?

Member

@Jeffwan Jeffwan left a comment

/lgtm
/approve

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jeffwan, PeterWrighten

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tenzen-y
Member

tenzen-y commented Mar 8, 2024

/hold

@google-oss-prow google-oss-prow bot merged commit 6133600 into kubeflow:master Mar 8, 2024
3 checks passed
@tenzen-y
Member

tenzen-y commented Mar 8, 2024

I wanted to verify it in CI before merging...

@tenzen-y
Member

tenzen-y commented Mar 8, 2024

@Jeffwan
Member

Jeffwan commented Mar 8, 2024

Hmm. I noticed it's a simple change and the file paths are all correct.

@Jeffwan
Member

Jeffwan commented Mar 8, 2024

@tenzen-y

It seems it copies files from the current folder instead of the root folder. Using examples/pytorch/cpu-demo/demo.py should resolve the issue. Let's cut a separate PR to fix it.

@tenzen-y
Member

tenzen-y commented Mar 8, 2024

@tenzen-y

It seems it copies files from the current folder instead of the root folder. Using examples/pytorch/cpu-demo/demo.py should resolve the issue. Let's cut a separate PR to fix it.

I guess that we need to specify context like this:

context: examples/xgboost/xgboost-dist
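
The two fixes discussed above can be compared with a small sketch. It assumes, as the thread suggests but does not show, that the CI build used the repository root as the Docker build context, and it models Docker's rule that COPY sources resolve relative to that context rather than to the Dockerfile's own folder. The helper names, the one-file repository layout, and the `examples/pytorch/cpu-demo` context value are hypothetical; only the path `examples/pytorch/cpu-demo/demo.py` comes from the comments.

```python
from pathlib import PurePosixPath

# Hypothetical layout: the only file assumed to exist in the repository.
REPO_FILES = {PurePosixPath("examples/pytorch/cpu-demo/demo.py")}


def resolve_copy_source(context: str, src: str) -> PurePosixPath:
    """Return the repo-relative path that `COPY <src> ...` would read.

    Models the rule that COPY sources are resolved against the build
    context, not against the directory containing the Dockerfile.
    """
    return PurePosixPath(context) / src


def copy_succeeds(context: str, src: str) -> bool:
    """True if the resolved COPY source exists in the assumed layout."""
    return resolve_copy_source(context, src) in REPO_FILES


# Assumed failing setup: root context, Dockerfile copies a bare file name.
broken = copy_succeeds(".", "demo.py")

# Fix 1 (first suggestion): keep the root context, use the full path.
fix_full_path = copy_succeeds(".", "examples/pytorch/cpu-demo/demo.py")

# Fix 2 (second suggestion): point the context at the example folder.
fix_context = copy_succeeds("examples/pytorch/cpu-demo", "demo.py")

print(broken, fix_full_path, fix_context)
```

Either fix makes the resolved source path match the file's real location; which one is preferable depends on whether the Dockerfile should also be buildable from inside the example folder.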

@Jeffwan
Member

Jeffwan commented Mar 9, 2024

@tenzen-y Yeah, changing the context works as well.

@PeterWrighten
Contributor Author

PeterWrighten commented Mar 9, 2024

@tenzen-y Yeah, changing the context works as well.

I will change the context later. BTW, it seems that this PR has been merged automatically. Should I open a new one? @tenzen-y @Jeffwan

@PeterWrighten PeterWrighten deleted the publish-torchrun branch March 9, 2024 03:17
tedhtchang pushed a commit to tedhtchang/training-operator that referenced this pull request Apr 5, 2024
Signed-off-by: PeterWrighten <peterwrighten@gmail.com>
(cherry picked from commit 6133600)
deepanker13 pushed a commit to deepanker13/deepanker-training-operator that referenced this pull request Apr 8, 2024
Signed-off-by: PeterWrighten <peterwrighten@gmail.com>
Signed-off-by: deepanker13 <deepanker.gupta@nutanix.com>
johnugeorge pushed a commit to johnugeorge/training-operator that referenced this pull request Apr 28, 2024
Signed-off-by: PeterWrighten <peterwrighten@gmail.com>
johnugeorge pushed a commit to johnugeorge/training-operator that referenced this pull request Apr 28, 2024
Signed-off-by: PeterWrighten <peterwrighten@gmail.com>
Development

Successfully merging this pull request may close these issues.

Publish torchrun example via DockerRegistry
3 participants