fix infinite loop in init-pytorch container #1756
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
@kidddddddddddddddddddddd Thanks for your contribution!
@kubeflow/wg-training-leads Can you approve CI?
@@ -43,7 +43,7 @@ var (
 requests:
 cpu: 50m
 memory: 10Mi
-command: ['sh', '-c', 'until nslookup {{.MasterAddr}}; do echo waiting for master; sleep 2; done;']`
+command: ['sh', '-c', 'err=1;for i in $(seq 100); do if nslookup {{.MasterAddr}}; then err=0 && break; fi;echo waiting for master; sleep 2; done; exit $err']`
Is there any reason we set the limit at 100 times?
Can we make this a configurable timeout?
@johnugeorge Do you mean, like {{.MasterAddr}}, setting a Helm value such as for i in $(seq {{ .MaxRetries }});? Where should I set the default value of this key? I can't seem to find the values file for Helm 🤦.
Pull Request Test Coverage Report for Build 4145313973
💛 - Coveralls
Pull Request Test Coverage Report for Build 4171231098
💛 - Coveralls
2a6b6aa to ae2ac0e
/lgtm
/cc @tenzen-y
@kidddddddddddddddddddddd Thanks for your great contribution!
/lgtm
/assign @johnugeorge
@@ -34,6 +34,7 @@ func TestInitContainer(t *testing.T) {
We may want to test the whole initContainer. However, that is out of scope for this PR, so we can follow up with another PR.
@johnugeorge friendly ping.
Sorry for the late response.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: johnugeorge, kidddddddddddddddddddddd
What this PR does / why we need it:
Replace the infinite loop in the init-pytorch container with a finite loop.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #1734

Checklist: