-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconcile trial assignments by comparing suggestion and trials being executed #1831
Reconcile trial assignments by comparing suggestion and trials being executed #1831
Conversation
Hi @henrysecond1. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
what is the expected behaviour if trials are deleted manually ? Recreate trial? |
Thank you for checking!
Yes I think trials should be recreated. Otherwise, trials are not recreated and also the experiment can not be completed |
I also agree with @henrysecond1 :) Such situations could happen such as some node crashed on which some trials are scheduled. Trials-controller should be resilient for that case. |
Just to highlight, we are assuming that trial names are consistently returned by the suggestion algorithms. I think, it is true currently as names are generated by suggestion algorithms. Is it a valid assumption for all algorithms ? |
@anencore94 Node crashes are automatically handled as pods will get rescheduled. Did you see different behaviour? |
@henrysecond1 Can you add a unit test as well? |
/ok-to-test |
Sure, I added a unit test for |
/retest |
@tenzen-y Can you also review this PR ? |
Sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for improving the experiment controller algorithm @henrysecond1 !
/lgtm
Thanks @henrysecond1 |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: henrysecond1, johnugeorge The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
Please see #1830
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #1830
Checklist: