Add PodGroup as controller watch source #1666
I thought this had already been fixed. /lgtm /cc @zw0610
Maybe other job controllers (pytorch/tf/...) also need this?
There seems to be an error. How do we install the PodGroup CRD in the test environment?
common.ReconcileJobs stops creating pods while the related PodGroup is unschedulable. When the PodGroup later becomes schedulable, nothing triggers the reconcile loop again, because the PodGroup is not a watch source.
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
You can try invoking this script locally for debugging.
Continue without watching the PodGroup if the PodGroup CRD is not installed.
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
It's not easy to manage an external CRD, so I skip the watch if the CRD is not found.
Needs a review, @shinytang6.
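The skip-if-missing check can be sketched without any cluster dependencies. This is a minimal illustration, not the PR's code: `GroupKind`, `fakeMapper`, `errNoKindMatch`, and `setupWatches` are hypothetical stand-ins for the REST mapper and the controller's watch setup, and the group name `scheduling.volcano.sh` is an assumption. Unlike the snippet in this PR, which treats any `RESTMapping` error as "CRD not installed", the sketch skips the watch only for a no-match error and returns anything else. In real code, `meta.IsNoMatchError` from `k8s.io/apimachinery/pkg/api/meta` can make that distinction.

```go
package main

import (
	"errors"
	"fmt"
)

// GroupKind mirrors the shape of apimachinery's schema.GroupKind (hypothetical copy).
type GroupKind struct{ Group, Kind string }

// errNoKindMatch stands in for the "no matches for kind" error a real
// REST mapper returns when a CRD is not installed.
var errNoKindMatch = errors.New("no matches for kind")

// fakeMapper is a hypothetical REST mapper backed by the set of installed kinds.
type fakeMapper map[GroupKind]bool

// RESTMapping returns nil if the kind is installed, otherwise a wrapped no-match error.
func (m fakeMapper) RESTMapping(gk GroupKind, version string) error {
	if !m[gk] {
		return fmt.Errorf("%s.%s/%s: %w", gk.Kind, gk.Group, version, errNoKindMatch)
	}
	return nil
}

// setupWatches always watches jobs and pods, adds the PodGroup watch only
// when the kind resolves, and keeps going without it when the CRD is missing.
// Any other error is returned so a real discovery failure is not hidden.
func setupWatches(m fakeMapper, watch func(kind string)) error {
	watch("Job")
	watch("Pod")
	pg := GroupKind{Group: "scheduling.volcano.sh", Kind: "PodGroup"} // assumed group
	switch err := m.RESTMapping(pg, "v1beta1"); {
	case err == nil:
		watch("PodGroup")
	case errors.Is(err, errNoKindMatch):
		// CRD not installed: skip the watch instead of failing startup.
	default:
		return err
	}
	return nil
}

func main() {
	var watched []string
	record := func(kind string) { watched = append(watched, kind) }
	if err := setupWatches(fakeMapper{}, record); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println(watched) // [Job Pod]
}
```

With an empty mapper the program prints `[Job Pod]`; once the PodGroup kind is registered in the mapper, `PodGroup` is appended as a third watch.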
Thanks, these changes are ok to me.
_, err = mgr.GetRESTMapper().RESTMapping(schema.GroupKind{Group: v1beta1.SchemeGroupVersion.Group, Kind: "PodGroup"},
	v1beta1.SchemeGroupVersion.Version)
if err == nil {

extra blank line
_, err = mgr.GetRESTMapper().RESTMapping(schema.GroupKind{Group: v1beta1.SchemeGroupVersion.Group, Kind: "PodGroup"},
	v1beta1.SchemeGroupVersion.Version)
if err == nil {

ditto
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
Done, @shinytang6.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ggaaooppeenngg, johnugeorge.
/lgtm
* Add PodGroup as controller watch source
  common.ReconcileJobs stops creating pods while the related PodGroup is unschedulable. When the PodGroup later becomes schedulable, nothing triggers the reconcile loop again, because the PodGroup is not a watch source.
  Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
* Fix the no PodGroup kind error
  Continue without watching the PodGroup if the PodGroup CRD is not installed.
  Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
* Remove the PodGroup crd
  Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
* Remove extra blank lines
  Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>

Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
(cherry picked from commit ab9f3ec)
A panic happened: E1031 12:09:36.449193 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
Can you create a separate issue with more details?
common.ReconcileJobs stops creating pods while the related PodGroup is unschedulable. When the PodGroup later becomes schedulable, nothing triggers the reconcile loop again, because the PodGroup is not a watch source.
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
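A minimal, self-contained Go sketch of why the missing watch matters. It does not use controller-runtime: `Event`, `Controller`, `reconcile`, `handle`, and `run` are hypothetical stand-ins, the job name `demo` is made up, and the toy deliberately ignores periodic resyncs and requeues. Because the controller only reconciles in response to events, a PodGroup status change reaches the reconcile loop only if PodGroup is one of the watched kinds.

```go
package main

import "fmt"

// Event is a hypothetical watch event: the kind of object that changed
// and the name of the job it belongs to.
type Event struct {
	Kind string
	Job  string
}

// Controller is a toy event-driven controller: it reconciles a job only
// when an event for a watched kind arrives (no resync, no requeue).
type Controller struct {
	watched     map[string]bool
	schedulable map[string]bool // job -> is its PodGroup schedulable?
	podsCreated map[string]bool
}

// reconcile mimics the behaviour described in the PR: while the PodGroup
// is unschedulable it returns without creating pods.
func (c *Controller) reconcile(job string) {
	if !c.schedulable[job] {
		return
	}
	c.podsCreated[job] = true
}

// handle drops events for kinds the controller does not watch.
func (c *Controller) handle(ev Event) {
	if c.watched[ev.Kind] {
		c.reconcile(ev.Job)
	}
}

// run replays one scenario and reports whether pods were eventually created.
func run(watchPodGroup bool) bool {
	c := &Controller{
		watched:     map[string]bool{"Job": true, "Pod": true, "PodGroup": watchPodGroup},
		schedulable: map[string]bool{"demo": false},
		podsCreated: map[string]bool{},
	}
	c.handle(Event{Kind: "Job", Job: "demo"})      // job created; PodGroup not yet schedulable
	c.schedulable["demo"] = true                   // scheduler later admits the PodGroup
	c.handle(Event{Kind: "PodGroup", Job: "demo"}) // PodGroup status update event
	return c.podsCreated["demo"]
}

func main() {
	fmt.Println("without PodGroup watch:", run(false)) // false
	fmt.Println("with PodGroup watch:", run(true))     // true
}
```

Without the PodGroup watch the second event is dropped and the job stays stuck; with it, the PodGroup update re-triggers reconcile and pods are created.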