OnDependentUpdateFunc for Job will panic when enable volcano scheduler #1678
Comments
Not only TF: PyTorch, MXNet, and the other job types hit the same problem.
Which version of the training operator are you using?
k8s version: 1.20.15
It seems the bug was introduced as part of #1666. /cc @ggaaooppeenngg
I only tested it with MPIJob. I made a temporary fix for this issue. @D0m021ng @johnugeorge
@ggaaooppeenngg How did it work for MPIJob? Can you add a test case as well?
@johnugeorge Only the MPIJob controller uses OnDependentXXXFuncGeneric, which initializes the generic logger at the beginning. I am working on a test for it.
I see that the pod watch uses OnDependentXXXFunc.
@johnugeorge It is for the pod. For other objects, it uses the generic function.
OnDependentUpdateFunc for TF Job panics when volcano is enabled, because the job controller injects a watch for the job's related PodGroup. In OnDependentUpdateFunc, the logger might be nil when newObj is a v1beta1.PodGroup.
TF Job controller panic
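The nil-logger failure described in this issue can be illustrated with a minimal, self-contained sketch. It does not reproduce the training operator's code: `entry`, `Pod`, `PodGroup`, `pickLogger`, `unsafeUpdate`, and `safeUpdate` are all hypothetical stand-ins, and the assumption is that the handler selects a logger by object type and has no case for PodGroup. The sketch shows why using the unset logger panics and how falling back to a generic logger avoids it.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a structured log entry.
type entry struct {
	kind string
}

// describe reads a field of the receiver, so calling it on a nil *entry
// triggers a nil pointer dereference panic.
func (e *entry) describe(msg string) string {
	return fmt.Sprintf("[%s] %s", e.kind, msg)
}

// Hypothetical stand-ins for the watched object types.
type Pod struct{ Name string }
type PodGroup struct{ Name string }

// pickLogger mimics a type switch that only knows some object kinds
// (an assumption about the shape of the bug, not the operator's code).
func pickLogger(obj interface{}) *entry {
	switch obj.(type) {
	case *Pod:
		return &entry{kind: "pod"}
	default:
		return nil // no logger for unrecognized kinds such as *PodGroup
	}
}

// unsafeUpdate uses the logger without checking it for nil.
func unsafeUpdate(newObj interface{}) string {
	logger := pickLogger(newObj)
	return logger.describe("updated")
}

// safeUpdate falls back to a generic logger when none was selected.
func safeUpdate(newObj interface{}) string {
	logger := pickLogger(newObj)
	if logger == nil {
		logger = &entry{kind: "generic"}
	}
	return logger.describe("updated")
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	pg := &PodGroup{Name: "tfjob-pg"}
	fmt.Println(panics(func() { unsafeUpdate(pg) })) // true
	fmt.Println(safeUpdate(pg))                      // [generic] updated
}
```

Initializing a generic logger up front, as the comments say the generic handler does for MPIJob, has the same effect as the fallback in `safeUpdate`: the logger is never nil when an unexpected object type arrives.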