studyJob cannot recover once Completed or Failed #291

hougangliu · 2018-12-13T06:01:24Z

Currently a studyJob cannot recover once it is Completed or Failed.
After a studyJob CR is created, I can update it with "kubectl apply" or by other means, but if the studyJob condition is Completed or Failed, we never start the next suggestion schedule.
For example, a user creates a studyJob with an invalid workerSpec, which causes spawnWorker to return an error and the studyJob to go to Failed. The user can then correct the workerSpec with "kubectl apply", expecting the studyJob to be re-triggered. However, nothing happens.

@here we should discuss the behavior of updating a studyJob CR (when it is Running, Completed, or Failed)

	if instance.Status.Condition == katibv1alpha1.ConditionCompleted || instance.Status.Condition == katibv1alpha1.ConditionFailed {
		nextSuggestionSchedule = false
	}

hougangliu · 2018-12-15T16:11:00Z

/help

hougangliu · 2018-12-20T00:36:12Z

/remove-help

hougangliu · 2018-12-20T00:37:36Z

@YujiOshima can you add the community/discussion label to it?

hougangliu · 2019-01-23T05:35:53Z

At the least, for a Failed studyJob, we should try to handle it again when it is updated.
For a Completed studyJob, we should reject updates via a webhook (we need to upgrade controller-runtime first; the webhook can also validate the studyJob, which would fix "studyJob controller is blocked by bad CR manifests" #314).
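
A rough sketch of the rule proposed above, only to make the discussion concrete. Everything in it is an assumption for illustration: the `Condition` type and its constants are local stand-ins rather than the `katibv1alpha1` definitions, and `UpdateAction` and `decideUpdate` are hypothetical names, not existing Katib or controller-runtime APIs.

```go
package main

import "fmt"

// Condition is a local stand-in for the studyJob condition
// (hypothetical; not the katibv1alpha1 type).
type Condition string

const (
	ConditionRunning   Condition = "Running"
	ConditionCompleted Condition = "Completed"
	ConditionFailed    Condition = "Failed"
)

// UpdateAction is a hypothetical result describing how an update is treated.
type UpdateAction int

const (
	ActionAllow  UpdateAction = iota // accept the update, no special handling
	ActionRetry                      // accept the update and schedule suggestions again
	ActionReject                     // reject the update (for example, in a validating webhook)
)

// decideUpdate applies the proposed rule: a spec change to a Completed
// studyJob is rejected, a spec change to a Failed studyJob triggers a retry,
// and every other update is allowed unchanged.
func decideUpdate(cond Condition, specChanged bool) UpdateAction {
	if !specChanged {
		return ActionAllow
	}
	switch cond {
	case ConditionCompleted:
		return ActionReject
	case ConditionFailed:
		return ActionRetry
	default:
		return ActionAllow
	}
}

func main() {
	fmt.Println(decideUpdate(ConditionFailed, true) == ActionRetry)
	fmt.Println(decideUpdate(ConditionCompleted, true) == ActionReject)
	fmt.Println(decideUpdate(ConditionRunning, true) == ActionAllow)
}
```

In a real controller the `ActionRetry` branch would also need to reset the Failed condition so that the next suggestion schedule is no longer skipped; how to do that is left open here.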

jlewi · 2019-03-10T23:06:46Z

This seems like it's working as intended to me.

Once a job reaches a terminal state (failed or succeeded), updates to the job should not be allowed.
This is consistent with how native K8s jobs work.

If a user wants to update the spec they could create a new job.

/cc @johnugeorge @richardsliu

johnugeorge · 2019-03-22T08:26:33Z

Agree. K8s Jobs work the same way.

gaocegege · 2019-10-10T09:07:38Z

/close

We deprecated v1alpha1.

k8s-ci-robot · 2019-10-10T09:07:39Z

@gaocegege: Closing this issue.

k8s-ci-robot added the help wanted label Dec 15, 2018

k8s-ci-robot removed the help wanted label Dec 20, 2018

richardsliu added the kind/discussion label Jan 18, 2019

richardsliu added the area/0.5.0 label Jan 31, 2019

hougangliu mentioned this issue Jan 31, 2019

Extending StudyJob API by adding more trials to finished StudyJob #346

Closed

YujiOshima mentioned this issue Feb 13, 2019

StudyJob v1alpha2 API version #370

Closed

15 tasks

jlewi removed the area/0.5.0 label Mar 10, 2019

k8s-ci-robot closed this as completed Oct 10, 2019
