Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add terminalState to jobset status #594

Merged
merged 6 commits into from
Jun 30, 2024

Conversation

googs1025
Copy link
Member

Fix: #545

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 5, 2024
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and kannon92 June 5, 2024 09:33
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 5, 2024
Copy link

netlify bot commented Jun 5, 2024

Deploy Preview for kubernetes-sigs-jobset canceled.

Name Link
🔨 Latest commit a1c05dc
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-jobset/deploys/66813df2f7c0fc0008963707

@googs1025 googs1025 force-pushed the add_jobset_status branch 2 times, most recently from 6e304ec to 5397dc9 Compare June 5, 2024 11:05
@googs1025
Copy link
Member Author

googs1025 commented Jun 5, 2024

root@VM-0-9-ubuntu:/home/ubuntu# kubectl get jobset
NAME                   PHASE       RESTARTS   COMPLETED   SUSPENDED   AGE
max-restarts           Failed      3                                  44m
paralleljobs           Completed              True                    44m
startup-driver-ready   Completed              True                    29m
success-policy         Completed              True                    44m

@googs1025 googs1025 force-pushed the add_jobset_status branch from 5397dc9 to 18c845f Compare June 5, 2024 12:01
api/jobset/v1alpha2/jobset_types.go Outdated Show resolved Hide resolved
pkg/controllers/jobset_controller.go Outdated Show resolved Hide resolved
api/jobset/v1alpha2/jobset_types.go Outdated Show resolved Hide resolved
@googs1025 googs1025 force-pushed the add_jobset_status branch 2 times, most recently from 7c00e55 to 17ac525 Compare June 6, 2024 02:14
@googs1025
Copy link
Member Author

        // JobSetRunning means the job is running.
	JobSetRunning JobSetConditionType = "Running"
	// JobSetCompleted means the job has completed its execution.
	JobSetCompleted JobSetConditionType = "Completed"
	// JobSetFailed means the job has failed its execution.
	JobSetFailed JobSetConditionType = "Failed"
	// JobSetSuspended means the job is suspended.
	JobSetSuspended JobSetConditionType = "Suspended"

The phase is converted according to these four fields. The default is Running

@googs1025 googs1025 changed the title [WIP] feat: add jobset status feat: add jobset status Jun 6, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 6, 2024
@googs1025 googs1025 requested a review from ahg-g June 6, 2024 02:37
@googs1025
Copy link
Member Author

/assign @danielvegamyhre
Please also help review, thank you!

feat: add jobset status
@googs1025 googs1025 force-pushed the add_jobset_status branch from 17ac525 to 25396be Compare June 6, 2024 09:57
@danielvegamyhre danielvegamyhre changed the title feat: add jobset status feat: add phase to jobset status Jun 6, 2024
@danielvegamyhre danielvegamyhre added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 6, 2024
@googs1025 googs1025 requested a review from danielvegamyhre June 7, 2024 00:46
@googs1025 googs1025 force-pushed the add_jobset_status branch from 308e200 to 9e78074 Compare June 7, 2024 04:02
@shrinandj
Copy link

Thanks for doing this @googs1025. I will test this out..

@@ -134,6 +136,10 @@ type JobSetStatus struct {
// RestartsCountTowardsMax tracks the number of times the JobSet has restarted that counts towards the maximum allowed number of restarts.
RestartsCountTowardsMax int32 `json:"restartsCountTowardsMax,omitempty"`

// Phase of the JobSet.
// +kubebuilder:default="Running"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be in the status, not the spec, and we can't default it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it is set in jobs status

// JobSetStatus defines the observed state of JobSet
type JobSetStatus struct {
	// +optional
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`

         ....
	// Phase of the JobSet.
	// +kubebuilder:default="Running"
	Phase string `json:"phase,omitempty"`

	
}

@@ -867,14 +867,15 @@ func enqueueEvent(updateStatusOpts *statusUpdateOpts, event *eventParams) {
// function parameters for setCondition
type conditionOpts struct {
eventType string
phase string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to pass this around, we should be able to compute the phase from the condition in the status update function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only use condition calculation, we may need to exclude other JobSetConditionType. I think it would be more convenient to use one more field such as phase or terminalState.

such as: These conditions are not what this field needs to care about

// These are built-in conditions of a JobSet.
const (
	...
	// JobSetSuspended means the job is suspended.
	JobSetSuspended JobSetConditionType = "Suspended"
	// JobSetStartupPolicyInProgress means the StartupPolicy is in progress.
	JobSetStartupPolicyInProgress JobSetConditionType = "StartupPolicyInProgress"
	// JobSetStartupPolicyCompleted means the StartupPolicy has completed.
	JobSetStartupPolicyCompleted JobSetConditionType = "StartupPolicyCompleted"
)

api/jobset/v1alpha2/jobset_types.go Outdated Show resolved Hide resolved
@shrinandj
Copy link

... TerminalState: success|failed, if more details are needed, then one needs to look that up in the conditions.

This would work perfectly..

@shrinandj
Copy link

I was able to do the following testing on this PR:

  • Built a docker image with the code in this PR and pushed it a local docker registry
  • Installed the new CRD and the new controller. The controller came up fine
  • Create various types of jobsets that would succeed/fail. Verified that the status.phase field in the jobset object was set correctly to Completed or Failed
  • Also liked the Phase field being in the output of kubectl get jobset

@danielvegamyhre
Copy link
Contributor

I was able to do the following testing on this PR:

  • Built a docker image with the code in this PR and pushed it a local docker registry
  • Installed the new CRD and the new controller. The controller came up fine
  • Create various types of jobsets that would succeed/fail. Verified that the status.phase field in the jobset object was set correctly to Completed or Failed
  • Also liked the Phase field being in the output of kubectl get jobset

This is great! Thanks for testing this.

@googs1025
Copy link
Member Author

googs1025 commented Jun 18, 2024

I was able to do the following testing on this PR:

  • Built a docker image with the code in this PR and pushed it a local docker registry
  • Installed the new CRD and the new controller. The controller came up fine
  • Create various types of jobsets that would succeed/fail. Verified that the status.phase field in the jobset object was set correctly to Completed or Failed
  • Also liked the Phase field being in the output of kubectl get jobset

@shrinandj Thanks for your test
Later, it will be changed to the TerminalState field and will only have complated and failed, and the others will be empty by default.
I will revise this PR in the next few days. Thank all of you for suggestions.

@googs1025 googs1025 changed the title feat: add phase to jobset status feat: add terminalState to jobset status Jun 20, 2024
@googs1025
Copy link
Member Author

I changed the field to terminalState. When the jobset reaches the final state, it will be set to Completed or Failed.

root@VM-0-16-ubuntu:/home/ubuntu# kubectl get jobset
NAME             TERMINALSTATE   RESTARTS   COMPLETED   SUSPENDED   AGE
max-restarts     Failed          3                                  89m
paralleljobs     Completed                  True                    89m
success-policy   Completed                  True                    89m

@googs1025
Copy link
Member Author

googs1025 commented Jun 20, 2024

@ahg-g @danielvegamyhre /PTAL Thanks for review

Signed-off-by: googs1025 <googs1025@gmail.com>
@googs1025 googs1025 force-pushed the add_jobset_status branch from 95adaea to 4137a08 Compare June 20, 2024 15:42
@shrinandj
Copy link

Could this be merged and new release be cut with this feature?

@danielvegamyhre
Copy link
Contributor

Could this be merged and new release be cut with this feature?

Yep, will do final review shortly

@@ -941,6 +944,13 @@ func updateCondition(js *jobset.JobSet, opts *conditionOpts) bool {
js.Status.Conditions = append(js.Status.Conditions, newCond)
shouldUpdate = true
}

// If the jobset is in a terminal state, set the terminal state on the jobset.
if opts.terminalState != "" && js.Status.TerminalState != opts.terminalState {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you choose to put the logic updating the terminal state in the function that is responsible for updating the condition, rather than put it in its own function? The logic doesn't seem to depend on the condition updates here, it just depends on the opts passed in, so I don't see why this logic belongs here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was that the TerminalState field is actually a subtype of the Conditions field, specifically the JobSetCompleted or JobSetFailed types. TerminalState can be seen as a special case within Conditions. That's why I grouped these two together and updated them in the same method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If separating them, we can distinguish them into updateCondition and updateTerminalState methods.

// setCondition will add a new condition and terminalState to the JobSet status (or update an existing one),
// and enqueue an event for emission if the status update succeeds at the end of the reconcile.
func setCondition(js *jobset.JobSet, condOpts *conditionOpts, updateStatusOpts *statusUpdateOpts) {
	// Return early if no status update is required for this condition and terminalState.
	if !updateCondition(js, condOpts) && !updateTerminalState(js, condOpts) {
		return
	}

	if updateStatusOpts == nil {
		updateStatusOpts = &statusUpdateOpts{}
	}

	...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to set status.TerminalState in the same functions where we set the completed or failed condition. Meaning in setJobSetFailedCondition and setJobSetCompletedCondition

func setJobSetCompletedCondition(js *jobset.JobSet, updateStatusOpts *statusUpdateOpts) {
	setCondition(js, makeCompletedConditionsOpts(), updateStatusOpts)
        jobset.Status.TerminalState = JobSetCompleted
}
func setJobSetFailedCondition(ctx context.Context, js *jobset.JobSet, reason, msg string, updateStatusOpts *statusUpdateOpts) {
	setCondition(js, makeFailedConditionOpts(reason, msg), updateStatusOpts)
        jobset.Status.TerminalState = JobSetFailed
}

and that is all, we don't need to track anything in updateStatusOpts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, this makes the code much simpler! done

Copy link
Contributor

@ahg-g ahg-g Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a checkJobSetTerminalState function to https://github.com/kubernetes-sigs/jobset/blob/main/test/util/util.go :

func checkJobSetTerminalState(ctx context.Context, k8sClient client.Client, js *jobset.JobSet, state string) (bool, error) {
	var fetchedJS jobset.JobSet
	if err := k8sClient.Get(ctx, types.NamespacedName{Namespace: js.Namespace, Name: js.Name}, &fetchedJS); err != nil {
		return false, err
	}
        return fetchedJS.Status.TerminalState == state
}

and call it from JobSetCompleted to and JobSetFailed to verify the terminalState during integration tests.

Copy link
Member Author

@googs1025 googs1025 Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,i will do it soon

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ahg-g
Copy link
Contributor

ahg-g commented Jun 30, 2024

/lgtm
/approve
/label tide/merge-method-squash

This is great, thank you for your patience during the review process!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 30, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, googs1025

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2024
@k8s-ci-robot k8s-ci-robot merged commit c450428 into kubernetes-sigs:main Jun 30, 2024
13 checks passed
@tenzen-y
Copy link
Member

tenzen-y commented Jul 11, 2024

Thank you for moving this forward!

@danielvegamyhre danielvegamyhre mentioned this pull request Jul 14, 2024
13 tasks
@danielvegamyhre danielvegamyhre mentioned this pull request Aug 19, 2024
20 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simpler top level status fields for total, succeeded and failed jobs
7 participants