A KEP for StartupPolicy #244
/retest
d5b2a6e to 55ecf7d
/cc @vsoch @Gekko0114 Would this API fit your needs?
@kannon92: GitHub didn't allow me to request PR reviews from the following users: vsoch, Gekko0114. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.
This would definitely be useful for us. What I've typically had to do is start the workers and handle race conditions with sleeps or waiting, that sort of thing.
And an API like:
Would be sufficient?
Thanks so much, @kannon92! IMO, I prefer defining startUpPolicy under each replicatedJob field.
FYI
I think he's mirroring jobset/api/jobset/v1alpha2/jobset_types.go (line 177 at commit 61ad4b0).
I think my preference is for that modular approach (which is also consistent, since we already have SuccessPolicy and FailurePolicy), ensuring that each of those sections can be produced independently and then added to a replicatedJob. And if we wanted to be more consistent, it might look like:

```go
type StartupPolicy struct {
	TargetReplicatedJobs []string       `json:"targetReplicatedJobs,omitempty"`
	Status               JobReadyStatus `json:"status"`
}
```

This makes the assumption that a JobSet will have one condition (a set of jobs and a status) to indicate readiness. I'm not sure we want to allow more complexity than that (the workflow complexity thing is a non-goal).
55ecf7d to 661f1bb
@vsoch is correct. I was aiming to replicate a similar behavior that we have for
Yeah, this is called out as a non-goal, as I don't think this API will ever be complete in the context of workflows.
Correct.
Yeah, I believe there was interest in both. I ended up going with the API I have because it offers the flexibility of treating the target replicated jobs as either ready or succeeded. I do acknowledge the irony of calling out that we don't want a workflow engine and then ending up with at least a poor person's workflow engine.
I commented without understanding the context of JobSet; thank you for your explanations.
keps/104-StartupPolicy/README.md
When a JobSet is started with `StartupPolicy` specified, we will create jobs in a suspended state (i.e., `Job.Spec.Suspend = true`). This avoids starting the underlying jobs.

If `StartupPolicy` is set, we will keep the other jobs suspended until the `StartupPolicy` is considered successful.
We will unsuspend only the jobs that match `TargetReplicatedJobStartup`. Once they are considered "started" by matching the targeted status, we resume all the rest of the jobs.
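The creation and first-resume steps described in this KEP text can be sketched with a simplified stand-in for the Job spec. This is an illustration under stated assumptions, not the controller code: `jobSpec`, `createJobs`, `resumeTargets`, and `isSuspended` are hypothetical helpers, and only the `Suspend` field (a `*bool`, as in the batch/v1 `JobSpec`) is modeled.

```go
package main

import "fmt"

// jobSpec is a HYPOTHETICAL stand-in for batch/v1 JobSpec: only Suspend is kept,
// and, like the real field, it is a *bool where nil means "not set".
type jobSpec struct {
	Suspend *bool
}

// createJobs builds one spec per replicated job. When a startup policy is set,
// every job is created with Suspend=true so no pods start before the policy allows.
func createJobs(replicatedJobs []string, hasStartupPolicy bool) map[string]*jobSpec {
	specs := make(map[string]*jobSpec, len(replicatedJobs))
	for _, name := range replicatedJobs {
		s := &jobSpec{}
		if hasStartupPolicy {
			suspend := true
			s.Suspend = &suspend
		}
		specs[name] = s
	}
	return specs
}

// resumeTargets unsuspends only the replicated jobs named as startup targets;
// all other jobs stay suspended until the targets reach the required status.
func resumeTargets(specs map[string]*jobSpec, targets []string) {
	for _, name := range targets {
		if s, ok := specs[name]; ok {
			resume := false
			s.Suspend = &resume
		}
	}
}

// isSuspended treats a nil Suspend as false, matching the Job API default.
func isSuspended(s *jobSpec) bool {
	return s.Suspend != nil && *s.Suspend
}

func main() {
	jobs := createJobs([]string{"driver", "workers"}, true)
	resumeTargets(jobs, []string{"driver"})
	// Prints "false true": only the target was resumed.
	fmt.Println(isSuspended(jobs["driver"]), isSuspended(jobs["workers"]))
}
```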
If we unsuspend jobs matching `TargetReplicatedJobStartup`, then in the example where we want sequential execution of Job A -> Job B -> Job C, wouldn't they all run in parallel, since there is a separate startup policy targeting each of them?
Or is this saying that we handle only one startup policy at a time, in the order in which they are specified in the spec (e.g., we first see a startup policy for Job A, so we unsuspend it and wait for Job A to complete; then we check the next startup policy, which is for Job B, so we unsuspend that, and so on)?
> Or is this saying that we handle only 1 startup policy at a time, in the order in which they are specified in the spec (e.g. we first see a startup policy for Job A so we unsuspend it, wait for Job A to complete, then we check the next startup policy and it's for Job B, so we unsuspend that, etc.)?
This is where I find it difficult to balance design against implementation details.
In the implementation, I apply StartupPolicy in sequential order.
I check whether JobA is suspended and, if so, unsuspend it. Then I return and trigger a reconcile.
On the next pass, I loop over the startupPolicy rules and check whether JobA is successful according to its rule (Ready/Succeeded). If it is, I move on to JobB and resume it if it's suspended.
And repeat.
If all the startup policy items are successful, then we unsuspend everything not in the startup list (because the startup ones are running or succeeded).
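The sequential handling described in this reply can be sketched as a single reconcile pass that is called repeatedly. This is a minimal sketch under stated assumptions, not the KEP's implementation: `StartupRule`, `JobState`, `satisfied`, and `reconcileStartup` are hypothetical names, the choice that a succeeded job also satisfies a `Ready` rule is an assumption, and a real controller would read and patch Job objects rather than an in-memory map.

```go
package main

import "fmt"

// TargetStatus holds the two conditions mentioned in the thread (assumed values).
type TargetStatus string

const (
	StatusReady     TargetStatus = "Ready"
	StatusSucceeded TargetStatus = "Succeeded"
)

// StartupRule is HYPOTHETICAL: one replicated job plus the status it must reach.
type StartupRule struct {
	TargetReplicatedJob string
	Status              TargetStatus
}

// JobState is a HYPOTHETICAL, simplified view of one replicated job.
type JobState struct {
	Suspended bool
	Ready     bool
	Succeeded bool
}

// satisfied reports whether a job has reached the status a rule requires.
// ASSUMPTION: a succeeded job also counts as satisfying a Ready rule.
func satisfied(j *JobState, s TargetStatus) bool {
	switch s {
	case StatusReady:
		return j.Ready || j.Succeeded
	case StatusSucceeded:
		return j.Succeeded
	}
	return false
}

// reconcileStartup performs one pass over the rules, in order. It resumes the
// first suspended target and returns (a new reconcile is expected), or stops at
// the first unsatisfied rule. Only when every rule is satisfied does it resume
// all jobs not named in the startup list. It returns true once startup is done.
func reconcileStartup(rules []StartupRule, jobs map[string]*JobState) bool {
	inPolicy := map[string]bool{}
	for _, r := range rules {
		inPolicy[r.TargetReplicatedJob] = true
		j, ok := jobs[r.TargetReplicatedJob]
		if !ok {
			return false // assumption: a missing target keeps the sequence waiting
		}
		if j.Suspended {
			j.Suspended = false // resume this target, then wait for the next pass
			return false
		}
		if !satisfied(j, r.Status) {
			return false // still waiting on this rule
		}
	}
	for name, j := range jobs {
		if !inPolicy[name] {
			j.Suspended = false // every rule passed: resume the rest
		}
	}
	return true
}

func main() {
	jobs := map[string]*JobState{
		"a":       {Suspended: true},
		"b":       {Suspended: true},
		"workers": {Suspended: true},
	}
	rules := []StartupRule{
		{TargetReplicatedJob: "a", Status: StatusSucceeded},
		{TargetReplicatedJob: "b", Status: StatusReady},
	}
	fmt.Println(reconcileStartup(rules, jobs)) // resumes "a"
	jobs["a"].Succeeded = true
	fmt.Println(reconcileStartup(rules, jobs)) // "a" satisfied, resumes "b"
	jobs["b"].Ready = true
	fmt.Println(reconcileStartup(rules, jobs)) // all rules satisfied, resumes "workers"
}
```

Returning after each unsuspend mirrors the "return and trigger a reconcile" step above, so at most one target changes state per pass and the order in the spec is what makes execution sequential.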
Updated with this info. PTAL.
c3a70cd to 84235b3
cb80a39 to 9d855b9
9d855b9 to c03cf4d
/retest
c03cf4d to 4f8a19e
@kannon92, this is great! Can you please resolve the remaining comments and get this merged?
4f8a19e to bbd6573
A couple of final nits; otherwise, looks good to me.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, kannon92.
/retest
Thanks for driving this, @kannon92! 🥳
I wanted to draft a potential API for StartupPolicy, as we discussed.