-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit the number of goroutines used when deleting jobs #123
Limit the number of goroutines used when deleting jobs #123
Conversation
// where the jobs are recreated. | ||
backgroundPolicy := metav1.DeletePropagationBackground | ||
if err := r.Delete(ctx, job, &client.DeleteOptions{PropagationPolicy: &backgroundPolicy}); client.IgnoreNotFound(err) != nil { | ||
finalErr = err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a potential issue, a race condition may occur.
db70f73
to
0276be0
Compare
@@ -364,31 +369,23 @@ func (r *JobSetReconciler) restartPolicyRecreateAll(ctx context.Context, js *job | |||
return nil | |||
} | |||
|
|||
func (r *JobSetReconciler) deleteJobs(ctx context.Context, js *jobset.JobSet, jobsForDeletion []*batchv1.Job) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't use the JobSet
.
pkg/controllers/jobset_controller.go
Outdated
if err := r.Delete(ctx, targetJob, &client.DeleteOptions{PropagationPolicy: &backgroundPolicy}); client.IgnoreNotFound(err) != nil { | ||
lock.Lock() | ||
defer lock.Unlock() | ||
finalErr = errors.Join(finalErr, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, errors.Join()
is introduced since Go 1.20, although this project uses Go 1.19.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahg-g @danielvegamyhre I'd like to update the Go version to 1.20 in another PR before merging this one into the main.
Does bumping the Go version to v1.20 sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me, let's wait for Abdullah to confirm as well though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will say that I remember upgrading the e2e tests in test infra to 1.20
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that I think we are already using a go1.20 for the e2e tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right :)
https://github.com/kubernetes/test-infra/blob/0e54866edb5d54816d2593bdfcd686183b4ea08e/config/jobs/kubernetes-sigs/jobset/jobset-presubmit.yaml#L13
Then, we just bump it in go.mod!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open a PR bumping that and we can discuss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
72ba788
to
f49cbc6
Compare
Squashed into one and rebased. |
/lgtm Leaving approval for Abdullah's review |
pkg/controllers/jobset_controller.go
Outdated
const RestartsKey string = "jobset.sigs.k8s.io/restart-attempt" | ||
const ( | ||
RestartsKey string = "jobset.sigs.k8s.io/restart-attempt" | ||
parallelDeletions int = 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this too low? any idea what do we set this number to in other places like the Job API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this too low?
It makes sense. But I'm not sure appropriate value. How about 100 as a default value? Or, do you have any good ideas?
any idea what do we set this number to in other places like the Job API?
How about setting it via config: #55?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the concept of a "slow start" in the linked code from the job controller. Maybe it would be simpler to start with some constant value like 100 then implement slow start separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess for deletions we can continue to use a constant (I would set it to 50); but adopt a slow start approach for parallel creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess for deletions we can continue to use a constant (I would set it to 50); but adopt a slow start approach for parallel creation.
Great suggestions! I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Yuki, I think we need to parallelize job creation too, I created an issue: #130
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
f49cbc6
to
b425f26
Compare
@ahg-g @danielvegamyhre I have updated. |
/test pull-jobset-test-e2e-main-1-24 |
/lgtm |
any idea why this failed? |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahg-g, tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Actually, I created #135. |
I limited the number of goroutines used when deleting Jobs.