Parallel job creation #427

Merged: DrJosh9000 merged 1 commit into main from parallel-job-creation on Nov 19, 2024

@DrJosh9000 DrJosh9000 commented Nov 19, 2024

What

For each poll of BK for jobs, run the rest of the pipeline (deduper -> limiter -> scheduler) in parallel, with a fairly conservative default concurrency of 5 goroutines.
Randomise the order in which jobs are sent through the pipeline.

Why

If 100% of the jobs sent to k8s are created successfully, then (up to the MaxInFlight limit, if it applies) the current process should always make progress even if the k8s API is slow, because staleCtx only interrupts waiting for limiter tokens, not the creation of jobs. (Any job that makes it through the limiter will be sent to k8s by the scheduler.) The other stages in the middle shouldn't be so slow that staleCtx is cancelled first.

If some jobs are created successfully and others are not, the failure is most likely a property of the job (e.g. the job name collides with an existing job). If we try and fail to create the same jobs over and over, and creating jobs is slow enough that we don't get through enough of the broken ones before staleCtx expires, we won't make progress.

Randomising the order in which jobs are created gives jobs that can be created successfully a chance. Adding parallelism to create more jobs at a time will increase load on the k8s cluster, but lets more jobs be tried before staleness kicks in.

@DrJosh9000 DrJosh9000 force-pushed the parallel-job-creation branch 2 times, most recently from c1b1848 to 33e629b Compare November 19, 2024 02:22
@DrJosh9000 DrJosh9000 marked this pull request as ready for review November 19, 2024 02:22
@wolfeidau wolfeidau left a comment

Nifty error switch 🤔 👍🏻 🚀

@DrJosh9000 DrJosh9000 merged commit 8c02912 into main Nov 19, 2024
1 check passed
@DrJosh9000 DrJosh9000 deleted the parallel-job-creation branch November 19, 2024 04:29