-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Add execution concurrency #5659
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #5659 +/- ##
===========================================
+ Coverage 9.74% 37.04% +27.30%
===========================================
Files 214 1316 +1102
Lines 39190 132247 +93057
===========================================
+ Hits 3820 48995 +45175
- Misses 35031 78989 +43958
- Partials 339 4263 +3924
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
- created_at | ||
|
||
#### Open Questions | ||
- Should we always attempt to schedule pending executions in ascending order of creation time? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make it configurable? FIFO, FILO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I wasn't sure! any suggestions here? we could introduce an enum and choose fifo to begin with and expand support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed thoughts on making the queue's order of execution configurable.
If we support a limited number of parallel executions (more than 1), the order of these executions would naturally start as FIFO up until that limit is reached.
To me, providing an option to begin executing FILO after that limit is reached feels confusing to me.
However, that brings a different question to mind: If multiple workflows are queued up, should we provide an option to enable loud notifications?
In other words, if backlogged executions have the possibility of impacting downstream operations, can we enable users to receive loud notifications, including the number of queued executions?
I can imagine a use case where: holiday shopping -> increased purchase volume -> increased data size -> multiple, consecutive execution delays -> cascading backlog of executions. In this scenario, the owners of the workflow may be out on leave and not be aware of the growing backlog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, we have workflow notifications enabled for terminal state but we've talked more about richer, customizable notifications and I think this slates neatly into that
I think for a v1 having the default behavior be fifo with an extended description/explanation for the pending state may provide some visibility here to start off with
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this suggestion of having an enum listing the policies to the Implementation details section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can add a customers feedback here, where the desired behaviour is to actually replace (terminate) the current executions by subsequent executions. Sounds like too much for the initial scope but still interested if this would be possible to add later with the current approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fiedlerNr9 added a section under Alternatives. I don't think this is precluded by this implementation but not in scope for this proposal atm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good. I'd feel more comfortable if we fleshed out the implementation a bit more, but otherwise, I feel like we're on the same page.
|
||
``` | ||
|
||
### FlyteAdmin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During last week's contributors meeting someone asked a question about having this concurrency control work across versions. Can we either have a discussion in this PR about it or list that use case as not being supported explicitly in the RFC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can say that something that works across versions would be really useful for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For us too because we very often pyflyte run
which means we often don't have two executions of the same version.
This could be made configurable here:
concurrency=Concurrency(
max=1, # defines how many executions with this launch plan can run in parallel
policy=ConcurrencyPolicy.WAIT # defines the policy to apply when the max concurrency is reached,
level=ConcurrencyLevel.Version, # or ConcurrencyLevel.LaunchPlan
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @eapolinario @corleyma @fg91 for the feedback, I don't think this will be too much of a lift but added a proposal for different levels of precision here too
1. or fail the request when the concurrency policy is set to `ABORT` | ||
1. Do not create the workflow CRD | ||
|
||
Introduce an async reconciliation loop in FlyteAdmin to poll for all pending executions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have prior art for this kind of reconciliation loop in flyteadmin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the scheduler!
- created_at | ||
|
||
#### Open Questions | ||
- Should we always attempt to schedule pending executions in ascending order of creation time? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add this suggestion of having an enum listing the policies to the Implementation details section?
|
||
## 4 Metrics & Dashboards | ||
|
||
*What are the main metrics we should be measuring? For example, when interacting with an external system, it might be the external system latency. When adding a new table, how fast would it fill up?* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this feature going to be rolled out? Should we have an explicit list of metrics used to help the health of the feature? (e.g. total number of attempts of a given launchplan )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting question. I think scheduling attempts here is based on the polling interval right? But could be useful to understand time spent in PENDING
|
||
## 5 Drawbacks | ||
|
||
*Are there any reasons why we should not do this? Here we aim to evaluate risk and check ourselves.* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any reservations about more load on the DB (even with indexes, etc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, we already have a ton of indices on executions - there is definitely a tradeoff to adding a new one
Sounds good, just wanted overall alignment before diving into the implementation. Will do that next and thank you already for all the feedback |
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
added some more implementation details, mind taking another look @eapolinario |
} | ||
``` | ||
|
||
Furthermore, we may want to introduce a max pending period to fail executions that have been in `PENDING` for too long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I agree that this would be good.
## 8 Unresolved questions | ||
|
||
- Should we always attempt to schedule pending executions in ascending order of creation time? | ||
- Decision: We'll use FIFO scheduling by default but can extend scheduling behavior with an enum going forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this has been decided (I'm ok with it), could you please reformulate in the text above where this is still discussed as an open question? 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the discussion above!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please change the filename to include the PR number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks
I'd add one thing, possibly out of scope for this RFC: it would be really nice to be able to define a "max execution concurrency" on the backend, either propeller-wide or per project/domain. Flyte would benefit from more controls that allow operators to protect quality of service and aren't dependent on workflow authors to set reasonable limits. |
hi @corleyma thanks for reviewing! re your comment on platform-max execution concurrency, that's really intriguing - would you want to start a separate discussion on that here: https://github.com/flyteorg/flyte/discussions so we don't lose track of the suggestion? execution namespace quota is meant to help address quality of service and fairness in a multitenant system but it would be cool to flesh out other mechanisms for managing overall executions |
execution namespace quota can help protect against workloads that would otherwise utilize too many cluster resources, but it doesn't really help protect e.g. flyte propeller from too many concurrent executions. I am happy to start a separate conversation though! |
ConcurrencyPolicy policy = 2; | ||
} | ||
|
||
enum ConcurrencyPolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a replace option also?
To stop the previous execution and replace it with the current one. This is what k8s job does.
https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#concurrency-policy
Would be great if the behaviour can be made as close to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point, updated
09/26/2024 Contributors sync notes: no updates |
@@ -195,6 +198,7 @@ WHERE ( launch_plan_named_entity_id, created_at ) IN (SELECT launch_plan_named_ | |||
GROUP BY launch_plan_named_entity_id); | |||
``` | |||
|
|||
Note, in this proposal, registering a new version of the launch plan and setting it to active will determine the concurrency policy across all launch plan versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
I would not implement this with too many db calls. I would make the decision simple and completely in memory using Singletons like the scheduler. This is because |
1. or fail the request when the concurrency policy is set to `ABORT` | ||
1. Do not create the workflow CRD | ||
|
||
Introduce an async reconciliation loop in FlyteAdmin to poll for all pending executions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has to be done very carefully. Lets use the controller pattern. Just simply run everything in memory on one machine. We have seen with scheduler and propeller this is stable and highly scalable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes sorry for the lack of clarity, this is meant to follow the scheduler singleton pattern!
WHERE phase = 'PENDING' | ||
GROUP BY launch_plan_id); | ||
``` | ||
2. For each execution returned by the above query, `Add()` the pending execution to a [rate limiting workqueue](https://github.com/kubernetes/client-go/blob/master/util/workqueue/rate_limiting_queue.go#L27-L40) (as a suggestion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would not actually do this, this is because it avoids making across execution generalizations like priority etc. This is infact a scheduler.
Important thing to understand is when scheduler is and when is it not.
``` | ||
2. For each execution returned by the above query, `Add()` the pending execution to a [rate limiting workqueue](https://github.com/kubernetes/client-go/blob/master/util/workqueue/rate_limiting_queue.go#L27-L40) (as a suggestion) | ||
3. In a separate goroutine, fetch items from the workqueue and individually process each execution entry | ||
1. Check the database to see if there are fewer than `MAX_CONCURRENCY` non-terminal executions matching the launch plan ID in the pending execution model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would do this once in memory and keep a copy of running executions in memory. Then as the plan changes we can simply update the inmemory state, reducing the db load greatly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we reconcile the terminated executions to unblock concurrency gates if we don't read from the db?
Note, in this proposal, registering a new version of the launch plan and setting it to active will determine the concurrency policy across all launch plan versions. | ||
|
||
#### Prior Art | ||
The flyteadmin native scheduler (https://github.com/flyteorg/flyte/tree/master/flyteadmin/scheduler) already implements a reconciliation loop to catch up on any missed schedules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly look at the prior art. The number of db queries is minimal. Actually one per cycle.
I would keep all running executions with concurrency_level set in memory and all lps with concurrency_level set in memory (only the concurrency policies)
We should periodically update these and its ok to be eventually consistent
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
It was suggested I add a use case to this PR -- slack thread I have a cron scheduled workflow (wf) that should not execute if the wf is already executing (from a previous schedule). Why? The wf executes a legacy app that checks if new files are available on S3. If so, then the app begins processing, which involves other resources that allow only single use. Processing time can vary from minutes to hours, depending on the data received. If another process is started, both will eventually fail due to resource contention. Considerations:
|
Related Issues
#5125
#420
#267
Docs link