
Dagster concurrency limits preventing some jobs from starting #2171

Closed
ravenac95 opened this issue Sep 18, 2024 · 1 comment · Fixed by #2172
What is it?

Right now the max_concurrency limit for the queued run coordinator in Dagster is set to 30; we did this in the hope of limiting concurrency for things like the open_collective backfill. After fixing issues with the k8s_job_executor and adding dagster/concurrency_key on a per-op basis, we've been able to get things running with proper concurrency limits.

However, because max_concurrency is set, Dagster still spins up a pod for each pending run, up to the max_concurrency value. It does this because "runs" in Dagster are used to monitor and trigger execution of a run's "steps": the steps are the actual asset/op code executing, while the run is just a coordinator process. This is problematic because those run pods take up resources while essentially "sleeping", since their steps are blocked by dagster/concurrency_key, and that in turn prevents new jobs from starting that are not blocked by the same concurrency_key.

Here are some potential paths forward:

  • Remove the max_concurrency limit. This has its own issue: if we queued 500 runs, 500 pods would be started to manage them.
  • To overcome that, the Dagster docs seem to point to throttling concurrency-limited runs: https://docs.dagster.io/guides/limiting-concurrency-in-data-pipelines#throttling-concurrency-limited-runs
  • We may also want to look at introducing tag_concurrency_limits for jobs, though this runs into the issue we originally found when attempting to limit concurrency: everything must be a job.
  • Set dagster/priority for our important daily jobs to be -1. This is a quick fix for now and should likely be done immediately.
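As a sketch of the second and third options, the queued run coordinator in dagster.yaml accepts a global run cap, per-tag run limits, and (per the linked doc) a setting to hold back runs whose ops would be blocked on a concurrency key. The exact field names and the limit values below are assumptions to verify against the deployed Dagster version, not settings pulled from our config:

```yaml
# dagster.yaml (sketch; field names and values are assumptions to verify)
run_coordinator:
  module: dagster._core.run_coordinator
  class: QueuedRunCoordinator
  config:
    # Hard cap on concurrently launched run pods (currently 30 for us).
    max_concurrent_runs: 30
    # Per-tag run limits, e.g. throttle runs carrying the
    # open_collective concurrency key without lowering the global cap.
    tag_concurrency_limits:
      - key: "dagster/concurrency_key"
        value: "open_collective"
        limit: 2
    # From the linked "throttling concurrency-limited runs" doc:
    # keep runs queued while their ops are blocked on a concurrency
    # key, instead of launching a run pod that just sleeps.
    block_op_concurrency_limited_runs:
      enabled: true
```

The last option is likewise just a run tag, e.g. tags={"dagster/priority": "-1"} on the job or schedule definition.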
@ravenac95 (Member Author) commented:

@Jabolol this is some detail on the problem I saw. I assigned both of us, mostly so that whoever gets to parts (or all) of this first can pick it up. I imagine one of us may need it sooner (likely me :D)

@Jabolol linked a pull request on Sep 18, 2024 that will close this issue.