[question] is it possible to limit number of childs in periodic jobs? #5462

Closed
samm-git opened this issue Mar 22, 2019 · 4 comments

samm-git commented Mar 22, 2019

I am trying to understand whether it is possible to limit the number of children of a periodic job.

For example, I have a periodic job that runs every 15 minutes and normally exits within 5-30 minutes. Concurrent runs are fine. However, if a bug in the code causes these jobs to hang, a single parent and its children will exhaust all cluster resources very quickly, because the number of concurrent jobs will grow fast.

Is there any way to limit the maximum number of children?

@schmichael
Member

Currently you cannot specify a run limit for periodic batch jobs. You may want to track #1782, as timeouts might be useful for your use case once implemented.

That being said, one workaround would be to use constraints, probably on node_class, to limit the number of nodes the periodic batch jobs can run on. For example, if you have 5 servers with node_class=periodic that each have 30 GB of memory, and each invocation of the periodic batch job requires 3 GB: (5 servers * 30 GB) / 3 GB = 50, so at most 50 instances can run at once. Further invocations will be queued until resources are freed.
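The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: the function name `max_concurrent_instances` is hypothetical, memory is treated as the only limiting resource, and any memory reserved on a node for the agent or operating system is ignored. It counts whole instances per node rather than dividing the pooled total, which gives the same answer for the numbers above but a lower (more realistic) bound when node memory is not an exact multiple of the per-instance requirement.

```python
def max_concurrent_instances(node_memory_gb, memory_per_instance_gb):
    """Estimate how many job instances fit on a set of nodes at once.

    node_memory_gb: list of usable memory per node, in GB (assumed values).
    memory_per_instance_gb: memory each job invocation requests, in GB.
    """
    if memory_per_instance_gb <= 0:
        raise ValueError("memory_per_instance_gb must be positive")
    # An instance cannot span nodes, so count whole instances per node.
    return sum(int(mem // memory_per_instance_gb) for mem in node_memory_gb)


if __name__ == "__main__":
    # 5 nodes with node_class=periodic, 30 GB each, 3 GB per invocation.
    print(max_concurrent_instances([30] * 5, 3))
```

With the figures from the example (five 30 GB nodes, 3 GB per invocation) this yields 50, matching the calculation above.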

I'm going to close this ticket, but please feel free to open a feature request with your ideal behavior if this doesn't meet your needs.

@samm-git
Author

@schmichael thank you for the quick reply and explanation; that was very helpful.

For now we will probably just disable prohibit_overlap for all running batches, as it seems to be a very dangerous feature for our environment. However, it would be great to see a limit on the maximum number of task children in the future. Thanks for the hint about node_class; however, it is not ideal for us, as we would need to provision special "batch worker" nodes for it, which is not something we would like to do.

@schmichael
Member

Our enterprise product does have Namespaces and Quotas, which would allow you to use my node_class approach without provisioning new nodes. You would launch these batch jobs in their own namespace with a resource-constrained quota.

(I promise we are not intentionally leaving out the batch job limit to sell more licenses! I just wanted to offer another workaround.)

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 25, 2022