Set a timeout on the builds #866
Comments
Could this be mitigated by resource limits/quotas, to prevent any particular pod from hogging too much? I feel this issue is unlikely because of the testing Paketo buildpacks provide around their dependencies, and because a bad actor who can create an image or build would also be able to set the timeout for that image or build. That said, I would be curious to hear more about how this issue came up and whether it is something we should attempt to solve, potentially with your proposed solution.
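For reference, a minimal sketch of the resource-side mitigation mentioned above, assuming builds run in a dedicated namespace (the namespace name and the specific limits are illustrative, not a recommendation):

```yaml
# Illustrative only: a LimitRange that caps CPU/memory per container in a
# hypothetical "build" namespace, plus a ResourceQuota bounding total usage.
# This limits how many resources a runaway build pod can hog, but it does
# not bound how long that pod keeps running.
apiVersion: v1
kind: LimitRange
metadata:
  name: build-limits
  namespace: build
spec:
  limits:
    - type: Container
      default:
        cpu: "1"
        memory: 2Gi
      defaultRequest:
        cpu: 250m
        memory: 512Mi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: build-quota
  namespace: build
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
```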
This might still happen accidentally, and even Paketo is not able to sidestep such an issue. An example is the Python pip buildpack: pip currently doesn't terminate when resolving an unresolvable dependency tree, causing a build that never ends.
It seems the accidental case is actually the more likely one, simply because of the mitigations a cluster operator can already take on the resource-usage side. I can see this being valuable given that there is currently no way to cancel a hanging build without deleting it, and deleting it loses history that might be useful for identifying issues over time.
Is this something you experienced during your usage of kpack, or is this more theoretical?
I have definitely seen multiple tenants and builds caught out by a single bad package they happen to use. It causes long-running builds that users can be entirely unaware of until they look closely at their CI/CD pipelines. When the kp CLI is used with the wait feature, this also causes long-standing watches to be created on the API server.
Do you see this being more valuable as a field on the image spec, or as a config that an operator can set cluster- or namespace-wide?
It would be better to have timeouts and fail early than a never-ending stuck build, which also prevents users from submitting further builds until they cancel it. Some background on fields we could expose: https://kubernetes.io/docs/concepts/workloads/controllers/job/#handling-pod-and-container-failures. Both activeDeadlineSeconds and ttlSecondsAfterFinished would be nice to have (see the sketch below). As for setting the policy, I would imagine a good course of action would be to expose the field on the image spec and have it both defaulted and validated using webhooks. This would allow some flexibility and sane defaults for users while letting the operator set upper limits.
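For context, the two Job-level fields referenced in the linked docs look like this (the values and Job name are illustrative); kpack would need its own plumbing, but the semantics are what is being proposed:

```yaml
# Illustrative Kubernetes Job showing the two fields mentioned above.
# activeDeadlineSeconds fails the Job once it has been active this long;
# ttlSecondsAfterFinished garbage-collects the Job after it completes or fails.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-build
spec:
  activeDeadlineSeconds: 3600      # fail the build after 1 hour
  ttlSecondsAfterFinished: 86400   # clean it up a day after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: build
          image: example.com/builder:latest
```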
Currently it's possible for a malicious actor, or an improper dependency downloaded by a buildpack, to create a cluster DoS by hoarding resources with a never-ending build. kpack should expose activeDeadlineSeconds as a way to kill builds that run longer than a certain amount of time.
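A rough sketch of how such a field might look on the Image spec. This is purely hypothetical: the `build.activeDeadlineSeconds` field shown here is not part of the current kpack API, and the tag, builder, and source values are placeholders.

```yaml
# Hypothetical sketch only: "activeDeadlineSeconds" under spec.build is the
# proposed field, not an existing kpack Image field.
apiVersion: kpack.io/v1alpha2
kind: Image
metadata:
  name: sample-image
spec:
  tag: registry.example.com/sample
  builder:
    kind: ClusterBuilder
    name: default
  source:
    git:
      url: https://github.com/example/app
      revision: main
  build:
    activeDeadlineSeconds: 1800   # proposed: kill any build running longer than 30 minutes
```

A defaulting webhook could fill this in when it is omitted, and a validating webhook could reject values above an operator-configured ceiling, matching the defaulted-and-validated approach suggested in the comments above.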