Set a timeout on the builds #866

Closed
sambhav opened this issue Oct 21, 2021 · 7 comments · Fixed by #958

Comments

@sambhav (Contributor) commented Oct 21, 2021

Currently, it's possible for a malicious actor, or an improper dependency downloaded by a buildpack, to create a cluster DoS by hoarding resources in a never-ending build. kpack should expose `activeDeadlineSeconds` as a way to kill builds that run longer than a certain duration.
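For reference, `activeDeadlineSeconds` already exists on the Kubernetes Pod spec, so a build pod capped at one hour would look roughly like this (the pod name, image, and timeout value below are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-build-pod   # placeholder name
spec:
  # Kubernetes kills the pod's containers and marks the pod Failed
  # once it has been active longer than this many seconds.
  activeDeadlineSeconds: 3600
  containers:
    - name: build
      image: registry.example.com/builder:latest   # placeholder image
```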

@tomkennedy513 (Collaborator)

Could this be mitigated by resource limits/quotas that prevent any particular pod from hogging too much? I feel this issue is unlikely, given the testing Paketo buildpacks provide around their dependencies, and given that a bad actor who can create an image or build would also be able to set the timeout for that image or build. That said, I would be curious to hear more about how this issue came up and whether it is something we should attempt to solve, potentially with your proposed solution.
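For context, the limits/quotas mentioned here are standard Kubernetes objects such as a `LimitRange`; a minimal sketch (name, namespace, and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: build-limits      # illustrative name
  namespace: builds       # illustrative namespace
spec:
  limits:
    - type: Container
      default:            # limits applied when a container sets none
        cpu: "1"
        memory: 1Gi
      defaultRequest:     # requests applied when a container sets none
        cpu: 500m
        memory: 512Mi
```

Note that this caps CPU and memory but not wall-clock time, which is why it only partially addresses a never-ending build.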

@sambhav (Contributor, Author) commented Oct 21, 2021

This might still happen accidentally, and even Paketo is not able to sidestep such an issue. An example is the Python pip buildpack: pip currently doesn't terminate when resolving an unresolvable dependency tree, which results in a build that never ends.

@tomkennedy513 (Collaborator)

It seems that this happening accidentally is almost more likely, simply because of the mitigations a cluster operator can take on the resource-usage side. I can see this being valuable for the case where there is no way to cancel a hanging build without deleting it, since deleting it loses history that might be useful for identifying issues over time.

@tomkennedy513 (Collaborator)

Is this something you experienced during your usage of kpack, or is it more theoretical?

@sambhav (Contributor, Author) commented Oct 21, 2021

I have definitely seen multiple tenants and builds caught out by a single bad package they happen to be using. It causes long-running builds that users can be entirely unaware of until they look closely at their CI/CD pipelines. When used with the kp CLI's wait feature, this also causes long-standing watches to be created on the API server.

@tomkennedy513 (Collaborator)

Do you see this being more valuable as a field on the image spec, or as a config that an operator can set cluster- or namespace-wide?

@sambhav (Contributor, Author) commented Oct 21, 2021

It would be better to have timeouts and fail early than to have a never-ending stuck build. A stuck build also prevents users from submitting further builds until they cancel it.

Some background on the fields we could expose: https://kubernetes.io/docs/concepts/workloads/controllers/job/#handling-pod-and-container-failures

Both `activeDeadlineSeconds` and `ttlSecondsAfterFinished` would be nice to have.
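On a Kubernetes Job, those two fields look roughly like this (names and values are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-build              # placeholder name
spec:
  activeDeadlineSeconds: 3600      # fail the Job if it runs longer than 1 hour
  ttlSecondsAfterFinished: 86400   # garbage-collect the Job 24 hours after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: build
          image: registry.example.com/builder:latest   # placeholder image
```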

As for setting the policy, I imagine a good course of action would be to expose the field on the image spec and have it both defaulted and validated by webhooks. This would give users flexibility and sane defaults while allowing the operator to set upper limits.
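A minimal sketch of that shape on a kpack Image; the timeout field here is hypothetical (the issue was ultimately fixed by #958, and the field that actually shipped may be named differently):

```yaml
apiVersion: kpack.io/v1alpha2
kind: Image
metadata:
  name: example-app                        # placeholder name
spec:
  tag: registry.example.com/example-app    # placeholder tag
  builder:
    kind: ClusterBuilder
    name: default                          # placeholder builder reference
  source:
    git:
      url: https://github.com/example/app  # placeholder repository
      revision: main
  # Hypothetical field: a defaulting webhook would fill this when unset,
  # and a validating webhook would reject values above an operator-set cap.
  activeDeadlineSeconds: 3600
```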
