Resources are orphaned when MachineDeployment/MachineSet get deleted #8172
Comments
@sbueringer opened this one for further discussion

@srm09 thanks for reporting this edge case! If I'm not mistaken we are already doing something similar for ownerReferences, so I'm not opposed to this given that it makes our system less fragile.

+1 as discussed. We just have to check if we run into problems with server-side apply. I.e. we might end up with co-ownership and then the machine controller might be unable to remove the finalizer. But this should be easy to test.

/help
@fabriziopandini: Guidelines: please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met. If this request no longer meets these requirements, the label can be removed.
👋 I'd like to take a shot at this one. To make sure I understand the path forward, the idea is to:

@cahillsf - I think the easiest first step would be to add the

/assign @cahillsf
@srm09 can you help me understand the original issue? I'd expect each controller to block reconciliation until it sets its finalizer, or treat it as a bug otherwise.
As far as I'm aware the usual pattern is that each controller sets the finalizer during normal reconcile. And then, as soon as an object has been deleted (e.g. by the user), the controller does some sort of cleanup before it removes its finalizer. Are you suggesting to instead block the deletion call in the deletion webhook (in this case the Machine deletion webhook)?

I think we end up with the better UX if, for the limited cases where we run into problems, we already add the finalizer on object creation. This avoids unnecessary retries during deletion by "whoever wants to delete the object" (Kubernetes garbage collector, users, other controllers).

Some more context. Let's say we're running at a high scale. In those scenarios it could take minutes for the Machine controller to set the finalizer if the queue of the Machine controller is overloaded. If in the meantime the MachineSet goes away (either through MD deletion or just a simple rollout of the MD), the Kubernetes garbage collector will trigger the Machine deletion. Because the Machine controller never had a chance to set the finalizer, reconcileDelete is never run. This means we could orphan resources because we never get a chance to run through the proper deletion flow. I think the current proposed solution is the best approach.
Right, that is what I was suggesting as well.

I was asking for some original context because I'm missing how, by following the above pattern, we could orphan things. I.e. if the Machine controller didn't set the finalizer on a Machine yet, then it didn't have the chance to create anything that can be orphaned (it can only create things after setting the finalizer).

I agree, I just can't see what the particular issue was in the first place that got us to look into this.
Ah, got it!

The problem in this specific case is that the MachineSet controller creates the Machine, InfraMachine and BootstrapConfig.
To summarize what I wrote in the related Slack discussion (https://kubernetes.slack.com/archives/C8TSNPY4T/p1688058691112519): The normal creation flow looks like this:
This way finalizers are always on the objects before actual infrastructure is created, and it doesn't matter when the MachineSet is deleted.

@srm09 As far as I can see the VSphereMachine controller is not adding the finalizer synchronously. It just adds it and then patches the VSphereMachine with defer, while, as far as I can tell, creating infra in the same reconcile. But that would be (just) a bug in CAPV.

That being said, I tested the current PR and went through it with the debugger. It definitely works as expected. Result of the discussion was that we agreed to move forward with this change. If we observe any unexpected behavior we now have a reasoning for why we made this change, and in the worst case can consider a revert. Thx @enxebre for raising this point!
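The change agreed on above, setting the finalizer at creation time instead of waiting for the owned object's controller to do it on first reconcile, can be sketched as follows. Again a toy model with illustrative type and finalizer names, not the actual MachineSet controller code:

```go
package main

import "fmt"

// Illustrative stand-in for a Machine; not the real Cluster API type.
type Machine struct {
	Name       string
	Finalizers []string
}

// Hypothetical finalizer name, for illustration only.
const machineFinalizer = "machine.cluster.x-k8s.io"

// newMachine models the proposed change: the creating controller (the
// MachineSet controller in this issue) sets the finalizer up front, so
// even if the garbage collector requests deletion before the Machine
// controller ever reconciles the object, reconcileDelete still runs.
func newMachine(name string) *Machine {
	return &Machine{
		Name:       name,
		Finalizers: []string{machineFinalizer},
	}
}

// deletableImmediately models apiserver behavior: an object with no
// finalizers disappears as soon as deletion is requested.
func deletableImmediately(m *Machine) bool {
	return len(m.Finalizers) == 0
}

func main() {
	racy := &Machine{Name: "m-racy"} // old behavior: finalizer set later
	safe := newMachine("m-safe")     // proposed behavior: finalizer set at creation
	fmt.Println(deletableImmediately(racy)) // can vanish before first reconcile
	fmt.Println(deletableImmediately(safe)) // deletion blocks until cleanup removes the finalizer
}
```

The co-ownership concern raised earlier in the thread fits here too: because a second controller now writes the finalizers field, server-side apply field management needs to let the Machine controller remove the entry later.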
What steps did you take and what happened:
This is a special case wherein, if the MachineDeployment and/or MachineSet get deleted before the controllers have a chance to set the finalizer on the child objects (read Machine and VSphereMachine), the resources created by these objects get orphaned, since they get deleted immediately due to the absence of the finalizer on them.

What did you expect to happen:
Consider changing the logic to add a finalizer during resource creation, instead of relying on the controller to set the finalizer during the reconciliation of the resource.
Maybe a temp finalizer gets set during resource creation, which is then replaced by the finalizer from the controller during the first reconciliation loop of the object.
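The orphaning sequence being reported can be modeled as a short sketch (a toy model with illustrative names, not real Kubernetes API machinery): deletion arrives in the lag window before any finalizer is set, so the cleanup step is never reached and the external infrastructure is left behind.

```go
package main

import "fmt"

// Toy model of the race: the object exists, its finalizer has not been
// set yet, and external infrastructure has been created for it.
type object struct {
	finalizers []string
	exists     bool
}

type infrastructure struct {
	released bool
}

// gcDelete models the apiserver/garbage collector: with no finalizers
// the object is removed immediately, so no controller ever gets to
// reconcile its deletion.
func gcDelete(o *object) {
	if len(o.finalizers) == 0 {
		o.exists = false
	}
}

func main() {
	o := &object{exists: true} // created; controller hasn't reconciled yet
	infra := &infrastructure{} // infrastructure tied to the object
	gcDelete(o)                // MachineSet goes away; GC deletes the object
	// reconcileDelete never ran, so the infrastructure is never released.
	fmt.Println("object exists:", o.exists, "- infra orphaned:", !infra.released)
}
```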
Anything else you would like to add:
Part of the problem arises since there is a certain time lag between the object being created by the MachineSet controller and the Machine controller setting the finalizer on the object. Similar behavior is observed in the VSphereMachine controller as well. This issue was highlighted because of a CAPV e2e test orphaning resources.

Environment:
/kind bug