KEP-2839: Add KEP for in-use protection #2840
base: master
Conversation
mkimuram commented on Jul 27, 2021
- One-line PR description: A generic feature to protect objects from deletion while they are in use
- Issue link: In-use protection #2839
- Other comments:
Cross-linking an old issue: kubernetes/kubernetes#10179

Sharing just a rough idea. Just for preventing deletion, the design below will be enough. (We still need to consider how to protect deletion of …)

From the viewpoint of consuming a lien from secret protection, which is my biggest concern: how to create …
Why do you want to have lien objects? Why are text strings not sufficient (as used for finalizers)? Adding additional object types adds more places to have races; I'm against it unless you can convince me it's absolutely necessary.
It's to make it easier to manage who has control over the specific lien in the delete-liens field. If multiple liens are set on a single resource, we will need to ensure that the specific lien is managed by the client/controller when deleting the lien. We may add the … Instead, we might just add …

Agree. If the above concern can be solved another way, I agree to remove the cluster-scoped …
You want to ACL liens? I don't see a need to enforce this any more than we do for finalizers. And I would not solve that problem by adding Lien or LienClaim objects. I don't think a solution here should require any additional object types.
If we're only talking about the secret-protection use case, the feature needed is to block deletion of "Secret A" while the secret can be used by "Pod B" and "PersistentVolume C". So something like the sketch below would be enough.
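A minimal sketch of what that could look like, assuming the `liens` metadata field used by the prototype later in this thread; the lien strings themselves are hypothetical reason identifiers:

```yaml
# Hypothetical sketch only: a Secret protected by one lien per reason.
# The `liens` field name matches the prototype shown later in this thread;
# the reason strings are made-up examples.
apiVersion: v1
kind: Secret
metadata:
  name: secret-a
  liens:
  - "used-by-pod/default/pod-b"
  - "used-by-pv/pv-c"
type: Opaque
stringData:
  username: my-app
```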
And to generalize it, I was thinking about having the lien system manage such an ID for each reason, to avoid conflicts when deleting liens. However, we can leave that to consumers.
I was also thinking about using a lien per controller, not per reason, roughly as sketched below.
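For contrast, a lien-per-controller variant might look roughly like this (the controller names are made up for illustration):

```yaml
# Hypothetical sketch only: one lien per controller rather than per reason.
apiVersion: v1
kind: Secret
metadata:
  name: secret-a
  liens:
  - "secret-protection-controller"
  - "pv-protection-controller"
type: Opaque
```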
Then I started to wonder which would be easier: "implementing your own admission hook to block deletion" or "implementing a controller to add a lien"? (Obviously, for users who would like to add such protection manually, a lien system is useful, as this shows; but for controllers, it might not be.)
There is no need for a lien controller. We would add code to kube-apiserver (reject deletion if there are liens, don't permit new liens if deletion timestamp is set). We might want slightly more structure than raw text. You can prototype today with e.g. specially formatted annotations and a webhook admission controller.
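As an illustration of that prototyping route (not the proposed in-tree design), a webhook registration along these lines would intercept Secret deletions so that an out-of-tree webhook can deny them when a specially formatted "lien" annotation is present; every name here is hypothetical and the webhook backend itself is assumed:

```yaml
# Hypothetical prototype wiring: send Secret DELETE requests to a webhook
# that rejects them whenever a lien-style annotation is set on the object.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: lien-prototype
webhooks:
- name: liens.example.com                 # hypothetical webhook name
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["DELETE"]
    resources: ["secrets"]
  clientConfig:
    service:
      namespace: kube-system
      name: lien-webhook                  # hypothetical Service backing the webhook
      path: /validate-delete
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
```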
Sorry, I misread :) A lien per controller or per reason doesn't matter from the perspective of the lien mechanism; it's up to controller authors how they want to use the mechanism.
I've implemented a prototype of liens as this. I will update the KEP based on this prototype. Note that I've tested it in the way below and it works as shown:
```console
$ ENABLE_ADMISSION_PLUGINS=Lien hack/local-up-cluster.sh

$ kubectl create secret generic test-secret --from-literal='username=my-app' --from-literal='password=39528$vdg7Jb'
$ kubectl get secret test-secret -o jsonpath='{.metadata.liens}{"\n"}'

$ kubectl delete secret test-secret
secret "test-secret" deleted

$ kubectl create secret generic test-secret --from-literal='username=my-app' --from-literal='password=39528$vdg7Jb'
$ kubectl patch secret test-secret -p '{"metadata":{"liens":["never delete me"]}}' --type=merge
secret/test-secret patched
$ kubectl get secret test-secret -o jsonpath='{.metadata.liens}{"\n"}'
[never delete me]
$ kubectl delete secret test-secret
Error from server (Forbidden): secrets "test-secret" is forbidden: deletion not allowed by liens
$ kubectl patch secret test-secret -p '{"metadata":{"liens":[]}}' --type=merge
secret/test-secret patched
$ kubectl delete secret test-secret
secret "test-secret" deleted
```
Force-pushed from 4ec78cb to 659be2f
I've updated the KEP. PTAL. Also, I will update the prototype of secret-protection to rely on this in-use protection mechanism to check feasibility, and will also update the KEP for secret-protection.
> To minimize issues caused by a cross-namespace dependency, it should be considered to implement
> a mechanism to opt in/out of adding liens across namespaces in such controllers.
> For example, a controller may choose to add `Liens` only to resources that have the `example.com/sample-controller/allow-cross-namespace-liens: true` annotation, if the dependency resource isn't in the same namespace.
I would recommend removing this whole section, honestly. I think it's a separate discussion for each controller and I don't think this KEP should recommend a general solution; I'm not convinced there's a problem at all.
Actually, I would like to confirm that this situation is allowed by design by the community. If it is allowed, I agree to delete such a mechanism for avoiding it completely.
Here's how a docs framing can help work out a viable way forward: write (or sketch) two things.

Sometimes - rarely - when I've tried documenting a thing, the required words of caution have seemed faintly ridiculous, and the implementers have felt the same on seeing a draft. Then there's a redesign or something to avoid the worst of the snags.
@mkimuram is this something you want to push for in the current cycle?
> Most of the design ideas come from [here](https://github.com/kubernetes/kubernetes/issues/10179).
> Users or controllers can add a string to `Liens` to ask to protect a resource.
> A newly introduced validation in api-server will reject deletion requests for resources with non-empty `Liens` and return a ["409 Conflict" error code](https://datatracker.ietf.org/doc/html/rfc2616#section-10.4.10).
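For illustration, a client deleting a protected resource under this design would presumably get back a `Status` object along these lines; only the 409/Conflict code comes from the quoted text, and the reason and message wording is made up:

```yaml
# Hypothetical shape of the rejection a client would receive.
apiVersion: v1
kind: Status
status: Failure
code: 409
reason: Conflict
message: 'secrets "test-secret" is blocked from deletion: liens are present'
details:
  name: test-secret
  kind: secrets
```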
This puts the onus on clients to keep retrying if they want something to be deleted, right?
I prefer the declarative, eventually consistent nature of the current PVC protection implementation where a client can declare that it wants a thing to be deleted, and once it's safe to delete the thing, it is. We've built similar machinery to this in Crossplane and I believe it works pretty well.
For our use case "eventually consistent deletes" has nice symmetry with other operations. Most Crossplane resources correspond to some external system like a cloud provider. Let's say I want to create a GCP Subnet with Crossplane, but doing so requires the VPC it lives in to exist first. We don't reject the creation of the Subnet object in the API server because the VPC doesn't exist. Instead we accept the desired state (that a Subnet should exist) and our controller keeps trying to create it. Presumably it will eventually succeed when the VPC exists.

Is there some way to address the requirement that deletes "don't start" (in the controller-reconciling-the-delete sense) without outright rejecting the desire to delete a resource by an entity that has permission to do so?
> where a client can declare that it wants a thing to be deleted, and once it's safe to delete the thing, it is

This is a very different semantic than "I want to delete this" -> "no". The only way I know to achieve what you describe is with a finalizer, but this is not the right semantic. At that point, the object in question is VISIBLY being deleted and async controllers are doing their own cleanup. This is a way to signal "begin the end-of-life process", not a way to delay it.
Is there a different mechanism you are thinking of?
> This is a very different semantic than "I want to delete this" -> "no".
Right. This KEP appears to propose that all forms of "in-use protection" use "I want to delete this" -> "no" semantics. Those semantics appear to be different from how the contemporary PV and PVC implementations of in-use protection work, which are more like "I want to delete this" -> "okay, eventually". For Crossplane's use case, I prefer the semantics of the contemporary implementations to what is proposed by this KEP.

My understanding is that PV and PVC in-use protection are more like "I want to delete this" -> "okay, eventually, when not in use". The deleted objects transition into a `Terminating` state and go away once they're no longer in use. We have the same "okay, eventually" semantics for Crossplane's `ProviderConfig` in-use protection, and are looking to generalize the pattern.
> Is there a different mechanism you are thinking of?
I'm not sure what you're asking. Can you elaborate?
To be clear - I think I understand the issues with finalizers and agree that they're not a good solution. I also don't have a good solution in mind. 🙂 What I would like is that somehow:
- If a client asks for an object to be deleted
- The object can't be deleted yet because doing so would break some other, dependent object
- The delete (i.e. desired state) be accepted and reconciled when possible
I don't think my preferred semantics necessarily make sense for all forms of deletion protection. The "Motivation" section of this KEP states that it's attempting to solve for two use cases:
- Guaranteeing proper order of deletion (i.e. the PV and PVC use case, and Crossplane's)
- Avoiding accidental deletion of important resources
I feel that "I want to delete this" -> "no" semantics makes sense for the second use case, but not the first.
In KubeVirt we have a use case with the same or similar requirements:
- We are running our processes in the original pod
- There is an eviction request for this pod
- Upon this request we want to transfer some state of the processes to a new pod
- Upon the completed state transfer, the original pod is now free to be terminated

Important aspects are:
a. The application can observe the eviction request
b. The pod stays running until the state is transferred
c. The caller of the eviction is aware that the pod is preparing a shutdown

Finalizers do not work, because the pod will be stuck in a terminating state even if the state transfer fails, and we need to keep the original pod.
PDBs protect the pods, but have no way to tell the caller that the eviction was observed and the application is starting the process (we use this today, but it's leading to trouble, e.g. with the descheduler).
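For context, the PDB-based protection mentioned above looks roughly like this (names and labels are hypothetical); it blocks the eviction, but gives the caller no signal that shutdown preparation has started:

```yaml
# Illustration only: a PodDisruptionBudget that prevents eviction of the
# single pod backing a VM until it can be migrated or replaced.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: protect-my-vm                 # hypothetical name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      kubevirt.io/created-by: my-vm   # hypothetical label
```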
> a. The application can observe the eviction request
> b. The pod stays running until the state is transferred
That sounds like grounds for a separate enhancement. Right now evictions are, AIUI, either accepted immediately or denied.
@fabiand We could model this a few ways, but evictions are generally "urgent", right? It's not clear to me how that should intersect with any sort of deletion delay.
@thockin bringing your related comment back into this thread: 😄
> To do what you want, we need a new lifecycle which includes some indefinite pre-delete state.
That may be the case, but what we want does seem in-scope for the goals defined by this design - i.e. guaranteeing proper order of deletion.
Take this manifest, `pod-with-volume.yaml`:

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
```
Today I can `kubectl apply -f pod-with-volume.yaml` and despite there being dependencies between the resources, the create just works and the controllers eventually arrive at my desired state - that they're created. I can `kubectl delete -f pod-with-volume.yaml` too. My desired state (that all this stuff be deleted) is immediately accepted, and eventually it's all gone thanks to (ab)use of finalizers.

I believe under this proposal the `kubectl delete -f pod-with-volume.yaml` UX would change. The `Pod` would delete immediately but the `PersistentVolume` and `PersistentVolumeClaim` would fail with a 409 Conflict, and I'd basically have to retry a couple of times for everything to actually get deleted.

This does technically achieve the goal of guaranteeing proper order of deletion - it just seems like a pretty poor UX for doing so.
@thockin I was not aware that evictions had the notion of urgency.

Evictions respect PDBs; evictions are less hard than DELETE. Thus isn't eviction rather the less urgent approach, and delete the really urgent way of getting rid of workloads? FWIW, `kubectl drain --force --grace-period=0` is urgent.

Thus to me a delay to an eviction is similar to systemd inhibitors - a way to allow workloads to shut down gracefully, or even fail. And that's the difference: fail gracefully, give the admin/owner time to react to this, before an admin jumps in with `--force --grace-period=0`.

And to me this relates to kubernetes/kubernetes#66811 - making eviction declarative.
I meant to ask if there is some other mechanism you know of that I don't :) It's a big project nowadays. To do what you want, we need a new lifecycle which includes some indefinite pre-delete state.
I've come across https://github.com/adobe/k8s-shredder#how-it-works

To me this connects with the desire to make applications aware of interruption requests. If we had, e.g., taint-based draining, apps could watch this life-cycling, terminate themselves properly, and free the node eventually. k8s-shredder would not need to do (imperative) eviction loops; instead: taint (declare) a node to be drained, and watch for the node to be free.
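A rough sketch of that taint-based, declarative flavor of draining; the node name and taint key are hypothetical:

```yaml
# Hypothetical sketch: declaring a drain by tainting the node; applications
# watching for this taint shut themselves down, and the drainer just waits
# for the node to become free.
apiVersion: v1
kind: Node
metadata:
  name: worker-1                      # hypothetical node name
spec:
  taints:
  - key: example.com/draining         # hypothetical taint key
    value: "true"
    effect: NoSchedule
```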
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to its staleness rules. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
We have implemented a new API named Usage in Crossplane which follows a similar approach to the one proposed here, e.g. blocking deletion of objects in use with an admission webhook by returning a 409. This works fine in most cases, except for the degraded experience due to huge delays caused by the backoff in the Kubernetes garbage collector. We discussed some possible solutions during the design but couldn't find any elegant solution. Basically, we need a way to reset the garbage collector's backoff for a given resource. Is there a way to achieve this? Just like with most controllers, if making a change on a given resource (e.g. adding an annotation/label) could requeue the resource immediately into the garbage collector's work queue, we could just do that. Apparently this is not possible. Is there a way to reset the garbage collector's backoff period by patching the resource somehow? Otherwise, the proposal here is subject to the same problem, I believe.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to its staleness rules. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
I repeat myself: #2840 (comment). @deads2k any softening of position?

You know what, I'd like to repeat #2840 (comment) as well. I still recommend: write viable docs for the thing, then build it if we like what we see.

/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to its staleness rules. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to its staleness rules. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale