From 2119ab66554049711700848c1e08e5660f443d1a Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Tue, 20 Apr 2021 14:55:39 +0000 Subject: [PATCH 01/12] KEP-2639: Add KEP for secret protection --- .../2639-secret-protection/README.md | 428 ++++++++++++++++++ .../2639-secret-protection/kep.yaml | 47 ++ 2 files changed, 475 insertions(+) create mode 100644 keps/sig-storage/2639-secret-protection/README.md create mode 100644 keps/sig-storage/2639-secret-protection/kep.yaml diff --git a/keps/sig-storage/2639-secret-protection/README.md b/keps/sig-storage/2639-secret-protection/README.md new file mode 100644 index 00000000000..eb161294e25 --- /dev/null +++ b/keps/sig-storage/2639-secret-protection/README.md @@ -0,0 +1,428 @@ +# KEP-2639: Secret Protection + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories](#user-stories) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Story 3](#story-3) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Graduation Criteria](#graduation-criteria) + - [Alpha -> Beta Graduation](#alpha---beta-graduation) + - [Beta -> GA Graduation](#beta---ga-graduation) + - [Removing a Deprecated Flag](#removing-a-deprecated-flag) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + + +## Release Signoff Checklist + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) +- [ ] (R) Graduation criteria is in place +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +This KEP proposes a feature to protect secrets while it is in use. Currently, user can delete a Secret that is being used by other resources, like Pods and PVs. This may have negative impact on the resouces using the Secret and it may result in data loss. + +Similar features for protecting PV and PVC already exist as [pv-protection](https://github.com/kubernetes/enhancements/issues/499) and [pvc-protection](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/postpone-pvc-deletion-if-used-in-a-pod.md). + + +## Motivation + +This feature aims to protect secrets from deleting while they are in-use. +Secrets can be used by below ways: +- From Pod: + - [Mounted as data volumes](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod) + - [Exposed as environment variables](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables) + - [Generic ephemeral volumes +](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes) (can be handled as CSI PV below) +- From PV: + - [CSI](https://kubernetes-csi.github.io/docs/secrets-and-credentials-storage-class.html): + - provisioner secret + - controller publish secret + - node stage secret + - node publish secret + - controller expand secret + - non-CSI: + - dependent on each storage driver and will be deprecated soon (Out of scope) +- [From Snapshot](https://kubernetes-csi.github.io/docs/secrets-and-credentials-volume-snapshot-class.html): + - snapshotter secret + +### Goals + +- Protect secrets from deleting while they are in use. + +### Non-Goals + +- Protect important secrets that aren't in use from deleting +- Protect other resources than secret from deleting. + +## Proposal + +A new controller to protect secret is introduced. + +### User Stories + +#### Story 1 + +A user creates a secret and a pod using the secret. Then, the user mistakenly delete the secret while the pod is using it. +The secret is protected until the pod using the secret is deleted. + +#### Story 2 + +A user creates a volume that uses a certain secret in the same namespace. Then, the user delete the namespace. +The secret is protected until the volume using the secret is deleted and the deletion of the volume succeeds. + +#### Story 3 + +A user really would like to delete a secret inspite that it is used by other resources. +The user force delete the secret while it is used by other resources, and the secret isn't protected and is actually deleted. + +### Notes/Constraints/Caveats (Optional) + +- Compatibility: + - There might be many existing scripts that don't care the order of deletion. Therefore, such scripts might stuck on secret deletion, if the deletion of the resources using the secrets are done later. +- Usability: + - Use of the secret in other resource will not be obvious to users. Therefore, users might not easily understand why the secret is not deleted. + - Users might need to force delete the secret on deletion or would like to avoid protection for certain secrets that already exist or that are newly created. + +### Risks and Mitigations + + + +## Design Details + +This feature would be able to implement by the same way to pv-protection/pvc-protection. +- The deletion is blocked by using newly introduced `kubernetes.io/secret-protection` finalizer, +- The finalizer will be added on creation of the secret by using admission controller, +- The finalizer will be deleted by newly introduced `secret-protection-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resrouces. + +### Test Plan + +- For Alpha, unit tests and e2e tests verifying that a secret used by other resources is protected by this feature are added. +- For Beta, scalability tests are added to exercise this feature. +- For GA, the introduced e2e tests will be promoted to conformance. + +### Graduation Criteria +#### Alpha -> Beta Graduation + +- Gather feedback from developers and surveys +- Tests are in Testgrid and linked in KEP + +#### Beta -> GA Graduation + +- Allowing time for feedback + +#### Removing a Deprecated Flag + +- Announce deprecation and support policy of the existing flag +- Two versions passed since introducing the functionality that deprecates the flag (to address version skew) +- Address feedback on usage/changed behavior, provided on GitHub issues +- Deprecate the flag + +### Upgrade / Downgrade Strategy + + +- Upgrade: Secret that doesn't have `kubernetes.io/secret-protection` finalizer will be added the finalizer on Update/Delete events, therefore no additional user operation will be needed. +- Downgrade: + - Feature disabled case: If the secret-protection-controller exists and the feature is disabled, `kubernetes.io/secret-protection` finalizer will always be deleted, therefore no additional user operation will be needed, + - Downgraded to no secret-protection-controller case: If no secret-protection-controller exists but `kubernetes.io/secret-protection` finalizer is added to the secrets, no one remove the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. + +### Version Skew Strategy + +- As for components, this feature involves only admission controller and secret-protection-controller, so version skew won't happen unless these components are available with different versions, +- As for resources, CSI Volume and CSI Snapshot are involved, changes in the API/CRD of these resources especially for their secret fields might cause issue. Howerver, this should be compatibility issue for these API/CRDs. + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback +###### How can this feature be enabled / disabled in a live cluster? + +- [x] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: SecretInUseProtection + - Components depending on the feature gate: + - secret-protection-controller + - storageobjectinuseprotection admission plugin + +###### Does enabling the feature change any default behavior? + +Secrets aren't deleted until the resources using them aren't deleted. + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + +Yes, by disabling the feature gates. + +###### What happens if we reenable the feature if it was previously rolled back? + +Secrets aren't deleted until the resources using them aren't deleted, again. + +###### Are there any tests for feature enablement/disablement? + +Yes, unit tests for the secret-protection-controller cover scenarios where the feature is disabled or enabled. + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + +There will be secrets which have `kubernetes.io/secret-protection` finalizer. + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + +- [x] Metrics + - Metric name: secret_protection_controller + - [Optional] Aggregation method: prometheus + - Components exposing the metric: secret-protection-controller + +###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs? + + + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability +###### Will enabling / using this feature result in any new API calls? + +- API call type: Update Secret, List Pod/PV/Snapshot, Get PVC/SC +- estimated throughput: TBD +- originating component: secret-protection-controller +- API calls are triggered by changes of secrets, Pod, PV, Snapshot + +###### Will enabling / using this feature result in introducing new API types? + +No. + +###### Will enabling / using this feature result in any new calls to the cloud provider? + +No. + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + +- API type(s): Secret +- Estimated increase in size: the size of `kubernetes.io/secret-protection` finalizer per secret +- Estimated amount of new objects: N/A + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + +API: + - Size: Increase in size of the Secret is very limited. + - Number of calls: Rate limit is set for the number of API calls. +Disk/IO: + - No disk/IO are done through other than API calls and log outputs +CPU/RAM: + - It works as common controller pattern. Therefore, number of resouces to process and the logic on how to detect in-use secret should only be needed to be checked. + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + +- Manually adding/deleting finalizer to/from secrets that shouldn't be deleted in certain life cycle +- Introduce a new kind of reference, like usedReference, and leave addition/deletion of it to users + (Similar to ownerReference, but just block deletion and won't try to delete referenced resources through GC, like deleting child on parent's deletion). + +Above ways will force users to do some additional works to protect secrets. Also, they are inconsistent with pv-protection/pvc-protection concepts. diff --git a/keps/sig-storage/2639-secret-protection/kep.yaml b/keps/sig-storage/2639-secret-protection/kep.yaml new file mode 100644 index 00000000000..64fb5279a18 --- /dev/null +++ b/keps/sig-storage/2639-secret-protection/kep.yaml @@ -0,0 +1,47 @@ +title: secret protection +kep-number: 2639 +authors: + - "@mkimuram" +owning-sig: sig-storage +participating-sigs: + - sig-storage +status: provisional +creation-date: 2021-04-20 +reviewers: + - TBD +approvers: + - TBD +prr-approvers: + - TBD +#see-also: +# - "/keps/sig-aaa/1234-we-heard-you-like-keps" +# - "/keps/sig-bbb/2345-everyone-gets-a-kep" +#replaces: +# - "/keps/sig-ccc/3456-replaced-kep" + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.22" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.22" + beta: "v1.23" + stable: "v1.24" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: SecretInUseProtection + components: + - secret-protection-controller + - storageobjectinuseprotection admission plugin +disable-supported: true + +# The following PRR answers are required at beta release +metrics: + - my_feature_metric From c25d1bb435157577c31dd48d1ef65347acc182eb Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Wed, 12 May 2021 17:17:54 +0000 Subject: [PATCH 02/12] Addressed review comments --- keps/prod-readiness/sig-storage/2639.yaml | 3 + .../2639-secret-protection/README.md | 78 ++++++++++++------- .../2639-secret-protection/kep.yaml | 12 ++- 3 files changed, 61 insertions(+), 32 deletions(-) create mode 100644 keps/prod-readiness/sig-storage/2639.yaml diff --git a/keps/prod-readiness/sig-storage/2639.yaml b/keps/prod-readiness/sig-storage/2639.yaml new file mode 100644 index 00000000000..d38e44af2c4 --- /dev/null +++ b/keps/prod-readiness/sig-storage/2639.yaml @@ -0,0 +1,3 @@ +kep-number: 2639 +alpha: + approver: "@johnbelamaric" diff --git a/keps/sig-storage/2639-secret-protection/README.md b/keps/sig-storage/2639-secret-protection/README.md index eb161294e25..b84f3d6a147 100644 --- a/keps/sig-storage/2639-secret-protection/README.md +++ b/keps/sig-storage/2639-secret-protection/README.md @@ -55,14 +55,14 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary -This KEP proposes a feature to protect secrets while it is in use. Currently, user can delete a Secret that is being used by other resources, like Pods and PVs. This may have negative impact on the resouces using the Secret and it may result in data loss. +This KEP proposes a feature to protect secrets while it is in use. Currently, user can delete a Secret that is being used by other resources, like Pods, PVs, and VolumeSnapshots. This may have negative impact on the resouces using the Secret and it may result in data loss. Similar features for protecting PV and PVC already exist as [pv-protection](https://github.com/kubernetes/enhancements/issues/499) and [pvc-protection](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/postpone-pvc-deletion-if-used-in-a-pod.md). ## Motivation -This feature aims to protect secrets from deleting while they are in-use. +This feature aims to protect secrets from being deleted while they are in-use. Secrets can be used by below ways: - From Pod: - [Mounted as data volumes](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod) @@ -78,17 +78,17 @@ Secrets can be used by below ways: - controller expand secret - non-CSI: - dependent on each storage driver and will be deprecated soon (Out of scope) -- [From Snapshot](https://kubernetes-csi.github.io/docs/secrets-and-credentials-volume-snapshot-class.html): +- [From VolumeSnapshot](https://kubernetes-csi.github.io/docs/secrets-and-credentials-volume-snapshot-class.html): - snapshotter secret ### Goals -- Protect secrets from deleting while they are in use. +- Protect secrets from being deleted while they are in use. ### Non-Goals -- Protect important secrets that aren't in use from deleting -- Protect other resources than secret from deleting. +- Protect important secrets that aren't in use from being deleted +- Protect resources other than secrets from being deleted. ## Proposal @@ -108,7 +108,7 @@ The secret is protected until the volume using the secret is deleted and the del #### Story 3 -A user really would like to delete a secret inspite that it is used by other resources. +A user really would like to delete a secret despite that it is used by other resources. The user force delete the secret while it is used by other resources, and the secret isn't protected and is actually deleted. ### Notes/Constraints/Caveats (Optional) @@ -135,10 +135,20 @@ Consider including folks who also work outside the SIG or subproject. ## Design Details -This feature would be able to implement by the same way to pv-protection/pvc-protection. -- The deletion is blocked by using newly introduced `kubernetes.io/secret-protection` finalizer, -- The finalizer will be added on creation of the secret by using admission controller, -- The finalizer will be deleted by newly introduced `secret-protection-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resrouces. +This feature would be able to implement in the same way as `pv-protection/pvc-protection`. +Protection logics for in-tree resources and out-of-tree resources are separeted and independently work. +It is due to the restriction that CRDs can't be handled from in-tree controller. + +- In-tree resources(`Pod` and `PersistentVolume`): + - The deletion is blocked by using newly introduced `kubernetes.io/secret-protection` finalizer, + - The `kubernetes.io/secret-protection` finalizer will be added on creation of the secret by using admission controller for in-tree resources, + - The `kubernetes.io/secret-protection` finalizer will be deleted by newly introduced in-tree `secret-protection-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. +- Out-of-tree resources(`VolumeSnapshot`): + - The deletion is blocked by using newly introduced `snapshot.storage.kubernetes.io/secret-protection` finalizer, + - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be added on creation of the secret by using admission controller for `VolumeSnapshot`, + - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be deleted by newly introduced out-of-tree `secret-protection-volumesnapshot-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. + +Feature gate `SecretInUseProtection` is shared between in-tree and out-of-tree. Therefore, users won't be able to choose enabling secret portection for either in-tree or out-of-tree. On the other hand, users that doesn't install `VolumeSnapshot` feature won't be affected by secret protection for `VolumeSnapshot`. ### Test Plan @@ -176,14 +186,20 @@ enhancement: - What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade, in order to make use of the enhancement? --> -- Upgrade: Secret that doesn't have `kubernetes.io/secret-protection` finalizer will be added the finalizer on Update/Delete events, therefore no additional user operation will be needed. +- Upgrade: Secret that doesn't have `kubernetes.io/secret-protection` finalizer and/or `snapshot.storage.kubernetes.io/secret-protection` finalizer will be added the finalizers on Update/Delete events, therefore no additional user operation will be needed. - Downgrade: - - Feature disabled case: If the secret-protection-controller exists and the feature is disabled, `kubernetes.io/secret-protection` finalizer will always be deleted, therefore no additional user operation will be needed, - - Downgraded to no secret-protection-controller case: If no secret-protection-controller exists but `kubernetes.io/secret-protection` finalizer is added to the secrets, no one remove the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. + - Feature disabled case: + - If the `secret-protection-controller` exists and the feature is disabled, `kubernetes.io/secret-protection` finalizer will always be deleted, therefore no additional user operation will be needed, + - If the `secret-protection-volumesnapshot-controller` exists and the feature is disabled, `snapshot.storage.kubernetes.io/secret-protection` finalizer will always be deleted, therefore no additional user operation will be needed, + - Downgraded to no controller case: + - If no `secret-protection-controller` exists but `kubernetes.io/secret-protection` finalizer is added to the secrets, no one remove the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. + - If no `secret-protection-volumesnapshot-controller` exists but `snapshot.storage.kubernetes.io/secret-protection` finalizer is added to the secrets, no one remove the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. ### Version Skew Strategy -- As for components, this feature involves only admission controller and secret-protection-controller, so version skew won't happen unless these components are available with different versions, +- As for the difference between in-tree components and out-of-tree components, they work independently, so they won't affect each other, +- As for in-tree components, the protection logic involves only in-tree admission controller and in-tree secret-protection-controller, so version skew won't happen unless these components are available with different versions, +- As for out-of-tree components, the protection logic involves only out-of-tree admission controller and out-of-tree secret-protection-volumesnapshot-controller, so version skew won't happen unless these components are available with different versions, - As for resources, CSI Volume and CSI Snapshot are involved, changes in the API/CRD of these resources especially for their secret fields might cause issue. Howerver, this should be compatibility issue for these API/CRDs. ## Production Readiness Review Questionnaire @@ -216,24 +232,27 @@ you need any help or guidance. - [x] Feature gate (also fill in values in `kep.yaml`) - Feature gate name: SecretInUseProtection - Components depending on the feature gate: - - secret-protection-controller - - storageobjectinuseprotection admission plugin + - kube-controller-manager (in-tree) + - secret-protection-controllerr (in-tree) + - storageobjectinuseprotection admission plugin (in-tree) + - secret-protection-volumesnapshot-controller (out-of-tree) + - storageobjectinuseprotection volumesnapshot admission plugin (out-of-tree) ###### Does enabling the feature change any default behavior? -Secrets aren't deleted until the resources using them aren't deleted. +Secrets aren't deleted until the resources using them are deleted. ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? -Yes, by disabling the feature gates. +Yes, by disabling the feature gate. After the feature gate is disabled, users need to manually delete the `kubernetes.io/secret-protection` finalizer and the `snapshot.storage.kubernetes.io/secret-protection` finalizer to make secret deleted properly. ###### What happens if we reenable the feature if it was previously rolled back? -Secrets aren't deleted until the resources using them aren't deleted, again. +Secrets aren't deleted until the resources using them are deleted, again. ###### Are there any tests for feature enablement/disablement? -Yes, unit tests for the secret-protection-controller cover scenarios where the feature is disabled or enabled. +Yes, unit tests for the secret-protection-controller and secret-protection-volumesnapshot-controller cover scenarios where the feature is disabled or enabled. ### Rollout, Upgrade and Rollback Planning @@ -277,14 +296,17 @@ This section must be completed when targeting beta to a release. ###### How can an operator determine if the feature is in use by workloads? -There will be secrets which have `kubernetes.io/secret-protection` finalizer. +There will be secrets which have `kubernetes.io/secret-protection` finalizer and `snapshot.storage.kubernetes.io/secret-protection` finalizer. ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? - [x] Metrics - Metric name: secret_protection_controller - - [Optional] Aggregation method: prometheus - - Components exposing the metric: secret-protection-controller + - Aggregation method: prometheus + - Components exposing the metric: secret-protection-controller + - Metric name: secret_protection_volumesnapshot_controller + - Aggregation method: prometheus + - Components exposing the metric: secret-protection-volumesnapshot-controller ###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs? @@ -331,10 +353,10 @@ and creating new ones, as well as about cluster-level services (e.g. DNS): ### Scalability ###### Will enabling / using this feature result in any new API calls? -- API call type: Update Secret, List Pod/PV/Snapshot, Get PVC/SC +- API call type: Update Secret, List Pod/PV/VolumeSnapshot, and Get PVC/SC - estimated throughput: TBD -- originating component: secret-protection-controller -- API calls are triggered by changes of secrets, Pod, PV, Snapshot +- originating component: secret-protection-controller and secret-protection-volumesnapshot-controller +- API calls are triggered by changes of secrets, Pod, PV, VolumeSnapshot ###### Will enabling / using this feature result in introducing new API types? @@ -347,7 +369,7 @@ No. ###### Will enabling / using this feature result in increasing size or count of the existing API objects? - API type(s): Secret -- Estimated increase in size: the size of `kubernetes.io/secret-protection` finalizer per secret +- Estimated increase in size: the size of `kubernetes.io/secret-protection` finalizer and `snapshot.storage.kubernetes.io/secret-protection` finalizer per secret - Estimated amount of new objects: N/A ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? diff --git a/keps/sig-storage/2639-secret-protection/kep.yaml b/keps/sig-storage/2639-secret-protection/kep.yaml index 64fb5279a18..badb597c1dc 100644 --- a/keps/sig-storage/2639-secret-protection/kep.yaml +++ b/keps/sig-storage/2639-secret-protection/kep.yaml @@ -11,8 +11,8 @@ reviewers: - TBD approvers: - TBD -prr-approvers: - - TBD +#prr-approvers: +# - TBD #see-also: # - "/keps/sig-aaa/1234-we-heard-you-like-keps" # - "/keps/sig-bbb/2345-everyone-gets-a-kep" @@ -38,8 +38,12 @@ milestone: feature-gates: - name: SecretInUseProtection components: - - secret-protection-controller - - storageobjectinuseprotection admission plugin + - kube-controller-manager (in-tree) + - secret-protection-controllerr (in-tree) + - storageobjectinuseprotection admission plugin (in-tree) + - secret-protection-volumesnapshot-controller (out-of-tree) + - storageobjectinuseprotection volumesnapshot admission plugin (out-of-tree) + disable-supported: true # The following PRR answers are required at beta release From 52435a8fa6b8a97fcdde7be151cb4a8d42022799 Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Wed, 12 May 2021 22:54:04 +0000 Subject: [PATCH 03/12] Addressed review comments --- .../2639-secret-protection/README.md | 77 ++++++++++++++----- .../2639-secret-protection/kep.yaml | 27 ++----- 2 files changed, 66 insertions(+), 38 deletions(-) diff --git a/keps/sig-storage/2639-secret-protection/README.md b/keps/sig-storage/2639-secret-protection/README.md index b84f3d6a147..3ed8f30e9a9 100644 --- a/keps/sig-storage/2639-secret-protection/README.md +++ b/keps/sig-storage/2639-secret-protection/README.md @@ -132,34 +132,78 @@ How will UX be reviewed, and by whom? Consider including folks who also work outside the SIG or subproject. --> +There is a corner case when Reclaim Policy is Retain and PV is referencing secrets in a certain user namespace. +In such a case, when PVC is deleted, the PV referencing secrets remains as a resource not managed by the namespace, as a result, deletion of the secrets is blocked. +However, the user of the namespace won't notice why it is blocked and won't find a way to delete the secrets other than manually deleting finalizer. +In addition, to make things worse, provisioner secrets that are defined by using template with PVC information, +like "${pvc.name}", would be deleted in above case, due to the failure in finding a reference to the secret, +because the reference to provisioner secret is not stored in PV and needs to be resolved on deletion time with PVC information that is already deleted. +This behavior is inconsistent with other secrets, like node publish secret whose reference is stored in PV. ## Design Details -This feature would be able to implement in the same way as `pv-protection/pvc-protection`. +This feature can be implemented in the same way as `pv-protection/pvc-protection`. Protection logics for in-tree resources and out-of-tree resources are separeted and independently work. It is due to the restriction that CRDs can't be handled from in-tree controller. - In-tree resources(`Pod` and `PersistentVolume`): - The deletion is blocked by using newly introduced `kubernetes.io/secret-protection` finalizer, - - The `kubernetes.io/secret-protection` finalizer will be added on creation of the secret by using admission controller for in-tree resources, - - The `kubernetes.io/secret-protection` finalizer will be deleted by newly introduced in-tree `secret-protection-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. + - The `kubernetes.io/secret-protection` finalizer will be always added on creation of the secret by using admission controller for in-tree resources, + - After the secret is requested to be deleted (`deletionTimestamp` is set), the `kubernetes.io/secret-protection` finalizer will be deleted by newly introduced in-tree `secret-protection-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. + - If the `SecretInUseProtection` feature gate is disalbed, finalizer is added on creation, but deleted regardless of whether it is in-use after `deletionTimestamp` is set. This will allow users to avoid manually deleting finalizers on the downgrading by disabling the feature. - Out-of-tree resources(`VolumeSnapshot`): - The deletion is blocked by using newly introduced `snapshot.storage.kubernetes.io/secret-protection` finalizer, - - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be added on creation of the secret by using admission controller for `VolumeSnapshot`, - - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be deleted by newly introduced out-of-tree `secret-protection-volumesnapshot-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. + - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be always added on creation of the secret by using admission controller for `VolumeSnapshot`, + - After the secret is requested to be deleted (`deletionTimestamp` is set), the `snapshot.storage.kubernetes.io/secret-protection` finalizer will be deleted by newly introduced out-of-tree `secret-protection-volumesnapshot-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. -Feature gate `SecretInUseProtection` is shared between in-tree and out-of-tree. Therefore, users won't be able to choose enabling secret portection for either in-tree or out-of-tree. On the other hand, users that doesn't install `VolumeSnapshot` feature won't be affected by secret protection for `VolumeSnapshot`. +Feature gate `SecretInUseProtection` is used only for in-tree controller. Out-of-tree controller will be always enabled when the `secret-protection-volumesnapshot-controller` is deployed. + +For users to force delete the secret, users need to do either: +1. manually delete the finalizer, by below command: + ``` + kubectl patch secret/secret-to-be-deleted -p '{"metadata":{"finalizers":[]}}' --type=merge + ``` +2. add `secret.kubernetes.io/disable-protection: "yes"` annotation to opt-out this feature per secret + +Annotation will more user friendly than directly deleting finalizer. However, it can't be used in the case that this feature is once enabled and disabled later by deleting the controller. For this case, it may be needed to provide a script to delete the finalizer on all the secret. ### Test Plan - For Alpha, unit tests and e2e tests verifying that a secret used by other resources is protected by this feature are added. + - unit tests: + - SecretInUseProtection enabled: + - Verify immediate deletion of a secret that is not used + - Verify that secret used by a Pod is not removed immediately + - Verify that secret used by a CSI PV as controllerPublishSecret is not removed immediately + - Verify that secret used by a CSI PV as nodeStageSecret is not removed immediately + - Verify that secret used by a CSI PV as nodePublishSecret is not removed immediately + - Verify that secret used by a CSI PV as controllerExpandSecret is not removed immediately + - Verify that secret used by a CSI PV as provisionerSecret is not removed immediately + - Verify that secret used by a CSI PV as provisionerSecret via template is not removed immediately + - Verify that secret used by a VolumeSnapshot is not removed immediately + - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately + - SecretInUseProtection disabled: + - Verify immediate deletion of a secret that is not used + - Verify immediate deletion of a secret that is used by a Pod + - Verify immediate deletion of a secret with finalizer that is used by a Pod + - e2e tests: + - Verify immediate deletion of a secret that is not used + - Verify that secret used by a Pod is not removed immediately + - Verify that secret used by a CSI PV as controllerPublishSecret is not removed immediately + - Verify that secret used by a CSI PV as nodeStageSecret is not removed immediately + - Verify that secret used by a CSI PV as nodePublishSecret is not removed immediately + - Verify that secret used by a CSI PV as controllerExpandSecret is not removed immediately + - Verify that secret used by a CSI PV as provisionerSecret is not removed immediately + - Verify that secret used by a CSI PV as provisionerSecret via template is not removed immediately + - Verify that secret used by a VolumeSnapshot is not removed immediately + - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately - For Beta, scalability tests are added to exercise this feature. - For GA, the introduced e2e tests will be promoted to conformance. ### Graduation Criteria #### Alpha -> Beta Graduation -- Gather feedback from developers and surveys +- Gather feedback from developers and surveys of users - Tests are in Testgrid and linked in KEP #### Beta -> GA Graduation @@ -168,10 +212,7 @@ Feature gate `SecretInUseProtection` is shared between in-tree and out-of-tree. #### Removing a Deprecated Flag -- Announce deprecation and support policy of the existing flag -- Two versions passed since introducing the functionality that deprecates the flag (to address version skew) -- Address feedback on usage/changed behavior, provided on GitHub issues -- Deprecate the flag +N/A ### Upgrade / Downgrade Strategy @@ -232,15 +273,13 @@ you need any help or guidance. - [x] Feature gate (also fill in values in `kep.yaml`) - Feature gate name: SecretInUseProtection - Components depending on the feature gate: - - kube-controller-manager (in-tree) - - secret-protection-controllerr (in-tree) - - storageobjectinuseprotection admission plugin (in-tree) - - secret-protection-volumesnapshot-controller (out-of-tree) - - storageobjectinuseprotection volumesnapshot admission plugin (out-of-tree) + - kube-controller-manager + - secret-protection-controller (part of kube-controller-manager) + - storageobjectinuseprotection admission plugin (part of kube-controller-manager) ###### Does enabling the feature change any default behavior? -Secrets aren't deleted until the resources using them are deleted. +Yes. Secrets aren't deleted until the resources using them are deleted. ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? @@ -369,7 +408,7 @@ No. ###### Will enabling / using this feature result in increasing size or count of the existing API objects? - API type(s): Secret -- Estimated increase in size: the size of `kubernetes.io/secret-protection` finalizer and `snapshot.storage.kubernetes.io/secret-protection` finalizer per secret +- Estimated increase in size: the size of `kubernetes.io/secret-protection` finalizer, `snapshot.storage.kubernetes.io/secret-protection` finalizer, and `secret.kubernetes.io/disable-protection` annotation per secret - Estimated amount of new objects: N/A ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? @@ -443,7 +482,7 @@ Why should this KEP _not_ be implemented? ## Alternatives -- Manually adding/deleting finalizer to/from secrets that shouldn't be deleted in certain life cycle +- Manually adding/deleting finalizer to/from secrets that are not deleted automatically in certain life cycle - Introduce a new kind of reference, like usedReference, and leave addition/deletion of it to users (Similar to ownerReference, but just block deletion and won't try to delete referenced resources through GC, like deleting child on parent's deletion). diff --git a/keps/sig-storage/2639-secret-protection/kep.yaml b/keps/sig-storage/2639-secret-protection/kep.yaml index badb597c1dc..b13026935ac 100644 --- a/keps/sig-storage/2639-secret-protection/kep.yaml +++ b/keps/sig-storage/2639-secret-protection/kep.yaml @@ -5,19 +5,14 @@ authors: owning-sig: sig-storage participating-sigs: - sig-storage -status: provisional +status: implementable creation-date: 2021-04-20 reviewers: - - TBD + - "@liggitt" + - "@xing-yang" approvers: - - TBD -#prr-approvers: -# - TBD -#see-also: -# - "/keps/sig-aaa/1234-we-heard-you-like-keps" -# - "/keps/sig-bbb/2345-everyone-gets-a-kep" -#replaces: -# - "/keps/sig-ccc/3456-replaced-kep" + - "@xing-yang" + - "@saad-ali" # The target maturity stage in the current dev cycle for this KEP. stage: alpha @@ -38,14 +33,8 @@ milestone: feature-gates: - name: SecretInUseProtection components: - - kube-controller-manager (in-tree) - - secret-protection-controllerr (in-tree) - - storageobjectinuseprotection admission plugin (in-tree) - - secret-protection-volumesnapshot-controller (out-of-tree) - - storageobjectinuseprotection volumesnapshot admission plugin (out-of-tree) + - kube-controller-manager + - secret-protection-controller (part of kube-controller-manager) + - storageobjectinuseprotection admission plugin (part of kube-controller-manager) disable-supported: true - -# The following PRR answers are required at beta release -metrics: - - my_feature_metric From 8406d39d1a3da18f1887960f8e8ed33c490e014d Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Thu, 13 May 2021 19:03:07 +0000 Subject: [PATCH 04/12] Addressed review comments --- keps/sig-storage/2639-secret-protection/README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/keps/sig-storage/2639-secret-protection/README.md b/keps/sig-storage/2639-secret-protection/README.md index 3ed8f30e9a9..9959229a234 100644 --- a/keps/sig-storage/2639-secret-protection/README.md +++ b/keps/sig-storage/2639-secret-protection/README.md @@ -163,9 +163,9 @@ For users to force delete the secret, users need to do either: ``` kubectl patch secret/secret-to-be-deleted -p '{"metadata":{"finalizers":[]}}' --type=merge ``` -2. add `secret.kubernetes.io/disable-protection: "yes"` annotation to opt-out this feature per secret +2. add `secret.kubernetes.io/skip-secret-protection: "yes"` annotation to opt-out this feature per secret -Annotation will more user friendly than directly deleting finalizer. However, it can't be used in the case that this feature is once enabled and disabled later by deleting the controller. For this case, it may be needed to provide a script to delete the finalizer on all the secret. +Annotation will be more user friendly than directly deleting finalizer. However, it can't be used in the case that this feature is once enabled and disabled later by deleting the controller. For this case, it may be needed to provide a script to delete the finalizer on all the secrets. ### Test Plan @@ -233,8 +233,8 @@ enhancement: - If the `secret-protection-controller` exists and the feature is disabled, `kubernetes.io/secret-protection` finalizer will always be deleted, therefore no additional user operation will be needed, - If the `secret-protection-volumesnapshot-controller` exists and the feature is disabled, `snapshot.storage.kubernetes.io/secret-protection` finalizer will always be deleted, therefore no additional user operation will be needed, - Downgraded to no controller case: - - If no `secret-protection-controller` exists but `kubernetes.io/secret-protection` finalizer is added to the secrets, no one remove the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. - - If no `secret-protection-volumesnapshot-controller` exists but `snapshot.storage.kubernetes.io/secret-protection` finalizer is added to the secrets, no one remove the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. + - If no `secret-protection-controller` exists but `kubernetes.io/secret-protection` finalizer is added to the secrets, no one removes the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. + - If no `secret-protection-volumesnapshot-controller` exists but `snapshot.storage.kubernetes.io/secret-protection` finalizer is added to the secrets, no one removes the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. ### Version Skew Strategy @@ -277,6 +277,8 @@ you need any help or guidance. - secret-protection-controller (part of kube-controller-manager) - storageobjectinuseprotection admission plugin (part of kube-controller-manager) +Secret protection for volume snapshot will be enabled when those relevant out-of-tree controllers are deployed, but no feature gate is needed. + ###### Does enabling the feature change any default behavior? Yes. Secrets aren't deleted until the resources using them are deleted. @@ -408,7 +410,7 @@ No. ###### Will enabling / using this feature result in increasing size or count of the existing API objects? - API type(s): Secret -- Estimated increase in size: the size of `kubernetes.io/secret-protection` finalizer, `snapshot.storage.kubernetes.io/secret-protection` finalizer, and `secret.kubernetes.io/disable-protection` annotation per secret +- Estimated increase in size: the size of `kubernetes.io/secret-protection` finalizer, `snapshot.storage.kubernetes.io/secret-protection` finalizer, and `secret.kubernetes.io/skip-secret-protection` annotation per secret - Estimated amount of new objects: N/A ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? From f68cf65f0f0a414a33645948a45a1a0679b8f489 Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Thu, 13 May 2021 21:42:03 +0000 Subject: [PATCH 05/12] Add details on Feature Enablement and Rollback --- .../sig-storage/2639-secret-protection/README.md | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/keps/sig-storage/2639-secret-protection/README.md b/keps/sig-storage/2639-secret-protection/README.md index 9959229a234..8f5d97f5087 100644 --- a/keps/sig-storage/2639-secret-protection/README.md +++ b/keps/sig-storage/2639-secret-protection/README.md @@ -153,7 +153,7 @@ It is due to the restriction that CRDs can't be handled from in-tree controller. - If the `SecretInUseProtection` feature gate is disalbed, finalizer is added on creation, but deleted regardless of whether it is in-use after `deletionTimestamp` is set. This will allow users to avoid manually deleting finalizers on the downgrading by disabling the feature. - Out-of-tree resources(`VolumeSnapshot`): - The deletion is blocked by using newly introduced `snapshot.storage.kubernetes.io/secret-protection` finalizer, - - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be always added on creation of the secret by using admission controller for `VolumeSnapshot`, + - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be always added on creation of the secret by using snapshot controller, - After the secret is requested to be deleted (`deletionTimestamp` is set), the `snapshot.storage.kubernetes.io/secret-protection` finalizer will be deleted by newly introduced out-of-tree `secret-protection-volumesnapshot-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. Feature gate `SecretInUseProtection` is used only for in-tree controller. Out-of-tree controller will be always enabled when the `secret-protection-volumesnapshot-controller` is deployed. @@ -277,6 +277,7 @@ you need any help or guidance. - secret-protection-controller (part of kube-controller-manager) - storageobjectinuseprotection admission plugin (part of kube-controller-manager) +Secret protection controllers for in-tree are deployed automatically, if the Kubernetes version is later than the version where secret protection feature is marked as alpha. Secret protection for volume snapshot will be enabled when those relevant out-of-tree controllers are deployed, but no feature gate is needed. ###### Does enabling the feature change any default behavior? @@ -285,7 +286,18 @@ Yes. Secrets aren't deleted until the resources using them are deleted. ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? -Yes, by disabling the feature gate. After the feature gate is disabled, users need to manually delete the `kubernetes.io/secret-protection` finalizer and the `snapshot.storage.kubernetes.io/secret-protection` finalizer to make secret deleted properly. +Yes. Detailed steps are as follows: + +- In-tree controller: + - Downgrade case: + 1. Downgrade to the version where no secret-protection-controller exists, + 2. Manually delete the `kubernetes.io/secret-protection` finalizer from all the secrets. + - Otherwise: + 1. Disable the `SecretInUseProtection` feature gate. + +- Out-of-tree controller: + 1. Delete the out-of-tree secret protection controller, + 2. Manually delete the `snapshot.storage.kubernetes.io/secret-protection` finalizer from all the secrets. ###### What happens if we reenable the feature if it was previously rolled back? From 7209bea5e10afdea52558dba99fc6c10104f3c5b Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Thu, 13 May 2021 22:32:28 +0000 Subject: [PATCH 06/12] Add a nob to disable out-of-tree controller --- .../2639-secret-protection/README.md | 21 +++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/keps/sig-storage/2639-secret-protection/README.md b/keps/sig-storage/2639-secret-protection/README.md index 8f5d97f5087..ef2a308b5ef 100644 --- a/keps/sig-storage/2639-secret-protection/README.md +++ b/keps/sig-storage/2639-secret-protection/README.md @@ -156,7 +156,7 @@ It is due to the restriction that CRDs can't be handled from in-tree controller. - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be always added on creation of the secret by using snapshot controller, - After the secret is requested to be deleted (`deletionTimestamp` is set), the `snapshot.storage.kubernetes.io/secret-protection` finalizer will be deleted by newly introduced out-of-tree `secret-protection-volumesnapshot-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. -Feature gate `SecretInUseProtection` is used only for in-tree controller. Out-of-tree controller will be always enabled when the `secret-protection-volumesnapshot-controller` is deployed. +Feature gate `SecretInUseProtection` is used only for in-tree controller. Out-of-tree controller will be enabled when the `secret-protection-volumesnapshot-controller` is deployed with `--secret-protection=enabled` flag. For users to force delete the secret, users need to do either: 1. manually delete the finalizer, by below command: @@ -180,12 +180,18 @@ Annotation will be more user friendly than directly deleting finalizer. However, - Verify that secret used by a CSI PV as controllerExpandSecret is not removed immediately - Verify that secret used by a CSI PV as provisionerSecret is not removed immediately - Verify that secret used by a CSI PV as provisionerSecret via template is not removed immediately - - Verify that secret used by a VolumeSnapshot is not removed immediately - - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately - SecretInUseProtection disabled: - Verify immediate deletion of a secret that is not used - Verify immediate deletion of a secret that is used by a Pod - Verify immediate deletion of a secret with finalizer that is used by a Pod + - `--secret-protection=enabled`: + - Verify immediate deletion of a secret that is not used + - Verify that secret used by a VolumeSnapshot is not removed immediately + - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately + - `--secret-protection=disabled`: + - Verify immediate deletion of a secret that is not used + - Verify immediate deletion of a secret used by a VolumeSnapshot + - Verify immediate deletion of a secret used by a VolumeSnapshot as snapshotterSecret via template - e2e tests: - Verify immediate deletion of a secret that is not used - Verify that secret used by a Pod is not removed immediately @@ -278,7 +284,7 @@ you need any help or guidance. - storageobjectinuseprotection admission plugin (part of kube-controller-manager) Secret protection controllers for in-tree are deployed automatically, if the Kubernetes version is later than the version where secret protection feature is marked as alpha. -Secret protection for volume snapshot will be enabled when those relevant out-of-tree controllers are deployed, but no feature gate is needed. +Secret protection for volume snapshot will be enabled when those relevant out-of-tree controllers are deployed with `--secret-protection=enabled`, but no feature gate is needed. ###### Does enabling the feature change any default behavior? @@ -296,8 +302,11 @@ Yes. Detailed steps are as follows: 1. Disable the `SecretInUseProtection` feature gate. - Out-of-tree controller: - 1. Delete the out-of-tree secret protection controller, - 2. Manually delete the `snapshot.storage.kubernetes.io/secret-protection` finalizer from all the secrets. + - Recommended method: + 1. Redeploy out-of-tree controllers with `--secret-protection=disabled` + - Alternative method: + 1. Delete the out-of-tree secret protection controller, + 2. Manually delete the `snapshot.storage.kubernetes.io/secret-protection` finalizer from all the secrets. ###### What happens if we reenable the feature if it was previously rolled back? From 09632c445ec6ed5ba43532d9f8135516cd4feec7 Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Fri, 14 May 2021 00:03:09 +0000 Subject: [PATCH 07/12] Add ConfigMap to the scope --- .../2639-secret-protection/README.md | 141 ++++++++++-------- .../2639-secret-protection/kep.yaml | 6 +- 2 files changed, 83 insertions(+), 64 deletions(-) diff --git a/keps/sig-storage/2639-secret-protection/README.md b/keps/sig-storage/2639-secret-protection/README.md index ef2a308b5ef..619a76f8a01 100644 --- a/keps/sig-storage/2639-secret-protection/README.md +++ b/keps/sig-storage/2639-secret-protection/README.md @@ -1,4 +1,4 @@ -# KEP-2639: Secret Protection +# KEP-2639: Secret/ConfigMap Protection - [Release Signoff Checklist](#release-signoff-checklist) @@ -55,14 +55,14 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary -This KEP proposes a feature to protect secrets while it is in use. Currently, user can delete a Secret that is being used by other resources, like Pods, PVs, and VolumeSnapshots. This may have negative impact on the resouces using the Secret and it may result in data loss. +This KEP proposes a feature to protect Secrets/ConfigMaps while it is in use. Currently, user can delete a secret that is being used by other resources, like Pods, PVs, and VolumeSnapshots. This may have negative impact on the resouces using the secret and it may result in data loss. Also, ConfigMaps can be deleted while it is in use, which may lead to unexpected behavior in applications. Similar features for protecting PV and PVC already exist as [pv-protection](https://github.com/kubernetes/enhancements/issues/499) and [pvc-protection](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/postpone-pvc-deletion-if-used-in-a-pod.md). ## Motivation -This feature aims to protect secrets from being deleted while they are in-use. +This feature aims to protect Secrets/ConfigMaps from being deleted while they are in-use. Secrets can be used by below ways: - From Pod: - [Mounted as data volumes](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod) @@ -81,25 +81,30 @@ Secrets can be used by below ways: - [From VolumeSnapshot](https://kubernetes-csi.github.io/docs/secrets-and-credentials-volume-snapshot-class.html): - snapshotter secret +ConfigMaps can be used by below ways: +- From Pod: + - [Mounted as files](https://kubernetes.io/docs/concepts/configuration/configmap/#using-configmaps-as-files-from-a-pod) + - [Exposed as environment variables](https://kubernetes.io/docs/concepts/configuration/configmap/#configmaps-and-pods) + ### Goals -- Protect secrets from being deleted while they are in use. +- Protect Secrets/ConfigMaps from being deleted while they are in use. ### Non-Goals -- Protect important secrets that aren't in use from being deleted -- Protect resources other than secrets from being deleted. +- Protect important Secrets/ConfigMaps that aren't in use from being deleted +- Protect resources other than Secrets/ConfigMaps from being deleted. ## Proposal -A new controller to protect secret is introduced. +New controllers to protect Secrets/ConfigMaps are introduced. ### User Stories #### Story 1 -A user creates a secret and a pod using the secret. Then, the user mistakenly delete the secret while the pod is using it. -The secret is protected until the pod using the secret is deleted. +A user creates Secrets/ConfigMaps and a pod using them. Then, the user mistakenly delete them while the pod is using them. +They are protected until the pod using them is deleted. #### Story 2 @@ -114,10 +119,10 @@ The user force delete the secret while it is used by other resources, and the se ### Notes/Constraints/Caveats (Optional) - Compatibility: - - There might be many existing scripts that don't care the order of deletion. Therefore, such scripts might stuck on secret deletion, if the deletion of the resources using the secrets are done later. + - There might be many existing scripts that don't care the order of deletion. Therefore, such scripts might stuck on Secret/ConfigMap deletion, if the deletion of the resources using the Secrets/ConfigMaps are done later. - Usability: - - Use of the secret in other resource will not be obvious to users. Therefore, users might not easily understand why the secret is not deleted. - - Users might need to force delete the secret on deletion or would like to avoid protection for certain secrets that already exist or that are newly created. + - Use of the Secret/ConfigMap in other resource will not be obvious to users. Therefore, users might not easily understand why the Secret/ConfigMap is not deleted. + - Users might need to force delete the Secret/ConfigMap on deletion or would like to avoid protection for certain Secrets/ConfigMaps that already exist or that are newly created. ### Risks and Mitigations @@ -144,34 +149,41 @@ This behavior is inconsistent with other secrets, like node publish secret whose This feature can be implemented in the same way as `pv-protection/pvc-protection`. Protection logics for in-tree resources and out-of-tree resources are separeted and independently work. -It is due to the restriction that CRDs can't be handled from in-tree controller. +It is due to the restriction that CRDs can't be handled from in-tree controller. Note that in-tree controller protect both Secrets and ConfigMaps, on the other hand, out-of-tree controller protect only Secrets because VoluemSnapshots never use ConfigMaps. - In-tree resources(`Pod` and `PersistentVolume`): - - The deletion is blocked by using newly introduced `kubernetes.io/secret-protection` finalizer, - - The `kubernetes.io/secret-protection` finalizer will be always added on creation of the secret by using admission controller for in-tree resources, - - After the secret is requested to be deleted (`deletionTimestamp` is set), the `kubernetes.io/secret-protection` finalizer will be deleted by newly introduced in-tree `secret-protection-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. - - If the `SecretInUseProtection` feature gate is disalbed, finalizer is added on creation, but deleted regardless of whether it is in-use after `deletionTimestamp` is set. This will allow users to avoid manually deleting finalizers on the downgrading by disabling the feature. + - The deletion is blocked by using newly introduced `kubernetes.io/ephemeral-protection` finalizer, + - The `kubernetes.io/ephemeral-protection` finalizer will be always added on creation of the Secret/ConfigMap by using admission controller for in-tree resources, + - After the Secret/ConfigMap is requested to be deleted (`deletionTimestamp` is set), the `kubernetes.io/ephemeral-protection` finalizer will be deleted by newly introduced in-tree `ephemeral-protection-controller` by checking whether the Secret/ConfigMap is in-use, on every change(Create/Update/Delete) events for Secrets/ConfigMaps and related resources. + - If the `EphemeralInUseProtection` feature gate is disalbed, finalizer is added on creation, but deleted regardless of whether it is in-use after `deletionTimestamp` is set. This will allow users to avoid manually deleting finalizers on the downgrading by disabling the feature. - Out-of-tree resources(`VolumeSnapshot`): - - The deletion is blocked by using newly introduced `snapshot.storage.kubernetes.io/secret-protection` finalizer, - - The `snapshot.storage.kubernetes.io/secret-protection` finalizer will be always added on creation of the secret by using snapshot controller, - - After the secret is requested to be deleted (`deletionTimestamp` is set), the `snapshot.storage.kubernetes.io/secret-protection` finalizer will be deleted by newly introduced out-of-tree `secret-protection-volumesnapshot-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. + - The deletion is blocked by using newly introduced `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer, + - The `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer will be always added on creation of the secret by using snapshot controller, + - After the secret is requested to be deleted (`deletionTimestamp` is set), the `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer will be deleted by newly introduced out-of-tree `ephemeral-protection-volumesnapshot-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. -Feature gate `SecretInUseProtection` is used only for in-tree controller. Out-of-tree controller will be enabled when the `secret-protection-volumesnapshot-controller` is deployed with `--secret-protection=enabled` flag. +Feature gate `EphemeralInUseProtection` is used only for in-tree controller. Out-of-tree controller will be enabled when the `ephemeral-protection-volumesnapshot-controller` is deployed with `--ephemeral-protection=enabled` flag. -For users to force delete the secret, users need to do either: +For users to force delete the Secret, users need to do either: 1. manually delete the finalizer, by below command: ``` kubectl patch secret/secret-to-be-deleted -p '{"metadata":{"finalizers":[]}}' --type=merge ``` -2. add `secret.kubernetes.io/skip-secret-protection: "yes"` annotation to opt-out this feature per secret +2. add `secret.kubernetes.io/skip-ephemeral-protection: "yes"` annotation to opt-out this feature per secret + +For users to force delete the ConfigMap, users need to do either: +1. manually delete the finalizer, by below command: + ``` + kubectl patch configmap/configmap-to-be-deleted -p '{"metadata":{"finalizers":[]}}' --type=merge + ``` +2. add `secret.kubernetes.io/skip-ephemeral-protection: "yes"` annotation to opt-out this feature per ConfigMap -Annotation will be more user friendly than directly deleting finalizer. However, it can't be used in the case that this feature is once enabled and disabled later by deleting the controller. For this case, it may be needed to provide a script to delete the finalizer on all the secrets. +Annotation will be more user friendly than directly deleting finalizer. However, it can't be used in the case that this feature is once enabled and disabled later by deleting the controller. For this case, it may be needed to provide a script to delete the finalizer on all the Secrets/ConfigMaps. ### Test Plan -- For Alpha, unit tests and e2e tests verifying that a secret used by other resources is protected by this feature are added. +- For Alpha, unit tests and e2e tests verifying that a Secret/ConfigMap used by other resources is protected by this feature are added. - unit tests: - - SecretInUseProtection enabled: + - EphemeralInUseProtection enabled: - Verify immediate deletion of a secret that is not used - Verify that secret used by a Pod is not removed immediately - Verify that secret used by a CSI PV as controllerPublishSecret is not removed immediately @@ -180,15 +192,20 @@ Annotation will be more user friendly than directly deleting finalizer. However, - Verify that secret used by a CSI PV as controllerExpandSecret is not removed immediately - Verify that secret used by a CSI PV as provisionerSecret is not removed immediately - Verify that secret used by a CSI PV as provisionerSecret via template is not removed immediately - - SecretInUseProtection disabled: + - Verify immediate deletion of a ConfigMap that is not used + - Verify that ConfigMap used by a Pod is not removed immediately + - EphemeralInUseProtection disabled: - Verify immediate deletion of a secret that is not used - Verify immediate deletion of a secret that is used by a Pod - Verify immediate deletion of a secret with finalizer that is used by a Pod - - `--secret-protection=enabled`: + - Verify immediate deletion of a ConfigMap that is not used + - Verify immediate deletion of a ConfigMap that is used by a Pod + - Verify immediate deletion of a ConfigMap with finalizer that is used by a Pod + - `--ephemeral-protection=enabled`: - Verify immediate deletion of a secret that is not used - Verify that secret used by a VolumeSnapshot is not removed immediately - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately - - `--secret-protection=disabled`: + - `--ephemeral-protection=disabled`: - Verify immediate deletion of a secret that is not used - Verify immediate deletion of a secret used by a VolumeSnapshot - Verify immediate deletion of a secret used by a VolumeSnapshot as snapshotterSecret via template @@ -203,6 +220,8 @@ Annotation will be more user friendly than directly deleting finalizer. However, - Verify that secret used by a CSI PV as provisionerSecret via template is not removed immediately - Verify that secret used by a VolumeSnapshot is not removed immediately - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately + - Verify immediate deletion of a ConfigMap that is not used + - Verify that ConfigMap used by a Pod is not removed immediately - For Beta, scalability tests are added to exercise this feature. - For GA, the introduced e2e tests will be promoted to conformance. @@ -233,21 +252,21 @@ enhancement: - What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade, in order to make use of the enhancement? --> -- Upgrade: Secret that doesn't have `kubernetes.io/secret-protection` finalizer and/or `snapshot.storage.kubernetes.io/secret-protection` finalizer will be added the finalizers on Update/Delete events, therefore no additional user operation will be needed. +- Upgrade: Secret/ConfigMap that doesn't have `kubernetes.io/ephemeral-protection` finalizer and/or `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer will be added the finalizers on Update/Delete events, therefore no additional user operation will be needed. - Downgrade: - Feature disabled case: - - If the `secret-protection-controller` exists and the feature is disabled, `kubernetes.io/secret-protection` finalizer will always be deleted, therefore no additional user operation will be needed, - - If the `secret-protection-volumesnapshot-controller` exists and the feature is disabled, `snapshot.storage.kubernetes.io/secret-protection` finalizer will always be deleted, therefore no additional user operation will be needed, + - If the `ephemeral-protection-controller` exists and the feature is disabled, `kubernetes.io/ephemeral-protection` finalizer will always be deleted, therefore no additional user operation will be needed, + - If the `ephemeral-protection-volumesnapshot-controller` exists and the feature is disabled, `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer will always be deleted, therefore no additional user operation will be needed, - Downgraded to no controller case: - - If no `secret-protection-controller` exists but `kubernetes.io/secret-protection` finalizer is added to the secrets, no one removes the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. - - If no `secret-protection-volumesnapshot-controller` exists but `snapshot.storage.kubernetes.io/secret-protection` finalizer is added to the secrets, no one removes the finalizer. Therefore, user needs to remove the `kubernetes.io/secret-protection` finalizer from all the secrets manually. + - If no `ephemeral-protection-controller` exists but `kubernetes.io/ephemeral-protection` finalizer is added to the secrets, no one removes the finalizer. Therefore, user needs to remove the `kubernetes.io/ephemeral-protection` finalizer from all the secrets manually. + - If no `ephemeral-protection-volumesnapshot-controller` exists but `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer is added to the secrets, no one removes the finalizer. Therefore, user needs to remove the `kubernetes.io/ephemeral-protection` finalizer from all the Secrets/ConfigMaps manually. ### Version Skew Strategy - As for the difference between in-tree components and out-of-tree components, they work independently, so they won't affect each other, -- As for in-tree components, the protection logic involves only in-tree admission controller and in-tree secret-protection-controller, so version skew won't happen unless these components are available with different versions, -- As for out-of-tree components, the protection logic involves only out-of-tree admission controller and out-of-tree secret-protection-volumesnapshot-controller, so version skew won't happen unless these components are available with different versions, -- As for resources, CSI Volume and CSI Snapshot are involved, changes in the API/CRD of these resources especially for their secret fields might cause issue. Howerver, this should be compatibility issue for these API/CRDs. +- As for in-tree components, the protection logic involves only in-tree admission controller and in-tree ephemeral-protection-controller, so version skew won't happen unless these components are available with different versions, +- As for out-of-tree components, the protection logic involves only out-of-tree admission controller and out-of-tree ephemeral-protection-volumesnapshot-controller, so version skew won't happen unless these components are available with different versions, +- As for resources, CSI Volume and CSI Snapshot are involved, changes in the API/CRD of these resources especially for their Secret/ConfigMap fields might cause issue. Howerver, this should be compatibility issue for these API/CRDs. ## Production Readiness Review Questionnaire @@ -277,18 +296,18 @@ you need any help or guidance. ###### How can this feature be enabled / disabled in a live cluster? - [x] Feature gate (also fill in values in `kep.yaml`) - - Feature gate name: SecretInUseProtection + - Feature gate name: EphemeralInUseProtection - Components depending on the feature gate: - kube-controller-manager - - secret-protection-controller (part of kube-controller-manager) + - ephemeral-protection-controller (part of kube-controller-manager) - storageobjectinuseprotection admission plugin (part of kube-controller-manager) -Secret protection controllers for in-tree are deployed automatically, if the Kubernetes version is later than the version where secret protection feature is marked as alpha. -Secret protection for volume snapshot will be enabled when those relevant out-of-tree controllers are deployed with `--secret-protection=enabled`, but no feature gate is needed. +The ephemeral-protection-controller for in-tree are deployed automatically, if the Kubernetes version is later than the version where secret protection feature is marked as alpha. +Ephemeral protection for volume snapshot will be enabled when those relevant out-of-tree controllers are deployed with `--ephemeral-protection=enabled`, but no feature gate is needed. ###### Does enabling the feature change any default behavior? -Yes. Secrets aren't deleted until the resources using them are deleted. +Yes. Secrets/ConfigMaps aren't deleted until the resources using them are deleted. ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? @@ -296,25 +315,25 @@ Yes. Detailed steps are as follows: - In-tree controller: - Downgrade case: - 1. Downgrade to the version where no secret-protection-controller exists, - 2. Manually delete the `kubernetes.io/secret-protection` finalizer from all the secrets. + 1. Downgrade to the version where no ephemeral-protection-controller exists, + 2. Manually delete the `kubernetes.io/ephemeral-protection` finalizer from all the Secrets/ConfigMaps. - Otherwise: - 1. Disable the `SecretInUseProtection` feature gate. + 1. Disable the `EphemeralInUseProtection` feature gate. - Out-of-tree controller: - Recommended method: - 1. Redeploy out-of-tree controllers with `--secret-protection=disabled` + 1. Redeploy out-of-tree controllers with `--ephemeral-protection=disabled` - Alternative method: 1. Delete the out-of-tree secret protection controller, - 2. Manually delete the `snapshot.storage.kubernetes.io/secret-protection` finalizer from all the secrets. + 2. Manually delete the `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer from all the secrets. ###### What happens if we reenable the feature if it was previously rolled back? -Secrets aren't deleted until the resources using them are deleted, again. +Secrets/ConfigMaps aren't deleted until the resources using them are deleted, again. ###### Are there any tests for feature enablement/disablement? -Yes, unit tests for the secret-protection-controller and secret-protection-volumesnapshot-controller cover scenarios where the feature is disabled or enabled. +Yes, unit tests for the ephemeral-protection-controller and ephemeral-protection-volumesnapshot-controller cover scenarios where the feature is disabled or enabled. ### Rollout, Upgrade and Rollback Planning @@ -358,17 +377,17 @@ This section must be completed when targeting beta to a release. ###### How can an operator determine if the feature is in use by workloads? -There will be secrets which have `kubernetes.io/secret-protection` finalizer and `snapshot.storage.kubernetes.io/secret-protection` finalizer. +There will be Secrets/ConfigMaps which have `kubernetes.io/ephemeral-protection` finalizer and `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer. ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? - [x] Metrics - Metric name: secret_protection_controller - Aggregation method: prometheus - - Components exposing the metric: secret-protection-controller + - Components exposing the metric: ephemeral-protection-controller - Metric name: secret_protection_volumesnapshot_controller - Aggregation method: prometheus - - Components exposing the metric: secret-protection-volumesnapshot-controller + - Components exposing the metric: ephemeral-protection-volumesnapshot-controller ###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs? @@ -415,10 +434,10 @@ and creating new ones, as well as about cluster-level services (e.g. DNS): ### Scalability ###### Will enabling / using this feature result in any new API calls? -- API call type: Update Secret, List Pod/PV/VolumeSnapshot, and Get PVC/SC +- API call type: Update Secret/ConfigMap, List Pod/PV/VolumeSnapshot, and Get PVC/SC - estimated throughput: TBD -- originating component: secret-protection-controller and secret-protection-volumesnapshot-controller -- API calls are triggered by changes of secrets, Pod, PV, VolumeSnapshot +- originating component: ephemeral-protection-controller and ephemeral-protection-volumesnapshot-controller +- API calls are triggered by changes of Secret/ConfigMap, Pod, PV, VolumeSnapshot ###### Will enabling / using this feature result in introducing new API types? @@ -430,8 +449,8 @@ No. ###### Will enabling / using this feature result in increasing size or count of the existing API objects? -- API type(s): Secret -- Estimated increase in size: the size of `kubernetes.io/secret-protection` finalizer, `snapshot.storage.kubernetes.io/secret-protection` finalizer, and `secret.kubernetes.io/skip-secret-protection` annotation per secret +- API type(s): Secret/ConfigMap +- Estimated increase in size: the size of `kubernetes.io/ephemeral-protection` finalizer, `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer, and `secret.kubernetes.io/skip-ephemeral-protection` annotation per Secret/ConfigMap - Estimated amount of new objects: N/A ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? @@ -448,12 +467,12 @@ Think about adding additional work or introducing new steps in between ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? API: - - Size: Increase in size of the Secret is very limited. + - Size: Increase in size of the Secret/ConfigMap is very limited. - Number of calls: Rate limit is set for the number of API calls. Disk/IO: - No disk/IO are done through other than API calls and log outputs CPU/RAM: - - It works as common controller pattern. Therefore, number of resouces to process and the logic on how to detect in-use secret should only be needed to be checked. + - It works as common controller pattern. Therefore, number of resouces to process and the logic on how to detect in-use Secret/ConfigMap should only be needed to be checked. ### Troubleshooting @@ -505,8 +524,8 @@ Why should this KEP _not_ be implemented? ## Alternatives -- Manually adding/deleting finalizer to/from secrets that are not deleted automatically in certain life cycle +- Manually adding/deleting finalizer to/from Secrets/ConfigMaps that are not deleted automatically in certain life cycle - Introduce a new kind of reference, like usedReference, and leave addition/deletion of it to users (Similar to ownerReference, but just block deletion and won't try to delete referenced resources through GC, like deleting child on parent's deletion). -Above ways will force users to do some additional works to protect secrets. Also, they are inconsistent with pv-protection/pvc-protection concepts. +Above ways will force users to do some additional works to protect Secrets/ConfigMaps. Also, they are inconsistent with pv-protection/pvc-protection concepts. diff --git a/keps/sig-storage/2639-secret-protection/kep.yaml b/keps/sig-storage/2639-secret-protection/kep.yaml index b13026935ac..7b9da4add61 100644 --- a/keps/sig-storage/2639-secret-protection/kep.yaml +++ b/keps/sig-storage/2639-secret-protection/kep.yaml @@ -1,4 +1,4 @@ -title: secret protection +title: Secret/ConfigMap Protection kep-number: 2639 authors: - "@mkimuram" @@ -31,10 +31,10 @@ milestone: # The following PRR answers are required at alpha release # List the feature gate name and the components for which it must be enabled feature-gates: - - name: SecretInUseProtection + - name: EphemeralInUseProtection components: - kube-controller-manager - - secret-protection-controller (part of kube-controller-manager) + - ephemeral-protection-controller (part of kube-controller-manager) - storageobjectinuseprotection admission plugin (part of kube-controller-manager) disable-supported: true From f1557c09b6fcd3c973b2306dc5b4c6d00b354834 Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Tue, 17 Aug 2021 23:13:49 +0000 Subject: [PATCH 08/12] KEP-2639: Update to rely on Lien(KEP-2839) --- .../README.md | 252 +++++++++++------- .../kep.yaml | 14 +- 2 files changed, 167 insertions(+), 99 deletions(-) rename keps/sig-storage/{2639-secret-protection => 2639-secret-configmap-protection}/README.md (59%) rename keps/sig-storage/{2639-secret-protection => 2639-secret-configmap-protection}/kep.yaml (72%) diff --git a/keps/sig-storage/2639-secret-protection/README.md b/keps/sig-storage/2639-secret-configmap-protection/README.md similarity index 59% rename from keps/sig-storage/2639-secret-protection/README.md rename to keps/sig-storage/2639-secret-configmap-protection/README.md index 619a76f8a01..46aeef0952e 100644 --- a/keps/sig-storage/2639-secret-protection/README.md +++ b/keps/sig-storage/2639-secret-configmap-protection/README.md @@ -55,7 +55,15 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary -This KEP proposes a feature to protect Secrets/ConfigMaps while it is in use. Currently, user can delete a secret that is being used by other resources, like Pods, PVs, and VolumeSnapshots. This may have negative impact on the resouces using the secret and it may result in data loss. Also, ConfigMaps can be deleted while it is in use, which may lead to unexpected behavior in applications. +This KEP proposes a feature to protect Secrets/ConfigMaps while it is in use. Currently, users can delete a secret that is being used by other resources, like Pods, PVs, and VolumeSnapshots. This may have negative impact on the resouces using the secret and it may result in data loss. + +For example, one of the worst case scenarios is deletion of a Node Stage Secret for CSI volumes while it is still in-use. +If it happens, a CSI driver for the volume can't unstage the volume due to the lack of the Secret, +as a result, the volume can't be detached from the node and the volume can't even be deleted. +This issue will easily happen if a PVC for the volume and a Secret for the volume exist in the same namespace, and a user requests to delete the namespace, which starts deletion of all the resources in the namespace. +When the Secret is deleted before the PVC is deleted, this issue happens. + +Also, ConfigMaps can be deleted while it is in use, which may lead to an unexpected behavior in applications. Similar features for protecting PV and PVC already exist as [pv-protection](https://github.com/kubernetes/enhancements/issues/499) and [pvc-protection](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/postpone-pvc-deletion-if-used-in-a-pod.md). @@ -115,6 +123,7 @@ The secret is protected until the volume using the secret is deleted and the del A user really would like to delete a secret despite that it is used by other resources. The user force delete the secret while it is used by other resources, and the secret isn't protected and is actually deleted. +An example of such use cases is to update an immutable secret, by deleting and recreating. ### Notes/Constraints/Caveats (Optional) @@ -139,89 +148,166 @@ Consider including folks who also work outside the SIG or subproject. --> There is a corner case when Reclaim Policy is Retain and PV is referencing secrets in a certain user namespace. In such a case, when PVC is deleted, the PV referencing secrets remains as a resource not managed by the namespace, as a result, deletion of the secrets is blocked. -However, the user of the namespace won't notice why it is blocked and won't find a way to delete the secrets other than manually deleting finalizer. -In addition, to make things worse, provisioner secrets that are defined by using template with PVC information, +On the other hand, provisioner secrets that are defined by using templates with PVC information, like "${pvc.name}", would be deleted in above case, due to the failure in finding a reference to the secret, because the reference to provisioner secret is not stored in PV and needs to be resolved on deletion time with PVC information that is already deleted. This behavior is inconsistent with other secrets, like node publish secret whose reference is stored in PV. -## Design Details +To avoid this kind of case, an [issue](https://github.com/kubernetes-csi/external-provisioner/issues/654) for adding information on secrets in PersistentVolume as annotations is in progress. This change will allow protecting provisioner secrets from deletion even if PVC or SC for PV is deleted. -This feature can be implemented in the same way as `pv-protection/pvc-protection`. -Protection logics for in-tree resources and out-of-tree resources are separeted and independently work. -It is due to the restriction that CRDs can't be handled from in-tree controller. Note that in-tree controller protect both Secrets and ConfigMaps, on the other hand, out-of-tree controller protect only Secrets because VoluemSnapshots never use ConfigMaps. +## Design Details -- In-tree resources(`Pod` and `PersistentVolume`): - - The deletion is blocked by using newly introduced `kubernetes.io/ephemeral-protection` finalizer, - - The `kubernetes.io/ephemeral-protection` finalizer will be always added on creation of the Secret/ConfigMap by using admission controller for in-tree resources, - - After the Secret/ConfigMap is requested to be deleted (`deletionTimestamp` is set), the `kubernetes.io/ephemeral-protection` finalizer will be deleted by newly introduced in-tree `ephemeral-protection-controller` by checking whether the Secret/ConfigMap is in-use, on every change(Create/Update/Delete) events for Secrets/ConfigMaps and related resources. - - If the `EphemeralInUseProtection` feature gate is disalbed, finalizer is added on creation, but deleted regardless of whether it is in-use after `deletionTimestamp` is set. This will allow users to avoid manually deleting finalizers on the downgrading by disabling the feature. -- Out-of-tree resources(`VolumeSnapshot`): - - The deletion is blocked by using newly introduced `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer, - - The `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer will be always added on creation of the secret by using snapshot controller, - - After the secret is requested to be deleted (`deletionTimestamp` is set), the `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer will be deleted by newly introduced out-of-tree `ephemeral-protection-volumesnapshot-controller` by checking whether the secret is in-use, on every change(Create/Update/Delete) events for secrets and related resources. +This feature can be implemented by using Lien which is being discussed in [KEP-2839](https://github.com/kubernetes/enhancements/pull/2840). +A newly introduced Secret protection controller and a newly introduced ConfigMap protection controller update the `Liens` field of Secrets and ConfigMaps when it finds a reference to them from other resources that the controllers are watching. + +Note that that Secret protection controller needs to handle `VolumeSnapshotContent`, which is out-of-tree resource. +Due to the restriction of out-of-tree resources, it needs to be handled by an external controller. +Therefore, it is designed to make the same external controller handle both in-tree resources and out-of-tree resources, instead of creating both an in-tree controller and an external controller, separately. +Also, ConfigMap protection controller is implemented as an external controller. + +To minimize the risk of unintentionally blocking the deletion, this feature is opt-in. +Users need to add below to enable this feature per resource: +- `kubernetes.io/enable-secret-protection` annotation to Secret +- `kubernetes.io/enable-configmap-protection` annotation to ConfigMap + +Basic flows of Secret protection controller and ConfigMap protection controller are as follows: + +- Secret protection controller: + - The controller watches `Pod`, `PersistentVolume`, and `VolumeSnapshotContent` + - Create/Update event: + - For all the `Secret`s referenced by the resource: + - If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation, do nothing + - Update the `Secret`'s `Liens` field to add `kubernetes.io/secret-protection ${version}/${kind}/${namespace}/${name}`. + - Delete event: + - For all the `Secret`s referenced by the resource: + - Update the `Secret`'s `Liens` field to remove `kubernetes.io/secret-protection ${version}/${kind}/${namespace}/${name}`. + - The controller periodically check `Pod`, `PersistentVolume`, and `VolumeSnapshotContent` + - For each resource: + - For all the `Secret`s referenced by the resource: + - If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation, do nothing + - Update the `Secret`'s `Liens` field to add `kubernetes.io/secret-protection ${version}/${kind}/${namespace}/${name}`. + - The controller preiodically check `Secret` + - For each `Secret`: + - If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation + - Remove all the `Liens` that is prefixed with `kubernetes.io/secret-protection` + - otherwise: + - For all the liens that are prefixed with `kubernetes.io/secret-protection`: + - Check if the resource with `${version}/${kind}/${namespace}/${name}` exists, and if it no longer exists, delete the lien. + +- ConfigMap protection controller: + - The controller watches `Pod` + - Create/Update event: + - For all the `ConfigMap`s referenced by the `Pod`: + - If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation, do nothing + - Update the `ConfigMap`'s `Liens` field to add `kubernetes.io/configmap-protection ${version}/${kind}/${namespace}/${name}`. + - Delete event: + - For all the `ConfigMap`s referenced by the `Pod`: + - Update the `ConfigMap`'s `Liens` field to remove `kubernetes.io/configmap-protection ${version}/${kind}/${namespace}/${name}`. + - The controller periodically check `Pod` + - For each `Pod`: + - For all the `ConfigMap`s referenced by the `Pod`: + - If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation, do nothing + - Update the `ConfigMap`'s `Liens` field to add `kubernetes.io/configmap-protection ${version}/${kind}/${namespace}/${name}`. + - The controller preiodically check the `ConfigMap`s + - For each `ConfigMap`: + - If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation + - Remove all the `Liens` that is prefixed with `kubernetes.io/configmap-protection` + - otherwise: + - For all the liens that are prefixed with `kubernetes.io/configmap-protection`: + - Check if the resource with `${version}/${kind}/${namespace}/${name}` exists, and if it no longer exists, delete the lien. + +Note that + - Periodical checks for referencing resources are needed for handling a case that the annotation is added after referecing resources are created, + - Periodical checks for `Secret` or `Configmap` are needed for handling a case that update for a referencing resource is done in a way that remove the reference to `Secret` or `Configmap`, + - What deletion event handling is doing is covered by the periodical checks for `Secret` or `Configmap`, but added intentionally to handle the event in a timely manner. + +Prototype implementation can be found [here](https://github.com/mkimuram/secret-protection/commits/lien). + +For users to force delete the Secret, users need to either: +1. Remove `kubernetes.io/enable-secret-protection: "yes"` annotation + ``` + kubectl annotate secret secret-to-be-deleted kubernetes.io/enable-secret-protection- + ``` -Feature gate `EphemeralInUseProtection` is used only for in-tree controller. Out-of-tree controller will be enabled when the `ephemeral-protection-volumesnapshot-controller` is deployed with `--ephemeral-protection=enabled` flag. +2. Remove the Liens, by below command: + ``` + kubectl patch secret secret-to-be-deleted -p '{"metadata":{"liens":[]}}' --type=merge + ``` -For users to force delete the Secret, users need to do either: -1. manually delete the finalizer, by below command: +For users to force delete the ConfigMap, users need to either: +1. Remove `kubernetes.io/enable-configmap-protection: "yes"` annotation ``` - kubectl patch secret/secret-to-be-deleted -p '{"metadata":{"finalizers":[]}}' --type=merge + kubectl annotate configmap configmap-to-be-deleted kubernetes.io/enable-configmap-protection- ``` -2. add `secret.kubernetes.io/skip-ephemeral-protection: "yes"` annotation to opt-out this feature per secret -For users to force delete the ConfigMap, users need to do either: -1. manually delete the finalizer, by below command: +2. Remove the Liens, by below command: ``` - kubectl patch configmap/configmap-to-be-deleted -p '{"metadata":{"finalizers":[]}}' --type=merge + kubectl patch configmap configmap-to-be-deleted -p '{"metadata":{"liens":[]}}' --type=merge ``` -2. add `secret.kubernetes.io/skip-ephemeral-protection: "yes"` annotation to opt-out this feature per ConfigMap -Annotation will be more user friendly than directly deleting finalizer. However, it can't be used in the case that this feature is once enabled and disabled later by deleting the controller. For this case, it may be needed to provide a script to delete the finalizer on all the Secrets/ConfigMaps. +Annotation will be more user friendly than directly deleting liens. +However, annotation can't be used in the case that controllers are once deployed and undeployed later. +On the other hand, directly deleting liens might not work, because liens may be added by the controllers again, even after users manually remove the liens. +Therefore, directly deleting liens should be used after the controllers are undelpoyed. +For this case, it may be needed to provide a script to delete all the related liens on all the Secrets/ConfigMaps. + +Note that above examples of removing liens remove all the liens from the resource. +So, when running these commands, we need to care that other controllers or users may have been add another Liens for other purposes (For the case _to force delete_, it won't matter because the user really would like to delete the resource anyway). + ### Test Plan - For Alpha, unit tests and e2e tests verifying that a Secret/ConfigMap used by other resources is protected by this feature are added. - unit tests: - - EphemeralInUseProtection enabled: + - Lien feature enabled: - Verify immediate deletion of a secret that is not used - - Verify that secret used by a Pod is not removed immediately + - Verify immediate deletion of a secret that is used but doesn't have annotation + - Verify that secret used by a Pod as volume is not removed immediately + - Verify that secret used by a Pod as EnvVar is not removed immediately - Verify that secret used by a CSI PV as controllerPublishSecret is not removed immediately - Verify that secret used by a CSI PV as nodeStageSecret is not removed immediately - Verify that secret used by a CSI PV as nodePublishSecret is not removed immediately - Verify that secret used by a CSI PV as controllerExpandSecret is not removed immediately - Verify that secret used by a CSI PV as provisionerSecret is not removed immediately - Verify that secret used by a CSI PV as provisionerSecret via template is not removed immediately + - Verify that secret used by a VolumeSnapshot is not removed immediately + - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately - Verify immediate deletion of a ConfigMap that is not used - - Verify that ConfigMap used by a Pod is not removed immediately - - EphemeralInUseProtection disabled: + - Verify immediate deletion of a ConfigMap that is used but doesn't have annotation + - Verify that ConfigMap used by a Pod as volume is not removed immediately + - Verify that ConfigMap used by a Pod as EnvVar is not removed immediately + - Lien feature disabled: - Verify immediate deletion of a secret that is not used - Verify immediate deletion of a secret that is used by a Pod - - Verify immediate deletion of a secret with finalizer that is used by a Pod - Verify immediate deletion of a ConfigMap that is not used - Verify immediate deletion of a ConfigMap that is used by a Pod - - Verify immediate deletion of a ConfigMap with finalizer that is used by a Pod - - `--ephemeral-protection=enabled`: - - Verify immediate deletion of a secret that is not used - - Verify that secret used by a VolumeSnapshot is not removed immediately - - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately - - `--ephemeral-protection=disabled`: - Verify immediate deletion of a secret that is not used - Verify immediate deletion of a secret used by a VolumeSnapshot - Verify immediate deletion of a secret used by a VolumeSnapshot as snapshotterSecret via template - e2e tests: - - Verify immediate deletion of a secret that is not used - - Verify that secret used by a Pod is not removed immediately - - Verify that secret used by a CSI PV as controllerPublishSecret is not removed immediately - - Verify that secret used by a CSI PV as nodeStageSecret is not removed immediately - - Verify that secret used by a CSI PV as nodePublishSecret is not removed immediately - - Verify that secret used by a CSI PV as controllerExpandSecret is not removed immediately - - Verify that secret used by a CSI PV as provisionerSecret is not removed immediately - - Verify that secret used by a CSI PV as provisionerSecret via template is not removed immediately - - Verify that secret used by a VolumeSnapshot is not removed immediately - - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately - - Verify immediate deletion of a ConfigMap that is not used - - Verify that ConfigMap used by a Pod is not removed immediately + - secret-protection-controller deployed: + - Verify immediate deletion of a secret that is not used + - Verify immediate deletion of a secret that is used but doesn't have annotation + - Verify that secret used by a Pod as volume is not removed immediately + - Verify that secret used by a Pod as EnvVar is not removed immediately + - Verify that secret used by a CSI PV as controllerPublishSecret is not removed immediately + - Verify that secret used by a CSI PV as nodeStageSecret is not removed immediately + - Verify that secret used by a CSI PV as nodePublishSecret is not removed immediately + - Verify that secret used by a CSI PV as controllerExpandSecret is not removed immediately + - Verify that secret used by a CSI PV as provisionerSecret is not removed immediately + - Verify that secret used by a CSI PV as provisionerSecret via template is not removed immediately + - Verify that secret used by a VolumeSnapshot is not removed immediately + - Verify that secret used by a VolumeSnapshot as snapshotterSecret via template is not removed immediately + - secret-protection-controller not deployed: + - Verify immediate deletion of a secret that is used by Pod, PV, and VolumeSnapshot + - configmap-protection-controller deployed: + - Verify immediate deletion of a ConfigMap that is not used + - Verify immediate deletion of a ConfigMap that is used but doesn't have annotation + - Verify that ConfigMap used by a Pod as volume is not removed immediately + - Verify that ConfigMap used by a Pod as EnvVar is not removed immediately + - config-protection-controller not deployed: + - Verify immediate deletion of a ConfigMap that is used by Pod + - For Beta, scalability tests are added to exercise this feature. - For GA, the introduced e2e tests will be promoted to conformance. @@ -252,20 +338,18 @@ enhancement: - What changes (in invocations, configurations, API use, etc.) is an existing cluster required to make on upgrade, in order to make use of the enhancement? --> -- Upgrade: Secret/ConfigMap that doesn't have `kubernetes.io/ephemeral-protection` finalizer and/or `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer will be added the finalizers on Update/Delete events, therefore no additional user operation will be needed. +- Upgrade: + - This feature requires that lien(InUseProtection feature gate) is enabled in the cluster. + - To enable Secret protection, deploy secret-protection-controller. + - To enable ConfigMap protection, deploy configmap-protection-controller. - Downgrade: - - Feature disabled case: - - If the `ephemeral-protection-controller` exists and the feature is disabled, `kubernetes.io/ephemeral-protection` finalizer will always be deleted, therefore no additional user operation will be needed, - - If the `ephemeral-protection-volumesnapshot-controller` exists and the feature is disabled, `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer will always be deleted, therefore no additional user operation will be needed, - - Downgraded to no controller case: - - If no `ephemeral-protection-controller` exists but `kubernetes.io/ephemeral-protection` finalizer is added to the secrets, no one removes the finalizer. Therefore, user needs to remove the `kubernetes.io/ephemeral-protection` finalizer from all the secrets manually. - - If no `ephemeral-protection-volumesnapshot-controller` exists but `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer is added to the secrets, no one removes the finalizer. Therefore, user needs to remove the `kubernetes.io/ephemeral-protection` finalizer from all the Secrets/ConfigMaps manually. + - To disable Secret protection, undeploy secret-protection-controller. + - To disable ConfigMap protection, undeploy configmap-protection-controller. + - If the cluster is downgraded to the version that doesn't support lien, this feature should be disabled, or controllers should be undeployed. ### Version Skew Strategy -- As for the difference between in-tree components and out-of-tree components, they work independently, so they won't affect each other, -- As for in-tree components, the protection logic involves only in-tree admission controller and in-tree ephemeral-protection-controller, so version skew won't happen unless these components are available with different versions, -- As for out-of-tree components, the protection logic involves only out-of-tree admission controller and out-of-tree ephemeral-protection-volumesnapshot-controller, so version skew won't happen unless these components are available with different versions, +- As for dependency on features, this feature depends on lien (InUseProtection feature gate), therefore, if the lien feature is disabled, this feature should also be disabled, - As for resources, CSI Volume and CSI Snapshot are involved, changes in the API/CRD of these resources especially for their Secret/ConfigMap fields might cause issue. Howerver, this should be compatibility issue for these API/CRDs. ## Production Readiness Review Questionnaire @@ -295,15 +379,14 @@ you need any help or guidance. ### Feature Enablement and Rollback ###### How can this feature be enabled / disabled in a live cluster? -- [x] Feature gate (also fill in values in `kep.yaml`) - - Feature gate name: EphemeralInUseProtection - - Components depending on the feature gate: - - kube-controller-manager - - ephemeral-protection-controller (part of kube-controller-manager) - - storageobjectinuseprotection admission plugin (part of kube-controller-manager) - -The ephemeral-protection-controller for in-tree are deployed automatically, if the Kubernetes version is later than the version where secret protection feature is marked as alpha. -Ephemeral protection for volume snapshot will be enabled when those relevant out-of-tree controllers are deployed with `--ephemeral-protection=enabled`, but no feature gate is needed. +- [x] Other + - Describe the mechanism: Deploy the controller for each feature: + - secret-protection-controller + - configmap-protection-controller + - Will enabling / disabling the feature require downtime of the control + plane? No. + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled). No. ###### Does enabling the feature change any default behavior? @@ -311,21 +394,7 @@ Yes. Secrets/ConfigMaps aren't deleted until the resources using them are delete ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? -Yes. Detailed steps are as follows: - -- In-tree controller: - - Downgrade case: - 1. Downgrade to the version where no ephemeral-protection-controller exists, - 2. Manually delete the `kubernetes.io/ephemeral-protection` finalizer from all the Secrets/ConfigMaps. - - Otherwise: - 1. Disable the `EphemeralInUseProtection` feature gate. - -- Out-of-tree controller: - - Recommended method: - 1. Redeploy out-of-tree controllers with `--ephemeral-protection=disabled` - - Alternative method: - 1. Delete the out-of-tree secret protection controller, - 2. Manually delete the `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer from all the secrets. +Yes. By undeploying the controllers. ###### What happens if we reenable the feature if it was previously rolled back? @@ -333,7 +402,7 @@ Secrets/ConfigMaps aren't deleted until the resources using them are deleted, ag ###### Are there any tests for feature enablement/disablement? -Yes, unit tests for the ephemeral-protection-controller and ephemeral-protection-volumesnapshot-controller cover scenarios where the feature is disabled or enabled. +Yes, e2e tests for secret-protection-controller and configmap-protection-controller cover scenarios where the controller is deployed and undeployed. ### Rollout, Upgrade and Rollback Planning @@ -377,17 +446,17 @@ This section must be completed when targeting beta to a release. ###### How can an operator determine if the feature is in use by workloads? -There will be Secrets/ConfigMaps which have `kubernetes.io/ephemeral-protection` finalizer and `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer. +There will be Secrets/ConfigMaps which have liens prefixed with `kubernetes.io/secret-protection` or `kubernetes.io/configmap-protection`. ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? - [x] Metrics - Metric name: secret_protection_controller - Aggregation method: prometheus - - Components exposing the metric: ephemeral-protection-controller - - Metric name: secret_protection_volumesnapshot_controller + - Components exposing the metric: secret-protection-controller + - Metric name: configmap_protection_controller - Aggregation method: prometheus - - Components exposing the metric: ephemeral-protection-volumesnapshot-controller + - Components exposing the metric: configmap-protection-controller ###### What are the reasonable SLOs (Service Level Objectives) for the above SLIs? @@ -436,7 +505,7 @@ and creating new ones, as well as about cluster-level services (e.g. DNS): - API call type: Update Secret/ConfigMap, List Pod/PV/VolumeSnapshot, and Get PVC/SC - estimated throughput: TBD -- originating component: ephemeral-protection-controller and ephemeral-protection-volumesnapshot-controller +- originating component: secret-protection-controller and configmap-protection-controller - API calls are triggered by changes of Secret/ConfigMap, Pod, PV, VolumeSnapshot ###### Will enabling / using this feature result in introducing new API types? @@ -450,7 +519,7 @@ No. ###### Will enabling / using this feature result in increasing size or count of the existing API objects? - API type(s): Secret/ConfigMap -- Estimated increase in size: the size of `kubernetes.io/ephemeral-protection` finalizer, `snapshot.storage.kubernetes.io/ephemeral-protection` finalizer, and `secret.kubernetes.io/skip-ephemeral-protection` annotation per Secret/ConfigMap +- Estimated increase in size: the size of Liens field per Secret/ConfigMap - Estimated amount of new objects: N/A ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? @@ -525,7 +594,8 @@ Why should this KEP _not_ be implemented? ## Alternatives - Manually adding/deleting finalizer to/from Secrets/ConfigMaps that are not deleted automatically in certain life cycle +- Implement in the same way as [pv-protection](https://github.com/kubernetes/enhancements/issues/499) and [pvc-protection](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/postpone-pvc-deletion-if-used-in-a-pod.md) - Introduce a new kind of reference, like usedReference, and leave addition/deletion of it to users (Similar to ownerReference, but just block deletion and won't try to delete referenced resources through GC, like deleting child on parent's deletion). -Above ways will force users to do some additional works to protect Secrets/ConfigMaps. Also, they are inconsistent with pv-protection/pvc-protection concepts. +Above ways will force users to do some additional works to protect Secrets/ConfigMaps. diff --git a/keps/sig-storage/2639-secret-protection/kep.yaml b/keps/sig-storage/2639-secret-configmap-protection/kep.yaml similarity index 72% rename from keps/sig-storage/2639-secret-protection/kep.yaml rename to keps/sig-storage/2639-secret-configmap-protection/kep.yaml index 7b9da4add61..2812905d466 100644 --- a/keps/sig-storage/2639-secret-protection/kep.yaml +++ b/keps/sig-storage/2639-secret-configmap-protection/kep.yaml @@ -20,21 +20,19 @@ stage: alpha # The most recent milestone for which work toward delivery of this KEP has been # done. This can be the current (upcoming) milestone, if it is being actively # worked on. -latest-milestone: "v1.22" +latest-milestone: "v1.23" # The milestone at which this feature was, or is targeted to be, at each stage. milestone: - alpha: "v1.22" - beta: "v1.23" - stable: "v1.24" + alpha: "v1.23" + beta: "v1.24" + stable: "v1.26" # The following PRR answers are required at alpha release # List the feature gate name and the components for which it must be enabled feature-gates: - - name: EphemeralInUseProtection + - name: InUseProtection components: - - kube-controller-manager - - ephemeral-protection-controller (part of kube-controller-manager) - - storageobjectinuseprotection admission plugin (part of kube-controller-manager) + - kube-apiserver disable-supported: true From e6e7736ada2f734573532ea68b4335bbdfb4b934 Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Wed, 1 Sep 2021 18:22:55 +0000 Subject: [PATCH 09/12] KEP-2639: Addressed review comments --- .../README.md | 25 +++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/keps/sig-storage/2639-secret-configmap-protection/README.md b/keps/sig-storage/2639-secret-configmap-protection/README.md index 46aeef0952e..1edcda9d855 100644 --- a/keps/sig-storage/2639-secret-configmap-protection/README.md +++ b/keps/sig-storage/2639-secret-configmap-protection/README.md @@ -77,6 +77,14 @@ Secrets can be used by below ways: - [Exposed as environment variables](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables) - [Generic ephemeral volumes ](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes) (can be handled as CSI PV below) +- From Deployment: + - Specified through [DeploymentSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#deploymentspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From StatefulSet: + - Specified through [StatefulSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#statefulsetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From DaemonSet: + - Specified through [DaemonSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#daemonsetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From CronJob: + - Specified through [JobTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#cronjob-v1-batch) -> [JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#jobspec-v1-batch) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) - From PV: - [CSI](https://kubernetes-csi.github.io/docs/secrets-and-credentials-storage-class.html): - provisioner secret @@ -93,6 +101,14 @@ ConfigMaps can be used by below ways: - From Pod: - [Mounted as files](https://kubernetes.io/docs/concepts/configuration/configmap/#using-configmaps-as-files-from-a-pod) - [Exposed as environment variables](https://kubernetes.io/docs/concepts/configuration/configmap/#configmaps-and-pods) +- From Deployment: + - Specified through [DeploymentSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#deploymentspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From StatefulSet: + - Specified through [StatefulSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#statefulsetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From DaemonSet: + - Specified through [DaemonSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#daemonsetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From CronJob: + - Specified through [JobTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#cronjob-v1-batch) -> [JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#jobspec-v1-batch) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) ### Goals @@ -101,6 +117,7 @@ ConfigMaps can be used by below ways: ### Non-Goals - Protect important Secrets/ConfigMaps that aren't in use from being deleted +- Protect Secrets/ConfigMaps from being __updated__ while they are in use (Immutable Secrets/ConfigMaps will solve it) - Protect resources other than Secrets/ConfigMaps from being deleted. ## Proposal @@ -264,6 +281,8 @@ So, when running these commands, we need to care that other controllers or users - Verify immediate deletion of a secret that is used but doesn't have annotation - Verify that secret used by a Pod as volume is not removed immediately - Verify that secret used by a Pod as EnvVar is not removed immediately + - Verify that secret used by a Deployment, a StatefulSet, a Daemonset, and a CronJob as volume has proper Liens + - Verify that secret used by a Deployment, a StatefulSet, a Daemonset, and a CronJob as EnvVar has proper Liens - Verify that secret used by a CSI PV as controllerPublishSecret is not removed immediately - Verify that secret used by a CSI PV as nodeStageSecret is not removed immediately - Verify that secret used by a CSI PV as nodePublishSecret is not removed immediately @@ -276,6 +295,8 @@ So, when running these commands, we need to care that other controllers or users - Verify immediate deletion of a ConfigMap that is used but doesn't have annotation - Verify that ConfigMap used by a Pod as volume is not removed immediately - Verify that ConfigMap used by a Pod as EnvVar is not removed immediately + - Verify that ConfigMap used by a Deployment, a StatefulSet, a Daemonset, and a CronJob as volume has proper Liens + - Verify that ConfigMap used by a Deployment, a StatefulSet, a Daemonset, and a CronJob as EnvVar has proper Liens - Lien feature disabled: - Verify immediate deletion of a secret that is not used - Verify immediate deletion of a secret that is used by a Pod @@ -290,6 +311,8 @@ So, when running these commands, we need to care that other controllers or users - Verify immediate deletion of a secret that is used but doesn't have annotation - Verify that secret used by a Pod as volume is not removed immediately - Verify that secret used by a Pod as EnvVar is not removed immediately + - Verify that secret used by a Deployment, a StatefulSet, a Daemonset, and a CronJob as volume has proper Liens + - Verify that secret used by a Deployment, a StatefulSet, a Daemonset, and a CronJob as EnvVar has proper Liens - Verify that secret used by a CSI PV as controllerPublishSecret is not removed immediately - Verify that secret used by a CSI PV as nodeStageSecret is not removed immediately - Verify that secret used by a CSI PV as nodePublishSecret is not removed immediately @@ -305,6 +328,8 @@ So, when running these commands, we need to care that other controllers or users - Verify immediate deletion of a ConfigMap that is used but doesn't have annotation - Verify that ConfigMap used by a Pod as volume is not removed immediately - Verify that ConfigMap used by a Pod as EnvVar is not removed immediately + - Verify that ConfigMap used by a Deployment, a StatefulSet, a Daemonset, and a CronJob as volume has proper Liens + - Verify that ConfigMap used by a Deployment, a StatefulSet, a Daemonset, and a CronJob as EnvVar has proper Liens - config-protection-controller not deployed: - Verify immediate deletion of a ConfigMap that is used by Pod From cbc3a6771ddb21dd7ab07116bf12f008ce845808 Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Thu, 2 Sep 2021 15:33:09 +0000 Subject: [PATCH 10/12] KEP-2639: Addressed review comments (Add details in PRR sections) --- .../2639-secret-configmap-protection/README.md | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/keps/sig-storage/2639-secret-configmap-protection/README.md b/keps/sig-storage/2639-secret-configmap-protection/README.md index 1edcda9d855..b38e82d2b6c 100644 --- a/keps/sig-storage/2639-secret-configmap-protection/README.md +++ b/keps/sig-storage/2639-secret-configmap-protection/README.md @@ -405,9 +405,14 @@ you need any help or guidance. ###### How can this feature be enabled / disabled in a live cluster? - [x] Other - - Describe the mechanism: Deploy the controller for each feature: - - secret-protection-controller - - configmap-protection-controller + - Describe the mechanism: + - Prerequisite: InUseProtection feature gate is enabled. + - To enable: + - Deploy secret-protection-controller to enable Secret protection + - Deploy configmap-protection-controller to enable Configmap protection + - To disable: + - Undeploy secret-protection-controller to disable Secret protection + - Undeploy configmap-protection-controller to disable Configmap protection - Will enabling / disabling the feature require downtime of the control plane? No. - Will enabling / disabling the feature require downtime or reprovisioning @@ -427,7 +432,10 @@ Secrets/ConfigMaps aren't deleted until the resources using them are deleted, ag ###### Are there any tests for feature enablement/disablement? -Yes, e2e tests for secret-protection-controller and configmap-protection-controller cover scenarios where the controller is deployed and undeployed. +Yes. + +- Unit tests cover scenarios for lien feature (InUseProtection feature gate) disabled case. +- E2e tests for secret-protection-controller and configmap-protection-controller cover scenarios where the controller is deployed and undeployed. ### Rollout, Upgrade and Rollback Planning From 20d34403c30736e8852a53d762a0d905f58feef1 Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Tue, 7 Sep 2021 23:32:23 +0000 Subject: [PATCH 11/12] KEP-2639: Update to use the per-controller version of Lien --- keps/prod-readiness/sig-storage/2639.yaml | 2 +- .../README.md | 80 ++++++++++++------- 2 files changed, 50 insertions(+), 32 deletions(-) diff --git a/keps/prod-readiness/sig-storage/2639.yaml b/keps/prod-readiness/sig-storage/2639.yaml index d38e44af2c4..a923eda52bd 100644 --- a/keps/prod-readiness/sig-storage/2639.yaml +++ b/keps/prod-readiness/sig-storage/2639.yaml @@ -1,3 +1,3 @@ kep-number: 2639 alpha: - approver: "@johnbelamaric" + approver: "@wojtek-t" diff --git a/keps/sig-storage/2639-secret-configmap-protection/README.md b/keps/sig-storage/2639-secret-configmap-protection/README.md index b38e82d2b6c..b950cef1a13 100644 --- a/keps/sig-storage/2639-secret-configmap-protection/README.md +++ b/keps/sig-storage/2639-secret-configmap-protection/README.md @@ -119,6 +119,7 @@ ConfigMaps can be used by below ways: - Protect important Secrets/ConfigMaps that aren't in use from being deleted - Protect Secrets/ConfigMaps from being __updated__ while they are in use (Immutable Secrets/ConfigMaps will solve it) - Protect resources other than Secrets/ConfigMaps from being deleted. +- Provide a generic mechanism that prevents a resource from being deleted when it shouldn't be deleted (Scope of [KEP-2839](https://github.com/kubernetes/enhancements/pull/2840)) ## Proposal @@ -190,48 +191,64 @@ Users need to add below to enable this feature per resource: Basic flows of Secret protection controller and ConfigMap protection controller are as follows: - Secret protection controller: - - The controller watches `Pod`, `PersistentVolume`, and `VolumeSnapshotContent` + - The controller watches `Pod`, `PersistentVolume`, `VolumeSnapshotContent`, `Deployment`, `StatefulSet`, `DaemonSet`, and `CronJob` - Create/Update event: - For all the `Secret`s referenced by the resource: - - If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation, do nothing - - Update the `Secret`'s `Liens` field to add `kubernetes.io/secret-protection ${version}/${kind}/${namespace}/${name}`. + 1. Update the dependency graph (If the resource is using a Secret, mark the Secret that the resource is using it) + 2. If there are any updates to the Secret in the dependency graph, add the key for the Secret to add-lien-queue - Delete event: - For all the `Secret`s referenced by the resource: - - Update the `Secret`'s `Liens` field to remove `kubernetes.io/secret-protection ${version}/${kind}/${namespace}/${name}`. - - The controller periodically check `Pod`, `PersistentVolume`, and `VolumeSnapshotContent` + 1. Update the dependency graph (If the resource is using a Secret, unmark the Secret that the resource is using it) + 2. If there are no resource using the Secret in the dependency graph, add the key for the Secret to delete-lien-queue + - The controller periodically checks `Pod`, `PersistentVolume`, `VolumeSnapshotContent`, `Deployment`, `StatefulSet`, `DaemonSet`, and `CronJob` - For each resource: - - For all the `Secret`s referenced by the resource: - - If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation, do nothing - - Update the `Secret`'s `Liens` field to add `kubernetes.io/secret-protection ${version}/${kind}/${namespace}/${name}`. - - The controller preiodically check `Secret` + 1. Update the dependency graph (If the resource is using a Secret, mark the Secret that the resource is using it) + 2. If there are any updates to the Secret in the dependency graph, add the key for the Secret to add-lien-queue + - The controller periodically checks `Secret` - For each `Secret`: - - If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation - - Remove all the `Liens` that is prefixed with `kubernetes.io/secret-protection` - - otherwise: - - For all the liens that are prefixed with `kubernetes.io/secret-protection`: - - Check if the resource with `${version}/${kind}/${namespace}/${name}` exists, and if it no longer exists, delete the lien. + - Check if the `Secret` have `kubernetes.io/enable-secret-protection: "yes"` annotation: + - If yes: + 1. Check API server if the resources marked as using the secret in the dependency graph are still using the secret, and update the graph + 2. Check if there is no using resources in the graph any more + - If yes: Add the key for the secret to delete-lien-queue + - Otherwise: Add the key for the secret to add-lien-queue + - Otherwise: Add the key for the secret to delete-lien-queue + - The controller gets a key for a `Secret` from add-lien-queue: + 1. If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation, do nothing + 2. Check API server if the `Secret` is actually used by one of the resources marked in the dependency graph, and if used, add `kubernetes.io/secret-protection` lien to the `Secret` + - The controller gets a key for a `Secret` from delete-lien-queue: + 1. If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation, delete `kubernetes.io/secret-protection` lien from the `Secret` + 2. Check API server if the `Secret` is actually not used by any of the resources, and if not used, delete `kubernetes.io/secret-protection` lien from the `Secret` - ConfigMap protection controller: - - The controller watches `Pod` + - The controller watches `Pod`, `Deployment`, `StatefulSet`, `DaemonSet`, and `CronJob` - Create/Update event: - - For all the `ConfigMap`s referenced by the `Pod`: - - If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation, do nothing - - Update the `ConfigMap`'s `Liens` field to add `kubernetes.io/configmap-protection ${version}/${kind}/${namespace}/${name}`. + - For all the `ConfigMap`s referenced by the resource: + 1. Update the dependency graph (If the resource is using a ConfigMap, mark the ConfigMap that the resource is using it) + 2. If there are any updates to the ConfigMap in the dependency graph, add the key for the ConfigMap to add-lien-queue - Delete event: - - For all the `ConfigMap`s referenced by the `Pod`: - - Update the `ConfigMap`'s `Liens` field to remove `kubernetes.io/configmap-protection ${version}/${kind}/${namespace}/${name}`. - - The controller periodically check `Pod` - - For each `Pod`: - - For all the `ConfigMap`s referenced by the `Pod`: - - If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation, do nothing - - Update the `ConfigMap`'s `Liens` field to add `kubernetes.io/configmap-protection ${version}/${kind}/${namespace}/${name}`. - - The controller preiodically check the `ConfigMap`s + - For all the `ConfigMap`s referenced by the resource: + 1. Update the dependency graph (If the resource is using a ConfigMap, unmark the ConfigMap that the resource is using it) + 2. If there are no resource using the ConfigMap in the dependency graph, add the key for the ConfigMap to delete-lien-queue + - The controller periodically checks `Pod`, `Deployment`, `StatefulSet`, `DaemonSet`, and `CronJob` + - For each resource: + 1. Update the dependency graph (If the resource is using a ConfigMap, mark the ConfigMap that the resource is using it) + 2. If there are any updates to the ConfigMap in the dependency graph, add the key for the ConfigMap to add-lien-queue + - The controller periodically checks `ConfigMap` - For each `ConfigMap`: - - If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation - - Remove all the `Liens` that is prefixed with `kubernetes.io/configmap-protection` - - otherwise: - - For all the liens that are prefixed with `kubernetes.io/configmap-protection`: - - Check if the resource with `${version}/${kind}/${namespace}/${name}` exists, and if it no longer exists, delete the lien. + - Check if the `ConfigMap` have `kubernetes.io/enable-configmap-protection: "yes"` annotation: + - If yes: + 1. Check API server if the resources marked as using the ConfigMap in the dependency graph are still using the ConfigMap, and update the graph + 2. Check if there is no using resources in the graph any more + - If yes: Add the key for the ConfigMap to delete-lien-queue + - Otherwise: Add the key for the ConfigMap to add-lien-queue + - Otherwise: Add the key for the ConfigMap to delete-lien-queue + - The controller gets a key for a `ConfigMap` from add-lien-queue: + 1. If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation, do nothing + 2. Check API server if the `ConfigMap` is actually used by one of the resources marked in the dependency graph, and if used, add `kubernetes.io/configmap-protection` lien to the `ConfigMap` + - The controller gets a key for a `ConfigMap` from delete-lien-queue: + 1. If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation, delete `kubernetes.io/configmap-protection` lien from the `ConfigMap` + 2. Check API server if the `ConfigMap` is actually not used by any of the resources, and if not used, delete `kubernetes.io/configmap-protection` lien from the `ConfigMap` Note that - Periodical checks for referencing resources are needed for handling a case that the annotation is added after referecing resources are created, @@ -334,6 +351,7 @@ So, when running these commands, we need to care that other controllers or users - Verify immediate deletion of a ConfigMap that is used by Pod - For Beta, scalability tests are added to exercise this feature. + - For GA, the introduced e2e tests will be promoted to conformance. ### Graduation Criteria From 7a3d79cb9a3972109fa3de3434b6c4e8c9004870 Mon Sep 17 00:00:00 2001 From: Masaki Kimura Date: Thu, 9 Sep 2021 21:10:59 +0000 Subject: [PATCH 12/12] KEP-2639: Addressed review comments --- .../README.md | 110 +++++++----------- 1 file changed, 42 insertions(+), 68 deletions(-) diff --git a/keps/sig-storage/2639-secret-configmap-protection/README.md b/keps/sig-storage/2639-secret-configmap-protection/README.md index b950cef1a13..f8d90060502 100644 --- a/keps/sig-storage/2639-secret-configmap-protection/README.md +++ b/keps/sig-storage/2639-secret-configmap-protection/README.md @@ -55,15 +55,16 @@ Items marked with (R) are required *prior to targeting to a milestone / release* ## Summary -This KEP proposes a feature to protect Secrets/ConfigMaps while it is in use. Currently, users can delete a secret that is being used by other resources, like Pods, PVs, and VolumeSnapshots. This may have negative impact on the resouces using the secret and it may result in data loss. +This KEP proposes a feature to protect Secrets/ConfigMaps while they are in use. Currently, users can delete a secret that is being used by other resources, like Pods, PVs, and VolumeSnapshots. This may have negative impact on the resources using the secret and it may result in data loss. -For example, one of the worst case scenarios is deletion of a Node Stage Secret for CSI volumes while it is still in-use. -If it happens, a CSI driver for the volume can't unstage the volume due to the lack of the Secret, -as a result, the volume can't be detached from the node and the volume can't even be deleted. +For example, one of the worst case scenarios is deletion of a controller-publish Secret or a provisioner Secret for CSI volumes while it is still in-use. +If it happens, a CSI driver for the volume can't controller-unpublish or delete the volume due to the lack of the Secret, +as a result, the volume remains published on the controller or the volume can't be deleted. This issue will easily happen if a PVC for the volume and a Secret for the volume exist in the same namespace, and a user requests to delete the namespace, which starts deletion of all the resources in the namespace. When the Secret is deleted before the PVC is deleted, this issue happens. +In addition, even if Secrets exist in a separate namespace, an admin or a user of the namespace may still mistakenly delete the Secrets. -Also, ConfigMaps can be deleted while it is in use, which may lead to an unexpected behavior in applications. +Also, ConfigMaps can be deleted while they are in use, which may lead to an unexpected behavior in applications. Similar features for protecting PV and PVC already exist as [pv-protection](https://github.com/kubernetes/enhancements/issues/499) and [pvc-protection](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/postpone-pvc-deletion-if-used-in-a-pod.md). @@ -79,12 +80,16 @@ Secrets can be used by below ways: ](https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes) (can be handled as CSI PV below) - From Deployment: - Specified through [DeploymentSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#deploymentspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From Replicaset: + - Specified through [ReplicaSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#replicasetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) (Excluded from the scope, because [Replicaset should be managed through Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/#when-to-use-a-replicaset)) - From StatefulSet: - Specified through [StatefulSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#statefulsetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) - From DaemonSet: - Specified through [DaemonSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#daemonsetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) - From CronJob: - Specified through [JobTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#cronjob-v1-batch) -> [JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#jobspec-v1-batch) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From Job: + - Specified through [JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#jobspec-v1-batch) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) (Excluded from the scope, because existence of a Job doesn't always mean creation of Pod in the future. Also see [here](https://kubernetes.io/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically)) - From PV: - [CSI](https://kubernetes-csi.github.io/docs/secrets-and-credentials-storage-class.html): - provisioner secret @@ -98,21 +103,16 @@ Secrets can be used by below ways: - snapshotter secret ConfigMaps can be used by below ways: -- From Pod: - - [Mounted as files](https://kubernetes.io/docs/concepts/configuration/configmap/#using-configmaps-as-files-from-a-pod) - - [Exposed as environment variables](https://kubernetes.io/docs/concepts/configuration/configmap/#configmaps-and-pods) -- From Deployment: - - Specified through [DeploymentSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#deploymentspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) -- From StatefulSet: - - Specified through [StatefulSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#statefulsetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) -- From DaemonSet: - - Specified through [DaemonSetSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#daemonsetspec-v1-apps) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) -- From CronJob: - - Specified through [JobTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#cronjob-v1-batch) -> [JobSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#jobspec-v1-batch) -> [PodTemplateSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#podtemplatespec-v1-core) +- From Pod, Deployment, Replicaset, StatefulSet, DaemonSet, CronJob, Job in the same way as Secrets + + Note that: + - "From PV" case doesn't exist for ConfigMap + - The same resources, or Replicaset and Job, are also out of scope for ConfigMap ### Goals - Protect Secrets/ConfigMaps from being deleted while they are in use. +- Provide ways to dynamically enable/disable above feature per Secret/Configmap ### Non-Goals @@ -192,26 +192,27 @@ Basic flows of Secret protection controller and ConfigMap protection controller - Secret protection controller: - The controller watches `Pod`, `PersistentVolume`, `VolumeSnapshotContent`, `Deployment`, `StatefulSet`, `DaemonSet`, and `CronJob` - - Create/Update event: + - Create/Update events: - For all the `Secret`s referenced by the resource: - 1. Update the dependency graph (If the resource is using a Secret, mark the Secret that the resource is using it) - 2. If there are any updates to the Secret in the dependency graph, add the key for the Secret to add-lien-queue + 1. Update the in-memory dependency graph (If the resource is using a Secret, add a dependency to the Secret that the resource is using the Secret. Also, the graph holds previous dependencies from the resource, so that the controller can remove a dependency to the Secret when the controller detects that the resource is no longer using the Secret). + 2. If number of dependency is changed: + - Number becomes 0: Add the key for the Secret to delete-lien-queue + - Number becomes 1+ from 0: Add the key for the Secret to add-lien-queue + - Otherwise: Do nothing - Delete event: - For all the `Secret`s referenced by the resource: - 1. Update the dependency graph (If the resource is using a Secret, unmark the Secret that the resource is using it) - 2. If there are no resource using the Secret in the dependency graph, add the key for the Secret to delete-lien-queue - - The controller periodically checks `Pod`, `PersistentVolume`, `VolumeSnapshotContent`, `Deployment`, `StatefulSet`, `DaemonSet`, and `CronJob` - - For each resource: - 1. Update the dependency graph (If the resource is using a Secret, mark the Secret that the resource is using it) - 2. If there are any updates to the Secret in the dependency graph, add the key for the Secret to add-lien-queue - - The controller periodically checks `Secret` - - For each `Secret`: + 1. Update the in-memory dependency graph (If the resource is using a Secret, delete a dependency to the Secret that the resource is using it) + 2. If number of dependency is changed: + - Number becomes 0: Add the key for the Secret to delete-lien-queue + - Otherwise: Do nothing + - The controller watches `Secret` + - Create/Update events: - Check if the `Secret` have `kubernetes.io/enable-secret-protection: "yes"` annotation: - If yes: - 1. Check API server if the resources marked as using the secret in the dependency graph are still using the secret, and update the graph + 1. Check API server if the resources marked as using the secret in the in-memory dependency graph are still using the secret, and update the graph 2. Check if there is no using resources in the graph any more - - If yes: Add the key for the secret to delete-lien-queue - - Otherwise: Add the key for the secret to add-lien-queue + - If yes: Add the key for the secret to delete-lien-queue + - Otherwise: Add the key for the secret to add-lien-queue - Otherwise: Add the key for the secret to delete-lien-queue - The controller gets a key for a `Secret` from add-lien-queue: 1. If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation, do nothing @@ -220,40 +221,11 @@ Basic flows of Secret protection controller and ConfigMap protection controller 1. If the `Secret` doesn't have `kubernetes.io/enable-secret-protection: "yes"` annotation, delete `kubernetes.io/secret-protection` lien from the `Secret` 2. Check API server if the `Secret` is actually not used by any of the resources, and if not used, delete `kubernetes.io/secret-protection` lien from the `Secret` -- ConfigMap protection controller: - - The controller watches `Pod`, `Deployment`, `StatefulSet`, `DaemonSet`, and `CronJob` - - Create/Update event: - - For all the `ConfigMap`s referenced by the resource: - 1. Update the dependency graph (If the resource is using a ConfigMap, mark the ConfigMap that the resource is using it) - 2. If there are any updates to the ConfigMap in the dependency graph, add the key for the ConfigMap to add-lien-queue - - Delete event: - - For all the `ConfigMap`s referenced by the resource: - 1. Update the dependency graph (If the resource is using a ConfigMap, unmark the ConfigMap that the resource is using it) - 2. If there are no resource using the ConfigMap in the dependency graph, add the key for the ConfigMap to delete-lien-queue - - The controller periodically checks `Pod`, `Deployment`, `StatefulSet`, `DaemonSet`, and `CronJob` - - For each resource: - 1. Update the dependency graph (If the resource is using a ConfigMap, mark the ConfigMap that the resource is using it) - 2. If there are any updates to the ConfigMap in the dependency graph, add the key for the ConfigMap to add-lien-queue - - The controller periodically checks `ConfigMap` - - For each `ConfigMap`: - - Check if the `ConfigMap` have `kubernetes.io/enable-configmap-protection: "yes"` annotation: - - If yes: - 1. Check API server if the resources marked as using the ConfigMap in the dependency graph are still using the ConfigMap, and update the graph - 2. Check if there is no using resources in the graph any more - - If yes: Add the key for the ConfigMap to delete-lien-queue - - Otherwise: Add the key for the ConfigMap to add-lien-queue - - Otherwise: Add the key for the ConfigMap to delete-lien-queue - - The controller gets a key for a `ConfigMap` from add-lien-queue: - 1. If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation, do nothing - 2. Check API server if the `ConfigMap` is actually used by one of the resources marked in the dependency graph, and if used, add `kubernetes.io/configmap-protection` lien to the `ConfigMap` - - The controller gets a key for a `ConfigMap` from delete-lien-queue: - 1. If the `ConfigMap` doesn't have `kubernetes.io/enable-configmap-protection: "yes"` annotation, delete `kubernetes.io/configmap-protection` lien from the `ConfigMap` - 2. Check API server if the `ConfigMap` is actually not used by any of the resources, and if not used, delete `kubernetes.io/configmap-protection` lien from the `ConfigMap` - -Note that - - Periodical checks for referencing resources are needed for handling a case that the annotation is added after referecing resources are created, - - Periodical checks for `Secret` or `Configmap` are needed for handling a case that update for a referencing resource is done in a way that remove the reference to `Secret` or `Configmap`, - - What deletion event handling is doing is covered by the periodical checks for `Secret` or `Configmap`, but added intentionally to handle the event in a timely manner. +- ConfigMap protection controller can be implemented in almost the same way, except: + - It handles ConfigMap instead of Secret + - PVs aren't watched + - `kubernetes.io/enable-configmap-protection: "yes"` annotation is checked + - `kubernetes.io/configmap-protection` lien is added Prototype implementation can be found [here](https://github.com/mkimuram/secret-protection/commits/lien). @@ -357,6 +329,7 @@ So, when running these commands, we need to care that other controllers or users ### Graduation Criteria #### Alpha -> Beta Graduation +- Review the dependencies covered by the controllers are enough - Gather feedback from developers and surveys of users - Tests are in Testgrid and linked in KEP @@ -502,10 +475,10 @@ There will be Secrets/ConfigMaps which have liens prefixed with `kubernetes.io/s ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? - [x] Metrics - - Metric name: secret_protection_controller + - Metric name: secret_protection - Aggregation method: prometheus - Components exposing the metric: secret-protection-controller - - Metric name: configmap_protection_controller + - Metric name: configmap_protection - Aggregation method: prometheus - Components exposing the metric: configmap-protection-controller @@ -554,10 +527,11 @@ and creating new ones, as well as about cluster-level services (e.g. DNS): ### Scalability ###### Will enabling / using this feature result in any new API calls? -- API call type: Update Secret/ConfigMap, List Pod/PV/VolumeSnapshot, and Get PVC/SC +- API call type: Update Secret/ConfigMap, Watch/List Secret/ConfigMap/Pod/PV/VolumeSnapshot/Deployment/StatefulSet/DaemonSet/CronJob, and Get PVC/SC + (List is called only before deleting lien to make sure that there are no resource using Secret/ConfigMap. For other cases, watch or list from cache is used). - estimated throughput: TBD - originating component: secret-protection-controller and configmap-protection-controller -- API calls are triggered by changes of Secret/ConfigMap, Pod, PV, VolumeSnapshot +- API calls are triggered by changes of Secret, ConfigMap, Pod, PV, VolumeSnapshot, Deployment, StatefulSet, DaemonSet, and CronJob ###### Will enabling / using this feature result in introducing new API types?