
doc: add proposal for fetching ceph credentials from KMS #3394

Closed
wants to merge 6 commits

Conversation

saiprashanth173
Contributor

Add proposal to support fetching Ceph credentials from KMS

Signed-off-by: Prashanth Dintyala vdintyala@nvidia.com

Describe what this PR does

Ceph-CSI currently relies on K8s secrets for Ceph user (provisioner, node-stage, expand-secret) credentials. Given that Ceph-CSI already supports managing volume encryption keys through Vault and other KMS backends, it should be possible to extend the existing implementation to fetch Ceph user credentials as well.

This PR contains a high-level design proposal for this feature.

Related slack discussion with @Madhu-1 @nixpanic: https://cephcsi.slack.com/archives/CJV6G2DFT/p1662485493014329



Add proposal to support fetching Ceph credentials from KMS

Signed-off-by: Prashanth Dintyala <vdintyala@nvidia.com>
@mergify mergify bot added ci/skip/e2e skip running e2e CI jobs ci/skip/multi-arch-build skip building on multiple architectures component/docs Issues and PRs related to documentation labels Sep 14, 2022
@humblec
Collaborator

humblec commented Sep 15, 2022

@saiprashanth173 thanks for the proposal!

Looking into this, I am really confused and here is why.

Ceph-CSI relies on K8s CSI components for fetching Ceph user credentials.
K8s CSI uses the following reserved StorageClass parameters for fetching
credentials and passes them to Ceph-CSI.
csi.storage.k8s.io/controller-expand-secret-name
csi.storage.k8s.io/controller-expand-secret-namespace
csi.storage.k8s.io/provisioner-secret-name
csi.storage.k8s.io/provisioner-secret-namespace
csi.storage.k8s.io/node-stage-secret-name
csi.storage.k8s.io/node-stage-secret-namespace

These secrets and their management have nothing to do with the CSI driver itself. This is a mechanism coordinated by the CO (e.g. Kube) based on the CSI spec. We (as in the Ceph CSI drivers) just consume them in the path! It is not a "go and fetch the secrets" operation from the CSI driver's point of view. They are provided as part of the RPC call to the driver, like any other SC -> CSI driver combination.
These entries in the SC are common to all CSI drivers and nothing specific to the Ceph CSI driver, thus the Ceph CSI driver has no role here.

StorageClass parameters cannot be used for credentials as
they are not passed in all CSI requests, e.g. DeleteVolumeRequest.

This is well intended; it's not supposed to be used that way, and it works.

Regardless, keeping the Ceph user information in a KMS/Kube object is something we have to get the discussions on, and we don't have to clobber it with the above secrets, etc. In that light, adding more details on "use case", "advantages compared to the existing model", "security aspects" (if any), "backward compatibility", "integration with Rook as it comes in the workflow", etc. would add value to the proposal.
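For reference, "provided as part of the RPC call" means the CO resolves the StorageClass-referenced secrets and hands them to the driver inside each request. A minimal, hypothetical sketch (not Ceph-CSI's actual handler; the userID/userKey key names follow Ceph-CSI's secret convention) of a driver reading them:

```go
package driver

import (
	"context"
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// controllerServer is a stand-in type for this sketch; Ceph-CSI's real
// controller server is more involved.
type controllerServer struct{}

// CreateVolume illustrates how the CO-resolved secrets (from the
// csi.storage.k8s.io/provisioner-secret-* StorageClass parameters) arrive
// inside the request itself; the driver only reads them.
func (cs *controllerServer) CreateVolume(
	ctx context.Context,
	req *csi.CreateVolumeRequest,
) (*csi.CreateVolumeResponse, error) {
	secrets := req.GetSecrets() // e.g. {"userID": "...", "userKey": "..."}
	if secrets["userKey"] == "" {
		return nil, fmt.Errorf("userKey missing from provisioner secret")
	}
	// ... provision the volume using the supplied credentials ...
	return &csi.CreateVolumeResponse{}, nil
}
```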

Signed-off-by: Prashanth Dintyala <vdintyala@nvidia.com>
Signed-off-by: Prashanth Dintyala <vdintyala@nvidia.com>
@saiprashanth173
Contributor Author

@humblec thanks for your feedback.

Sorry it took me some time to get back, I was out of the office last week.

These secrets and their management have nothing to do with the CSI driver itself. This is a mechanism coordinated by the CO (e.g. Kube) based on the CSI spec. We (as in the Ceph CSI drivers) just consume them in the path! It is not a "go and fetch the secrets" operation from the CSI driver's point of view. They are provided as part of the RPC call to the driver, like any other SC -> CSI driver combination.
These entries in the SC are common to all CSI drivers and nothing specific to the Ceph CSI driver, thus the Ceph CSI driver has no role here.

I initially added the background section to give some context on how the current flow works. My intention was not to comment on the mechanism; it was just to summarize it. I have removed the background section for now, as it appears to add confusion and isn't serving its intended purpose.

This is well intended; it's not supposed to be used that way, and it works.

Just to clarify, I was trying to set context for why kmsID, in the case of credentials, should be part of the expand-secret, provisioner-secret or node-stage-secret instead of being specified as a StorageClass parameter, like we do in the case of volume encryption. If you feel this isn't very clear, I can restructure this section a bit. Please let me know.

Regardless, keeping the Ceph user information in a KMS/Kube object is something we have to get the discussions on, and we don't have to clobber it with the above secrets, etc. In that light, adding more details on "use case", "advantages compared to the existing model", "security aspects" (if any), "backward compatibility", "integration with Rook as it comes in the workflow", etc. would add value to the proposal.

I tried to add some more details to the proposal. I will try to refine the proposal further. Please let me know if you have any other suggestions on the latest version. Happy to make further changes based on your feedback :)

Signed-off-by: Prashanth Dintyala <vdintyala@nvidia.com>
@saiprashanth173
Contributor Author

@Madhu-1 @humblec @nixpanic do you have any suggestions/comments on the proposal? Please let me know.

@Madhu-1
Collaborator

Madhu-1 commented Oct 12, 2022

@saiprashanth173 is this still a WIP design doc?

@Madhu-1 Madhu-1 left a comment (Collaborator)

Overall looks good, but as mentioned this is kind of an overhead and has a performance impact; if someone chooses security over performance, it is good for them.

IMHO it again won't solve this issue: if the Kubernetes secrets are compromised, anyone can gain access to the KMS (as all the KMS details are stored in Kubernetes ConfigMaps and Secrets) and get access to the Ceph cluster.

fetching DEKs (Data Encryption Keys) used for RBD volume encryption.
The focus of this proposal is to leverage existing KMS implementation
to support obtaining Ceph user keyring used while
creating, deleting, mapping and resizing RBD volumes.
Collaborator

We don't need to stick to RBD volumes here, as this can be used with CephFS also?


- For scenarios where Ceph is deployed externally, either as
a standalone or through Rook, one needs to obtain the Ceph keyring(s)
and manually create the K8s secrets needed by Ceph-CSI. With this feature,
Collaborator

This is true only if CephCSI is deployed externally; when Rook deploys the Ceph cluster, it creates the required secrets for CephCSI.

Contributor Author

I agree that this feature fits most directly into the external setup scenario. However, I would see this as a valid case even when Rook deploys the Ceph cluster. I see that Rook creates the following secrets by default, and the user caps are quite broad:

rook-csi-cephfs-node
rook-csi-cephfs-provisioner
rook-csi-rbd-node
rook-csi-rbd-provisioner

However, if we were to support a more advanced use case such as soft multi-tenancy, it might make sense to have a Ceph user per tenant and grant permissions to access only specific pool(s).

Also, if Rook plans to add support for rook/rook#6374, this would be a valid case even in the normal scenario.

Collaborator

However, if we were to support a more advanced use case such as soft multi-tenancy, it might make sense to have a Ceph user per tenant and grant permissions to access only specific pool(s).

Admins can still create new users with different caps using the CephClient CRD and use those with custom StorageClasses.

Contributor Author

This is true only if CephCSI is deployed externally; when Rook deploys the Ceph cluster, it creates the required secrets for CephCSI.

Sorry, I might have misunderstood your comment.

In this point, I was only referring to CephCSI being deployed externally:

For scenarios where Ceph is deployed externally, either as
a standalone or through Rook,

By "Ceph is deployed externally", I mean the Ceph cluster is not in the same K8s cluster as Ceph-CSI. By "either as a standalone or through Rook", I mean the external Ceph cluster may have been deployed using Rook or some other tool.

If you feel the sentence isn't clear enough, I can change that a bit. Please let me know.

Admins can still create new users with different caps using the CephClient CRD and use those with custom StorageClasses.

Thanks for letting me know. You are right, for Rook-based Ceph and CephCSI deployments this approach might not be applicable, unless Rook supports storing keyrings directly in a KMS.


- Adds additional overhead as on each CSI RPC call
a request is made to KMS for fetching credentials.
- Possible risk of hitting the KMS rate limit
Collaborator

Won't this be a major drawback?

Contributor Author

I feel the KMS rate limit is a valid concern for anything that uses the KMS implementation, including RBD volume encryption. One possible approach to mitigate this would be to use an in-memory cache of Ceph keyrings: given that keyrings don't change too often, we would only fetch them when the keyring doesn't already exist in the cache or is invalid (i.e. on failed Ceph auth).
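As a rough illustration of that mitigation, a keyring cache could look something like the following sketch (hypothetical names, not part of Ceph-CSI; invalidation on auth failure included):

```go
package kmscreds

import (
	"sync"
	"time"
)

// cachedKeyring is a hypothetical cache entry; names are illustrative only.
type cachedKeyring struct {
	userKey   string
	fetchedAt time.Time
}

// KeyringCache caches Ceph user keys fetched from a KMS so that not every
// CSI RPC triggers a KMS request.
type KeyringCache struct {
	mu      sync.RWMutex
	entries map[string]cachedKeyring // key: kmsID + "/" + userID
	ttl     time.Duration
}

func NewKeyringCache(ttl time.Duration) *KeyringCache {
	return &KeyringCache{entries: make(map[string]cachedKeyring), ttl: ttl}
}

// Get returns a cached key, if present and not expired.
func (c *KeyringCache) Get(kmsID, userID string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[kmsID+"/"+userID]
	if !ok || time.Since(e.fetchedAt) > c.ttl {
		return "", false
	}
	return e.userKey, true
}

// Put stores a freshly fetched key.
func (c *KeyringCache) Put(kmsID, userID, userKey string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[kmsID+"/"+userID] = cachedKeyring{userKey: userKey, fetchedAt: time.Now()}
}

// Invalidate drops an entry, e.g. after a failed "ceph auth" with the cached key.
func (c *KeyringCache) Invalidate(kmsID, userID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, kmsID+"/"+userID)
}
```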

Collaborator

👍 the in-memory cache needs to be deleted in a few cases, like:

  • auth issues
  • connection issues

Contributor Author

I added a point on the in-memory cache under the Code Changes section.

Do you mind elaborating a bit on what you mean by connection issues in this context?

kmsID: <kms-id>
```

- A new ConfigMap with default name `ceph-csi-creds-kms-config` similar to
Collaborator

Why do we need an extra KMS ConfigMap? Can't we use the existing KMS configuration?

Contributor Author

@saiprashanth173 saiprashanth173 Oct 13, 2022

Technically, we don't need a new config here. Given that the structure of the encryption and creds KMS config entries differs slightly, especially due to the newly added tenantNamespace key and the missing tenant sub-config in the creds case, I thought it might make sense to keep them separate.

I can change this section to use the same configuration if that makes sense. Please let me know what you think.

Collaborator

Yes please

Contributor Author

I updated this section to reuse the existing KMS configuration.

name: ceph-csi-creds-kms-config
```

- For KMS that rely on tenants' namespace to obtain required
Collaborator

This is not something specific to this feature, right?

Contributor Author

Sorry, I don't follow the question completely.

The need to add "tenantNamespace" to the config is specific to this feature, as we don't have the tenant namespace available in all RPC calls. RBD volume encryption uses the tenant's namespace for all its operations.

Collaborator

Add the new field to the existing struct and reuse it; let's avoid too many configurations.

Contributor Author

Makes sense. I updated the section.
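For illustration, reusing the existing configuration could look roughly like the following; the struct and field names here are hypothetical, not Ceph-CSI's actual types:

```go
package kms

// ConfigEntry is a hypothetical stand-in for one entry of the existing KMS
// ConfigMap used for volume encryption; only the relevant parts are shown.
type ConfigEntry struct {
	// Existing provider selection and provider-specific settings.
	KMSProvider string `json:"encryptionKMSType"`

	// TenantNamespace is the proposed new field: since the tenant namespace
	// is not available in every CSI RPC when fetching credentials, the
	// namespace used to resolve tokens/secrets for KMS authentication is
	// taken from here, falling back to the Ceph-CSI namespace when unset.
	TenantNamespace string `json:"tenantNamespace,omitempty"`
}
```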

both `kmsID` and `userKey`
the keyring provided secrets will be used for creating the credential object.

### Integration with Rook
Collaborator

@travisn can you please check this one?

@Rakshith-R
Contributor

Overall looks good, but as mentioned this is kind of an overhead and has a performance impact; if someone chooses security over performance, it is good for them.

IMHO it again won't solve this issue: if the Kubernetes secrets are compromised, anyone can gain access to the KMS (as all the KMS details are stored in Kubernetes ConfigMaps and Secrets) and get access to the Ceph cluster.

+1

@Madhu-1
Collaborator

Madhu-1 commented Oct 13, 2022

Overall looks good, but as mentioned this is kind of an overhead and has a performance impact; if someone chooses security over performance, it is good for them.
IMHO it again won't solve this issue: if the Kubernetes secrets are compromised, anyone can gain access to the KMS (as all the KMS details are stored in Kubernetes ConfigMaps and Secrets) and get access to the Ceph cluster.

+1

@saiprashanth173 can you please comment on this one? It would help to take this forward.

@nixpanic
Member

Overall looks good, but as mentioned this is kind of an overhead and has a performance impact; if someone chooses security over performance, it is good for them.
IMHO it again won't solve this issue: if the Kubernetes secrets are compromised, anyone can gain access to the KMS (as all the KMS details are stored in Kubernetes ConfigMaps and Secrets) and get access to the Ceph cluster.

+1

@saiprashanth173 can you please comment on this one? It would help to take this forward.

This is also what I mostly wonder about. The advantage of placing the Ceph credentials in a KMS is not clear to me. Kubernetes still stores credentials to access the KMS, so getting to the Ceph credentials is still possible without much additional effort?

@saiprashanth173
Contributor Author

IMHO it again won't solve this issue: if the Kubernetes secrets are compromised, anyone can gain access to the KMS (as all the KMS details are stored in Kubernetes ConfigMaps and Secrets) and get access to the Ceph cluster.

Sorry, I missed commenting on this.

I mostly agree that this solution wouldn't solve the issue completely for the KMS integrations that require storing auth tokens in K8s secrets. If we take the Vault service-account-based approach, it would make it more difficult (if not impossible) to access credentials compared to K8s secrets, for the following reasons:

  1. In K8s 1.24+, service account token secrets are no longer auto-generated, and starting with Ceph-CSI 3.7 we always attempt to create a service account token and use it with Vault. These tokens are short-lived, and creating them requires additional privileges, e.g. RBAC on serviceaccounts/token create. Related: rbd: create token and use it for vault SA #3174, rbd: create token and use it for vault SA everytime possible #3377
  2. Even in cases where a service account token is compromised, the impact will be restricted to the Vault path that the Vault role linked to the service account has access to.

I believe storing secrets in a KMS can provide an additional level of security. Even with compromised access tokens, most KMS offerings support additional hardening measures that impose restrictions on who can access the KMS endpoints, making it slightly more difficult to access sensitive secrets.

Additionally, I see this feature solving another issue, specifically for external Ceph setups, where the process of sharing and rotating keyrings can be made more seamless and secure.

Please let me know what you think. Happy to discuss this further.

Signed-off-by: Prashanth Dintyala <vdintyala@nvidia.com>
Signed-off-by: Prashanth Dintyala <vdintyala@nvidia.com>
@saiprashanth173 saiprashanth173 changed the title [WIP] doc: add proposal for fetching ceph credentials from KMS doc: add proposal for fetching ceph credentials from KMS Oct 14, 2022
@nixpanic nixpanic left a comment (Member)

I agree that this proposal could make it easier to deny access (or rotate tokens) for certain users/accounts if credentials are required to access a KMS and fetch the Ceph credentials from there. Careful auditing of the access logs in the KMS might avoid having to rotate the Ceph credentials after someone obtained the short-lived KMS credentials.

It would also become possible (I'm not sure whether this is a smart or recommended practice) to share Ceph credentials from a single location (the KMS) across multiple Kubernetes clusters.

With that, I'm supportive of the feature and some users might be interested in it.

Having this information as part of *StorageClass* parameters,
like using the `encryptionKMSID` key for volume encryption,
wouldn't work as these parameters are not passed
to all CSI RPCs, e.g. `DeleteVolume`, `ControllerExpand` etc.
Member

The parameters in the StorageClass are copied into the PersistentVolume.CSIDriver parameters. I think it would be cleaner to not use the common secrets in the StorageClass for this, but directly refer to a kmsID with potentially additional parameters like userID.

Contributor Author

@saiprashanth173 saiprashanth173 Oct 19, 2022

It is not very clear to me how we can obtain the PersistentVolume.CSIDriver parameters from calls like DeleteVolume.

Looking at the DeleteVolume method and DeleteVolumeRequest struct:

```go
func (cs *ControllerServer) DeleteVolume(
	ctx context.Context,
	req *csi.DeleteVolumeRequest,
) (*csi.DeleteVolumeResponse, error)

// ....

type DeleteVolumeRequest struct {
	// The ID of the volume to be deprovisioned.
	// This field is REQUIRED.
	VolumeId string `protobuf:"bytes,1,opt,name=volume_id,json=volumeId,proto3" json:"volume_id,omitempty"`
	// Secrets required by plugin to complete volume deletion request.
	// This field is OPTIONAL. Refer to the `Secrets Requirements`
	// section on how to use this field.
	Secrets              map[string]string `protobuf:"bytes,2,rep,name=secrets,proto3" json:"secrets,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"bytes,2,opt,name=value,proto3"`
	XXX_NoUnkeyedLiteral struct{}          `json:"-"`
	XXX_unrecognized     []byte            `json:"-"`
	XXX_sizecache        int32             `json:"-"`
}
```

I only see that we can get VolumeId as part of this request. Are you suggesting we get the PersistentVolume object here to obtain these parameters? If yes, wouldn't that require an additional API call?

Do you mind helping me understand this better?
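To make the concern concrete, looking the parameters up from the PV object inside DeleteVolume would amount to something like the following client-go sketch (a hypothetical helper, not existing Ceph-CSI code), i.e. extra API traffic per request:

```go
package driver

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pvParamsForVolumeHandle is a hypothetical helper: it lists PersistentVolumes
// and matches the CSI volume handle against the DeleteVolumeRequest VolumeId.
// Note the extra API call (and the cost of listing all PVs) this implies.
func pvParamsForVolumeHandle(
	ctx context.Context,
	client kubernetes.Interface,
	volumeHandle string,
) (map[string]string, error) {
	pvs, err := client.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for _, pv := range pvs.Items {
		if pv.Spec.CSI != nil && pv.Spec.CSI.VolumeHandle == volumeHandle {
			return pv.Spec.CSI.VolumeAttributes, nil
		}
	}
	return nil, fmt.Errorf("no PersistentVolume found for volume handle %q", volumeHandle)
}
```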

As the tenant namespace isn't available in all CSI requests,
the value provided in this field will be used instead.
If the "tenantNamespace" field is absent, the Ceph-CSI namespace
will be used as the default.
Member

I think this is not very practical in the end. Why have a default user for some operations, and use the tenant for others? Using different Ceph credentials for different operations on a single volume looks rather confusing. Usually you would also try to prevent normal users from accessing the Ceph cluster, so those users should not need to have access to Ceph credentials.

Contributor Author

@nixpanic to clarify my point:

  1. The tenantNamespace here is only used for getting everything required by the KMS to fetch the Ceph user credentials. One such example is obtaining the KMS access token, which could mean creating a service account token for the Vault SA or fetching tokens from a K8s secret for some other KMS.
  2. Any volume associated with a StorageClass would always use the same Ceph credentials for all its operations. I see Ceph credentials as a pair of userID and userKey, where userID is obtained from csi.storage.k8s.io/*-secret-name and userKey is fetched from the KMS. The user never changes unless we change the userID and the KMS config (a rough sketch of this flow follows this list).
  3. The KMS will only use one namespace for all operations. If tenantNamespace is present in the KMS config, it will always use that; if not, it falls back to the Ceph-CSI namespace as the default. Again, this is only for KMS authentication so that we can fetch the required Ceph credentials.
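A minimal sketch of the resolution flow described in points 2 and 3 (hypothetical helper and type names, not the proposal's actual code):

```go
package kmscreds

import (
	"context"
	"fmt"
	"os"
)

// KMS is a stand-in for Ceph-CSI's KMS abstraction; only the piece needed
// for this sketch is shown.
type KMS interface {
	FetchUserKey(ctx context.Context, userID string) (string, error)
}

// Credentials pairs the Ceph user ID (from the *-secret referenced in the
// StorageClass) with the key fetched from the KMS.
type Credentials struct {
	UserID  string
	UserKey string
}

// resolveNamespace picks the namespace used for KMS authentication:
// the tenantNamespace from the KMS config when set, otherwise the
// namespace Ceph-CSI itself runs in (illustrated here via an env var).
func resolveNamespace(tenantNamespace string) string {
	if tenantNamespace != "" {
		return tenantNamespace
	}
	return os.Getenv("POD_NAMESPACE")
}

// credentialsFromSecretAndKMS builds the credential object: userID (and kmsID)
// come from the CSI secret, userKey comes from the KMS. If the secret already
// carries userKey, the KMS is not consulted at all.
func credentialsFromSecretAndKMS(ctx context.Context, secret map[string]string, kms KMS) (*Credentials, error) {
	userID, ok := secret["userID"]
	if !ok {
		return nil, fmt.Errorf("userID missing from secret")
	}
	if key, ok := secret["userKey"]; ok {
		return &Credentials{UserID: userID, UserKey: key}, nil
	}
	key, err := kms.FetchUserKey(ctx, userID)
	if err != nil {
		return nil, err
	}
	return &Credentials{UserID: userID, UserKey: key}, nil
}
```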

@nixpanic nixpanic requested a review from a team October 19, 2022 15:35
@Madhu-1 Madhu-1 requested a review from nixpanic November 8, 2022 10:33
@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 16, 2022
@github-actions

This pull request has been automatically closed due to inactivity. Please re-open if these changes are still required.

@github-actions github-actions bot closed this Dec 31, 2022