Removed config maps and replaced with rados omaps #312
Conversation
Testing done includes:
Tech debt left for later (i.e., optimizations and cleanup that do not hamper functionality and are not going to be part of this PR):
@gman0 @Madhu-1 @dillaman PTAL, added one more commit to change the documentation and sample manifests. The pending changes now are the Helm charts (and catching up to master). There is an urgency to try and get this in, so that we can move toward a 1.0 release and also adopt the same into the Rook project's 1.0 release (in 2 weeks as of this writing). Request attention with this consideration in mind. Thanks!
@ShyamsundarR Ack -- I'll start reviewing this tonight.
Initial quick pass comments since the more I dig, the more I am confused about the secrets...
Secrets/keys/IDs discussion: To begin with, some of the questions you raise are concerns for me as well, and are (partly?) covered in this issue. I am summarizing the various comments around the topic here to provide the current answer, and possibly to discuss a (near) future one. I will refer to the 2 secrets (they are both Kubernetes secrets, yes) as the secret and the config, where the secret is the one mentioned in the StorageClass [1] and the config is the Ceph cluster config (ceph-cluster-) provided to the CSI plugin; this keeps the further discussion less ambiguous. I am also using the term credential below to refer to an ID/key pair (as in a Ceph user ID and key).
No. The secret names and namespaces that are passed down from the StorageClass need to exist in a namespace that is read by the node-level services that invoke the CSI plugin RPCs with the data from these secrets; hence, user namespaces need not have access to these secrets. Also, the config secret needs to be readable by the CSI plugin on the nodes, which hence needs appropriate access to the secret in whichever namespace it exists. In both cases, the user namespace requesting the volume provisioning or mounting service does not need access to the secrets, nor are they present in that namespace.
If we want to support fine-grained credentials per pool, i.e., credentials that have access to do CRUD operations on one pool and not across the pools in use by the CSI plugin, then we would want to use the credentials from the secrets in the StorageClass; else we can use the credentials in the config itself. For list operations, we will not get credentials from the secrets and have to rely solely on the credentials from the config. If the config credentials can be crafted to have list permissions (IOW, metadata read) then we can use them for listing; else listing has to remain unsupported.
Again, if we can segregate permissions for these credentials, with the user credential having only mount permissions (post-mount, reads and writes to the image should be possible) and the admin credential having CRUD permissions, it makes sense to have 2; else we should just fold these into a single credential (part of #270).
The order of using credentials in the various RPCs is as follows:
I think we need to continue this in #270, look at the need for 2 credentials (user/admin), and then see if we want per-pool credential segregation; if so, retain the current scheme, else stop passing the credentials in as secrets. [1] https://kubernetes-csi.github.io/docs/print.html#createdelete-volume-secret
My concern is that they both provide credentials and that is confusing. I don't think there should be two different places to store the keys and up to three places to store the user ID. The fact that the code sometimes pulls the keys from the k8sconfig store via the
Ack -- cool, thx.
Sorry, I wasn't really asking about the difference between "admin" and "user" -- just the fact that the "admin"/"user" secrets are stored twice (and the admin/user IDs are also specified multiple times).
... and then fall back to the ID overrides from the StorageClass if not specified in the config secret?
Thanks for the pointer to that issue. I agree the difference between admin/user credentials is a separate issue. I am just worried about how confusing it is w/ all the different places to store a given set of credentials, and how some CSI operations would pull credentials from the secret vs others that would pull them from the config.
Discussed this further with @dillaman and here is the gist of the same:
@ShyamsundarR I was digging through the CSI spec and noticed the [...]. The [...]
I was assuming the image expansion is auto-detected on the rbd clients, and the FS may need growfs-like operations. Thus, we really do not need credentials on NodeExpandVolume. Am I missing something here?
Re: (1) I think I was just looking at the NodeExpandVolumeRequest and didn't see the ControllerExpandVolumeRequest. Disregard. Re: (2) I know there is a PR to get "metrics", but it looks like it's only being added to the CephFS driver for NodeGetVolumeStats. If there isn't any need to know the provisioned vs actual byte counts for RBD block PVs, I guess it's not needed. Re: (3) If it's not used by k8s, no worries. I am just trying to think ahead for the worst case.
For block, I would assume we get metrics from the filesystem (or consumer) on the block device. Does it make sense to return used/free bytes from a block device?
This is useful, so thanks. I had done this exercise earlier (i.e., looking at secrets and which RPCs, etc.), but it helps with another review.
I do really like the idea of breaking the somewhat strange loop in the driver that calls back to k8s. Big thank you for working on a fix. :) I didn't see it called out up above, but one critical piece will be a migration path for those of us already using it. Is this planned?
- stores keys named using the CO generated names for volume requests
- keys are named "csi.volume."+[CO generated VolName]
- Key value contains the RBD image uuid that is created or will be created, for the CO provided name
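To make the scheme above concrete, here is a minimal illustrative sketch of maintaining such a mapping through the stock `rados` CLI from Go. The pool name (`rbd`), the directory object name (`csi.volumes.default`), and the sample values are assumptions for illustration only, not the plugin's exact identifiers; the snapshot omap described further below works the same way with `csi.snap.`-prefixed keys.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runRados shells out to the stock `rados` CLI, analogous to how the driver
// execs Ceph tooling at this point in its history.
func runRados(args ...string) ([]byte, error) {
	out, err := exec.Command("rados", args...).CombinedOutput()
	if err != nil {
		return out, fmt.Errorf("rados %v: %v (%s)", args, err, out)
	}
	return out, nil
}

func main() {
	volName := "pvc-0000-1111" // CO-generated volume name (assumed)
	imageUUID := "2222-3333"   // uuid of the backing RBD image (assumed)

	// Record the CO name -> image uuid mapping as an omap k/v pair on the
	// directory object.
	if _, err := runRados("-p", "rbd", "setomapval", "csi.volumes.default",
		"csi.volume."+volName, imageUUID); err != nil {
		panic(err)
	}

	// A retried CreateVolume with the same CO name finds the reservation,
	// which is what makes the RPC idempotent without a config map.
	val, err := runRados("-p", "rbd", "getomapval", "csi.volumes.default",
		"csi.volume."+volName)
	if err != nil {
		panic(err)
	}
	fmt.Printf("existing reservation: %s\n", val)
}
```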
Whenever I see applications making raw use of omaps I get nervous. The reason for this is mostly around the de facto limit of ~100,000 k/v pairs per object. Are we confident that a cluster won't have more than 100k volumes?
Fair enough :) The initial numbers I discussed with @dillaman were 100K, to be exact.

Anyway, this information is new to me, and we added the omap to support listing (i.e., to bunch up the csi volumes under one omap to help with listing). My initial solution to the problem (as we can never answer what scale this may reach) would be to remove the csi.volumes and csi.snaps omaps, but retain the volname- and volid-based omaps and keys, as we are no longer intending to support the listing RPCs. Thoughts?

(The other knee-jerk reaction was: let's create chapters of csi.volumes and use those, but that depends on whether we do not have to maintain counters around the number of keys in an omap, and can instead work with errors returned from an omap setomapval that help understand it is "full".)
@ShyamsundarR with this approach, how many PVCs and snapshots can we create?
Currently we can scale up to 100K, which is not essentially a hard ceiling; IOW, updates to the objects may slow down, but will not fail at that exact number.

This was discussed, and a possible path for the future is noted in this comment. IOW, we decided to continue with the current scale and, if required in the future, shard the objects, which is feasible under the current scheme.
- stores keys named using the CO generated names for snapshot requests
- keys are named "csi.snap."+[CO generated SnapName]
- Key value contains the RBD snapshot uuid that is created or will be created, for the CO provided name
Same concern: a k/v pair per snapshot won't scale, I think.
examples/README.md (Outdated)
* Key is typically the output of `ceph auth get-key client.admin`, where `admin` is the client Admin ID
* This is used to substitute `[admin|user]id` and `[admin|user]key` values in the template file
I'm not seeing a particular reason why ceph-csi needs the admin key. Is it so that it can do `ceph osd lspools`?
The reason was a simple carryover of what existed prior to this PR. The exact credentials and their permissions are being worked out in issue #270, where we also need to think about what rights are required for the various operations being performed (like `ceph osd lspools`).
@dillaman Addressed the dual-credential issue with the latest commit, PTAL. The issue of still having to specify 2 credentials (user and admin), and also the ID separate from the key, will be dealt with as part of fixing #270. @gman0 @Madhu-1 Configuration requirements have changed from requiring a series of values in a secret to a config map of cluster IDs containing the monitors.
Yes, migration is not planned as yet. This has cropped up from the 0.3 version to this instance as well, in #296. Should we discuss it in that issue to arrive at a more generic solution? Also, at present I do not have any migration solutions to offer.
It would be great if a generic migration solution would fit here, assuming the v0.3 driver used a different storage class. That same solution could then be used for migrating volumes between different storage tiers (i.e. bronze, silver, gold) or between non-mirrored and mirrored.
Updated the Helm charts and squashed all commits into one. Among the open issues, only the decision about the omap scale remains, as per this comment. I am hoping we can get this in this week, and start work on the credential changes to RBD and on incorporating similar changes into the CephFS plugin. Requesting reviews, @dillaman @Madhu-1 @gman0 (also @batrick, from the omap usage perspective (among others)).
Kind of concerned... what is the migration path here? There are existing users of the chart.
On further discussions regarding the omap scale issue, the resolution would be to apply sharding (as suggested by @liewegas) to the volumes and snaps omaps when required. The ability to shard can be based on selecting a shard using the [...]. Also, when sharding is introduced, the existing omaps may need a one-time "split into shards" operation to be performed, but this will not impact the metadata exchanged with the CO environment. Added sharding to the tech-debt list as well; a sketch of one possible shard selector follows.
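A minimal sketch of deterministic shard selection, under stated assumptions: the hash function (FNV-1a), the shard count, and the `csi.volumes.shard.N` object-name format are all illustrative choices, since the exact selector referenced above is not spelled out in this thread.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardObject picks one of `shards` directory objects for a CO-generated
// name, so no single omap has to hold every key.
func shardObject(coName string, shards uint32) string {
	h := fnv.New32a()
	h.Write([]byte(coName)) // hash.Hash32's Write never returns an error
	return fmt.Sprintf("csi.volumes.shard.%d", h.Sum32()%shards)
}

func main() {
	// The same name always lands in the same shard, so lookups stay direct
	// and a later "split into shards" migration only has to re-bucket keys.
	fmt.Println(shardObject("pvc-0000-1111", 16))
	fmt.Println(shardObject("pvc-4444-5555", 16))
}
```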
lgtm -- a few nits and a question about how one updates the "ceph-csi-config"
@dillaman I am adding the DNM flag to this. As per my understanding, the following actions need to be closed before we can merge this (reviews and acks still welcome):
@ShyamsundarR Thanks!
We found an issue in the Helm chart where it is writing config maps to the wrong namespace (#342). We could fix it, but that would then require steps to move the config maps from the old location to the new location. But if we have a migration plan for getting config maps to omaps soon, we could skip fixing the config map issue and recommend just switching to omaps. How soon do we think this PR is:
Requesting reviews @humblec @Madhu-1 @gman0 @phlogistonjohn, any help is appreciated.
if req.GetVolumeId() == "" {
	return nil, status.Error(codes.InvalidArgument, "Empty volume ID in request")
}
s/Empty/empty/ in above if's..
All existing `status.Error` routines seem to capitalize the first letter of the error message. Is this not the convention?
NOTE: Retaining one comment on the capitalization of the first letter of messages in status.Error, for ease of discussion.

As pointed out by @Madhu-1, error strings need to start with a lower-case letter in golang; this is also enforced by the linter. These errors, though, are RPC error messages; whether they should follow the same convention is something to consider before fixing it.

I assume we should open a new issue against the code to fix this across the board, rather than address part of them here and the rest later. IOW, I intend to leave the added status.Errors in their current form rather than correcting them to adhere to the first letter being lower case. @humblec or @Madhu-1, if you feel strongly that this should be corrected in this PR, for messages added with this PR, let me know.
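For reference, a minimal sketch of the convention the linter enforces (not code from this PR): Go error strings start lower-case and carry no trailing punctuation, because they are routinely wrapped into longer messages.

```go
package main

import (
	"errors"
	"fmt"
)

// Lower-case, no trailing punctuation: the string will often appear
// mid-sentence once wrapped by a caller.
var errEmptyVolumeID = errors.New("empty volume ID in request")

func main() {
	// Wrapped, the lower-case form reads naturally; "Empty volume ID..."
	// would not.
	fmt.Println(fmt.Errorf("DeleteVolume failed: %w", errEmptyVolumeID))
}
```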
if err != nil {
	klog.Errorf("failed getting information for image (%s): (%s)", poolName+"/"+imageName, err)
	if strings.Contains(string(stdout), "rbd: error opening image "+imageName+
Why do we need this check here?
The check is to detect missing images and segregate them from other classes of errors. We need this to return success for successive DeleteVolume requests, in case the first one times out but the work is actually done. If we could not detect image-missing errors, they would appear to the CO system as internal or other errors.

The above is one case of the same that is needed in the code; the error itself can be used generically to detect that such an image does not exist. (See the sketch below.)
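A self-contained sketch of the classification described above, under stated assumptions: the sentinel error name, pool, and image names are hypothetical, and the real code runs inside the driver rather than a standalone program. The `rbd: error opening image` substring is the one quoted in the diff above.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// errImageNotFound is a hypothetical sentinel for this sketch; the PR's own
// typed ErrImageNotFound (discussed later in this thread) plays this role.
var errImageNotFound = errors.New("rbd image not found")

// imageInfo runs `rbd info` and maps the CLI's "error opening image" output
// to a distinct error class, so DeleteVolume can treat a missing image as
// already deleted instead of surfacing an internal error.
func imageInfo(pool, image string) error {
	out, err := exec.Command("rbd", "info", pool+"/"+image).CombinedOutput()
	if err != nil {
		if strings.Contains(string(out), "rbd: error opening image "+image) {
			return errImageNotFound
		}
		return fmt.Errorf("rbd info %s/%s: %v (%s)", pool, image, err, out)
	}
	return nil
}

func main() {
	if err := imageInfo("rbd", "csi-vol-example"); errors.Is(err, errImageNotFound) {
		// A repeated DeleteVolume after a timed-out first attempt lands here.
		fmt.Println("image already gone; returning success")
	}
}
```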
@ShyamsundarR Started the review; there is also a conflict. Can you correct the conflict and update the PR? Btw, I see there are a good number of changes which are just indentation corrections. Can you remove those changes from this PR for a better review look?
@humblec I would be updating the PR addressing any review comments and squashing the commits into one, at which point I intend to address the conflicts as well and also rebase to the tip of master. Is this fine? Or do you want me to rebase right away? I went through the PR attempting to find the indentation-only corrections, and found about 18 such lines (out of ~2200 additions in all). Some of these 18 are in YAML where there were trailing white spaces, some are for lines that were too long, and a few introduce a line break at some logical point in the code. Further, I used the diff settings in GitHub to ignore whitespace and counted line deltas; other than 5 files, no other files contain whitespace-only changes when comparing the changed lines without the said diff settings. Some structures and assignment blocks are auto-indented when adding or removing a variable whose length is different; these white space changes are out of my control (unless I use a plain text editor to revert them to their original form). I do understand the problem white-space-only changes can cause in reviews, but the quantum in this PR seems quite low (~18 lines as I saw it). Do you really want me to remove these from the PR? This calls for very careful inspection to remove just these changes. If they are hindering review, could you change the diff settings for this PR to ignore white spaces and take a look?
`monitors` | one of `monitors`, `clusterID` or `monValueFromSecret` must be set | Comma separated list of Ceph monitors (e.g. `192.168.100.1:6789,192.168.100.2:6789,192.168.100.3:6789`)
`monValueFromSecret` | one of `monitors`, `clusterID` or `monValueFromSecret` must be set | a string pointing to the key in the credential secret whose value is the mon. This is used for the case when the monitors' IPs or hostnames are changed; the secret can be updated to pick up the new monitors.
`clusterID` | one of `monitors`, `clusterID` or `monValueFromSecret` must be set | String representing a Ceph cluster, must be unique across all Ceph clusters in use for provisioning, cannot be greater than 36 bytes in length, and should remain immutable for the lifetime of the Ceph cluster in use
`clusterID` | yes | String representing a Ceph cluster, must be unique across all Ceph clusters in use for provisioning, cannot be greater than 36 bytes in length, and should remain immutable for the lifetime of the Ceph cluster in use
s/String/string
Almost all first letters of the description column in the deploy-rbd.md doc are capitalized, as seen here. Are you suggesting the first word "String" be in lower case, as in "string"?
specified in `examples/rbd/template-ceph-cluster-ID-secret.yaml` needs to be created, with the secret name matching the string value provided as the `clusterID`.

## Deployment with Kubernetes

Requires Kubernetes 1.11
Maybe 'Requires Kubernetes >= v1.11'?
This needs to be kube >= 1.13.
This is not updated as part of this patch; I am testing against v1.13 and also do not have any specific Kube-version-dependent changes that would call for a different version. Do you want me to change this as part of this PR, or should it be a separate edit?
Agreed. This can be a separate patch, or it can be updated as part of this patch.
func (cs *ControllerServer) validateVolumeReq(req *csi.CreateVolumeRequest) error {
	if req.VolumeCapabilities == nil {
		return status.Error(codes.InvalidArgument, "Volume Capabilities cannot be empty")
	}
	options := req.GetParameters()
	if value, ok := options["clusterID"]; !ok || len(value) == 0 {
		return status.Error(codes.InvalidArgument, "Missing or empty cluster ID to provision volume from")
	}
s/Missing/missing
I thought we were discussing this here. IOW, do you want me to change the first letters of the strings that I added in this patch from capitalized to lower case, OR are we dealing with this in a separate patch, as the code base needs a revision on the same? Please let me know so that I can handle it appropriately.
	err       error
}

func (e ErrImageNotFound) Error() string {
Hmm.. this does not look like a common pattern for describing errors, with a method on top.
Can you point to a common pattern to adopt? I do not know of any, hence asking.
@ShyamsundarR can we move this file outside of rbd, so that we can use the same for cephfs also? Is this possible?
Some errors move out to util when they are shared by CephFS (in the refactor patch ShyamsundarR@c987e8b); some stay (e.g., ErrImageNotFound) as they are RBD-specific. So in this PR I cannot move it to a common file. BTW, there is already another common file in util with other errors.
@Madhu-1 pointed me to this source for error definitions, https://golang.org/src/errors/errors.go, and as per that, the way the errors are defined looks right. @humblec can you clarify if this looks right?
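For context, here is a self-contained sketch of the pattern under discussion: a typed error wrapping an underlying cause. Field names beyond those visible in the quoted fragment are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrImageNotFound mirrors the quoted fragment: a concrete type satisfying
// the error interface by delegating to a wrapped cause.
type ErrImageNotFound struct {
	imageName string
	err       error
}

func (e ErrImageNotFound) Error() string {
	return e.err.Error()
}

func main() {
	var err error = ErrImageNotFound{
		imageName: "csi-vol-example",
		err:       errors.New("image not found"),
	}

	// Callers segregate this class of error with a type assertion, in line
	// with the errors-are-values model of golang.org/src/errors/errors.go.
	if _, ok := err.(ErrImageNotFound); ok {
		fmt.Println("not found:", err)
	}
}
```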
Existing config maps are now replaced with rados omaps that help store information regarding the requested volume names and the rbd image names backing the same. Further, to detect the cluster, pool, and image a volume ID refers to, changes to the volume ID encoding have been made as per the provided design specification in the stateless ceph-csi proposal.

Additional changes and updates,
- Updated documentation
- Updated manifests
- Updated Helm chart
- Addressed a few csi-test failures

Signed-off-by: ShyamsundarR <srangana@redhat.com>
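Since the commit message leans on the volume ID encoding, here is an illustrative sketch of the idea: packing the cluster, pool, and image uuid into a single ID so the driver can locate an image from the ID alone. The field layout below is an assumption for illustration; the authoritative layout is the one in the stateless ceph-csi design specification.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// composeVolID packs everything needed to find the image into the ID itself,
// which is what removes the need for a config-map lookup on every RPC.
func composeVolID(clusterID string, poolID int64, imageUUID string) string {
	buf := make([]byte, 2+len(clusterID)+8)
	binary.BigEndian.PutUint16(buf[0:2], uint16(len(clusterID)))
	copy(buf[2:2+len(clusterID)], clusterID)
	binary.BigEndian.PutUint64(buf[2+len(clusterID):], uint64(poolID))
	return hex.EncodeToString(buf) + "-" + imageUUID
}

func main() {
	fmt.Println(composeVolID("ceph-cluster-1", 2, "00000000-1111-2222-3333-444444444444"))
}
```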
It's a long-pending PR; we can take this in. If there are any minor comments, they can be addressed in the next PR.
@humblec PTAL.
Is there a migration path defined yet?
I want to address this in a few ways, hence I opened issue #378, which talks about the same, labeled as part of release 1.1.
@ShyamsundarR To summarize my thoughts: we need to do some more code improvement here, and I would like to sort out the locking stuff in another PR; I am making a design for the same. However, these can be follow-up PRs, so let's unblock this for now. I am fine with addressing the rest in subsequent work. In short: approved.