-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement metrics for cephfs CSI driver. #310
Conversation
26316b6
to
5a12f32
Compare
@humblec - any updates? |
@humblec discussions around dropping credential passing to the plugins, and using what is passed down form the StorageClass make it so that, the NodeGetVolumeStats RPC will have to function without credentials. Noting it here as you may want to factor that into your implementation. |
This PR is waiting for kubernetes support . I will update the PR once the support is added in upstream |
Dependency merged to kubernetes kubernetes/kubernetes#76188 |
Yeah, the dependency patch in kubernetes got merged on May 17th which was before |
Couple of days back the kubernetes dependency of our CSI repo has been upgraded to 1.15 which support metrics code base. This PR also depends on new CephFS volume provisioning |
The final dependency has also got merged last week and part of 1.1.0 release https://github.com/ceph/ceph-csi/releases . Revisiting this PR as all the dependencies are in. |
8ffa64e
to
2ab5c28
Compare
d7a39f5
to
e4e7600
Compare
is this for rbd also? if not please change the title and also add an E2E for this feature |
Its for CephFS , updated the title. |
pkg/cephfs/cephfs_statfs.go
Outdated
|
||
// getFsInfo writes metrics.Capacity, metrics.Used and metrics.Available from the filesystem info | ||
func (md *cephfsStatFS) getFsInfo(metrics *volume.Metrics) error { | ||
available, capacity, usage, inodes, inodesFree, inodesUsed, err := fs.FsInfo(md.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@humblec (tagging @ajarr as well) The call to fs.FsInfo uses the syscall statfs. I checked that statfs on a CephFS subvolume mount, returns used space as per the set attrs on the subvolume. I wanted to know if the same applies to inodes, IOW, statfs fields,
fsfilcnt_t f_files; /* total file nodes in file system */
fsfilcnt_t f_ffree; /* free file nodes in fs */
and hence can be used to generate metrics as below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ShyamsundarR not sure I got the comment, I would like to mention that, whatever statfs
returns is good enough for us. If subvol is not responding properly to statfs
its totally a different bug which Ceph has to take care. I dont think there is a bug though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@humblec I tested and determined that cephfs subvolume mounts as used in ceph-csi returns the correct total space as needed by us (i.e it sticks to the quota set on the subvolume). I am however not able to determine if it returns inode count information as required (or used space, which is a short test away if needed). Hence I need confirmation that it does, in case you know it or can test it, else from CephFS developers.
We need to understand statfs limitations if any for CephFS subvolumes, before interpreting it's output the way we want OR assuming any errors is a bug in CephFS subvolume implementation.
The provided image shows used space, which was never my concern as stated in the comment.
Hope this provides the required clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to understand statfs limitations if any for CephFS subvolumes, before interpreting it's output the way we want OR assuming any errors is a bug in CephFS subvolume implementation.
@ShyamsundarR If you mount the subvolume and call stat against, you should be able to verify it , Isnt it ? Agree that, its good to verify with cephfs team too.
Btw @anmolsachan was testing this PR and we got the expected result in his testing. |
yes, thanks Humble for the PR. Attaching screenshot |
pkg/cephfs/cephfs_statfs.go
Outdated
"k8s.io/kubernetes/pkg/volume/util/fs" | ||
) | ||
|
||
const ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@humblec I think this entire code is already there in https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/metrics_statfs.go can we just make use of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Madhu-1 Its a mix or more than that with errors, It can also have ceph specific errors and can be enhanced for more functions. Thats why it has been modified and kept it separately. copyright carries for both for the same reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way, this has been shrinked now.
pkg/cephfs/nodeserver.go
Outdated
|
||
if err != nil { | ||
klog.Errorf("stat failed on path: %v, error: %v", targetPath, err) | ||
return nil, status.Error(codes.Internal, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return invalid agrument if targetPath does not exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still this need to be handled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@humblec please resolve the merge conflict |
@Madhu-1 Done. Thanks! |
@humblec are we adding an E2E test for this? Or is this not feasible? |
No @ShyamsundarR. Its not feasible. |
What are the alternatives for the future? csi-test is one that I can see. Also feasibility is an issue because there are no callers to this? |
@ShyamsundarR we can add csi-test later for the RPC call. |
Yes I am also looking at the "future" and not at in this PR, but wanted clarity on the feasibility issue, so that we know how to tackle it later. |
@ShyamsundarR Yeah, we could start with csi-test . Meanwhile I will think about the E2E possibilities, if any. |
|
||
isMnt, err := isMountPoint(targetPath) | ||
|
||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion
if err != nil{
klog.Errorf("stat failed on path: %v, error: %v", targetPath, err)
if os.IsNotExist(err){
return nil, status.Error(codes.InvalidArgument, err.Error())
}
else{
return nil, status.Error(codes.Internal, err.Error())
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isMountPoint is always returning status.Error(codes.Internal, err.Error())
on errors, but the current error message does need some massaging around "stat failing etc."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
humble can you just return an error here?
question: |
Implementation is common for RBD and CephFS as long as we are dealing with RWX/RWO volume types. For RBD we should not need secrets as we can get the image name from the For CephFS need to think a little more and see if we can map the mount options to check the subvolume name maps to the image name, but needs more thought to understand if secrets are not needed or are mandatory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for rbd, if it's required to move it to the csi-common folder please do it as it will not affect the functionality
Thanks @Madhu-1 , RBD , I am still testing and will do the changes accordingly. @ShyamsundarR Can you review this in priority? Other teams are waiting for availability of this PR , so the urgency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly error handling comments, and some message nits.
@humblec thoughts?
|
||
isMnt, err := isMountPoint(targetPath) | ||
|
||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isMountPoint is always returning status.Error(codes.Internal, err.Error())
on errors, but the current error message does need some massaging around "stat failing etc."
pkg/cephfs/nodeserver.go
Outdated
|
||
if isMnt { | ||
klog.Infof("cephfs: volume %s is already mounted to %s", volID, targetPath) | ||
cephfsProvider := volume.NewMetricsStatFS(targetPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error check to ensure cephfsProvider
is non-nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not required. never going to happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, this is not gonna happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this because the signature does not have a return like (xxx, err) that we can assume this will not fail? Attempting to understand why this is not going to happen? Not a blocker for merge, but looking for information.
pkg/cephfs/nodeserver.go
Outdated
Available: available, | ||
Total: capacity, | ||
Used: used, | ||
Unit: 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
use VolumeUsage_BYTES
instead of 1
and use VolumeUsage_INODES
instead of 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@Madhu-1 @ShyamsundarR comments are addressed. Please go through it, if something else is missing, we can followup. |
The travis is failing due to an issue at ceph container image #486 |
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Depends on
#kubernetes/kubernetes#76188
##439
Signed-off-by: Humble Chirammal hchiramm@redhat.com