optimize leader election and abuse api server #837
Conversation
orainxiong commented Jul 2, 2018
- Optimize leader election from per-PV to per-instance;
- Use the informer cache rather than talking directly to the API server.

More details: kubernetes-csi/external-provisioner#68
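The second point is roughly this pattern; a minimal sketch (not the PR's actual code, with illustrative names such as cacheBackedController and volumeExists) of reading PersistentVolumes from a shared informer's lister instead of issuing a Get against the API server:

```go
package sketch

import (
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	corelisters "k8s.io/client-go/listers/core/v1"
)

// cacheBackedController keeps a PV lister fed by a shared informer so that
// existence checks read the local cache instead of hitting the API server.
type cacheBackedController struct {
	volumeLister corelisters.PersistentVolumeLister
}

func newCacheBackedController(client kubernetes.Interface, stopCh <-chan struct{}) *cacheBackedController {
	factory := informers.NewSharedInformerFactory(client, 15*time.Minute)
	c := &cacheBackedController{
		// Accessing the lister registers the PV informer with the factory.
		volumeLister: factory.Core().V1().PersistentVolumes().Lister(),
	}
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	return c
}

// volumeExists consults the informer cache; no round trip to the API server.
func (c *cacheBackedController) volumeExists(name string) (bool, error) {
	if _, err := c.volumeLister.Get(name); err != nil {
		if apierrors.IsNotFound(err) {
			return false, nil
		}
		return false, err
	}
	return true, nil
}
```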
pvName := ctrl.getProvisionedVolumeNameForClaim(claim)
volume, err := ctrl.client.CoreV1().PersistentVolumes().Get(pvName, metav1.GetOptions{})
IMO we should keep this. It is copied from upstream https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L1390. The idea is to deliberately bypass the cache in case a PV was already created. In this case we are trading a kube API Get to avoid an unnecessary storage backend Provision, which is usually worse.
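For reference, a minimal sketch of that trade-off, assuming the pre-context client-go Get signature used in this repo and a hypothetical needsProvision helper: one deliberate kube API Get that bypasses the informer cache, so a stale cache cannot cause a duplicate backend Provision:

```go
package sketch

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// needsProvision reports whether the PV is absent according to the API server
// itself, not the local informer cache.
func needsProvision(client kubernetes.Interface, pvName string) (bool, error) {
	_, err := client.CoreV1().PersistentVolumes().Get(pvName, metav1.GetOptions{})
	if err == nil {
		// PV already exists; a cheap kube API Get just saved an expensive
		// (and possibly duplicate) call to the storage backend.
		return false, nil
	}
	if apierrors.IsNotFound(err) {
		return true, nil
	}
	// Transient API error: let the caller retry rather than risk a duplicate
	// backend Provision.
	return false, err
}
```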
Actually I need to think more about this. This Get is premised on the cache being stale, and I am not sure whether we have the same stale-cache problem that upstream has.
nvm, sorry for the noise, I am okay with this change. I cannot think of a scenario where we need this; we are no longer using goroutinemap like upstream does, either.
@@ -1118,10 +1035,25 @@ func (ctrl *ProvisionController) deleteVolumeOperation(volume *v1.PersistentVolume
// Our check does not have to be as sophisticated as PV controller's, we can
// trust that the PV controller has set the PV to Released/Failed and it's
// ours to delete
newVolume, err := ctrl.client.CoreV1().PersistentVolumes().Get(volume.Name, metav1.GetOptions{})
Unlike the one above, I am okay with removing this Get.
@@ -716,75 +698,6 @@ func (ctrl *ProvisionController) shouldDelete(volume *v1.PersistentVolume) bool
return true
}

// lockProvisionClaimOperation wraps provisionClaimOperation. In case other
The issue, if we remove it without a replacement, is that we will be trading kube API abuse for storage backend API abuse whenever somebody runs more than one provisioner instance (which is way too easy, given they can just, e.g., change the deployment replica count from 1 to 2). I.e., instead of racing to lock a PVC and spamming the kube API server, the instances will race to talk to the storage backend and then race to create a PV, so we need a replacement per-storageclass leader election if we want to remove this.
Also, if we get rid of this leader election there is a lot more code that can be removed, which I will be happy to be rid of, but I will do that myself in a subsequent PR and I would appreciate a review!
If I understand correctly, there are two ways to implement leader election:
1. Pull the leader election logic out of external-storage and have external-provisioner implement it on its own. That keeps the implementation simple, but the disadvantage is the impact on the current external-provisioner.
2. Change the granularity of the lock from per-PVC to per-Class to avoid the race condition (a rough sketch of this option follows below).
I have no idea which one is better. BTW, I would be glad to review if possible.
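For option 2, a rough sketch of what a per-storage-class lock could look like with client-go's leaderelection package and a Lease lock; lockName, the kube-system namespace, and provisionForClass are assumptions for illustration, not what the library actually does:

```go
package sketch

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runPerClassElection makes every provisioner instance compete for one lock
// per storage class; only the current leader provisions claims of that class.
func runPerClassElection(ctx context.Context, client kubernetes.Interface, storageClass, id string, provisionForClass func(ctx context.Context)) {
	// One lock object per storage class, e.g. "provisioner-fast-ssd".
	lockName := fmt.Sprintf("provisioner-%s", storageClass)
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{Name: lockName, Namespace: "kube-system"},
		Client:    client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: id, // unique per provisioner instance, e.g. the pod name
		},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			// Only the leader for this class provisions its claims.
			OnStartedLeading: provisionForClass,
			OnStoppedLeading: func() {
				// Another instance took over; stop provisioning this class.
			},
		},
	})
}
```

Whether the lock state lives in a Lease, a ConfigMap, or somewhere else is exactly the open question in the rest of this thread.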
1 will be much more difficult, since there are many other provisioners besides the CSI external-provisioner that depend on external-storage, and each of them would be burdened with copying the same implementation, even if that implementation is simple. 2 is better, I think. The more opaque all of this is to library consumers, the better. I don't like the idea of our little controller depending on some configmap for maintaining leader state, but I also don't want to overload storage classes for that purpose the way we are overloading PVCs at the moment. I will have more time to think/work on this in the coming days.
Agreed. From the implementation perspective, external-storage could copy the kind of leader election that existing operators already use.
If you have any new ideas, please let me know. Many thanks.
Hi, in which scenario is the current per-PVC lock bad? I like the per-PVC lock idea, because I can simply deploy more provisioners to scale, and each provisioner can work independently.
I am not convinced client-side throttling is even a problem anymore after we introduced PVC work queues, since Provision typically takes a lot of time. But I am never opposed to adding more constructor functions.
I've opened #847 to discuss operator principles in general.
I think we need to bring this up with more people, e.g. in the next sig-storage meeting if there's time; otherwise I will ramble on indecisively forever. We should have this resolved AT LATEST before the 1.12 release, IMO.
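For context on the work-queue remark, a minimal sketch of the pattern with illustrative names (provisionCtrl, processClaim): claims go through a rate-limited workqueue and a fixed set of workers drain it, so slow Provision calls pace the controller rather than it hammering the API server:

```go
package sketch

import (
	"k8s.io/client-go/util/workqueue"
)

type provisionCtrl struct {
	claimQueue workqueue.RateLimitingInterface
}

func newProvisionCtrl() *provisionCtrl {
	return &provisionCtrl{
		// Default rate limiter: per-item exponential backoff plus an overall limit.
		claimQueue: workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter()),
	}
}

// runWorker drains the queue one claim key at a time until it is shut down.
func (c *provisionCtrl) runWorker() {
	for {
		key, quit := c.claimQueue.Get()
		if quit {
			return
		}
		// processClaim would call Provision, which typically takes long enough
		// that the queue, not client-side throttling, is the real pacing.
		if err := c.processClaim(key.(string)); err != nil {
			c.claimQueue.AddRateLimited(key) // retry with backoff
		} else {
			c.claimQueue.Forget(key)
		}
		c.claimQueue.Done(key)
	}
}

func (c *provisionCtrl) processClaim(key string) error {
	_ = key // placeholder for the real provision logic
	return nil
}
```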
I'll also write an email to the sig-storage Google group next week.
@wongma7 any chance you can join us to discuss this issue?
@vladimirvivien yes, where?
@wongma7 if you can, join the next CSI meeting (Wednesday 10am PST) for a quick report on the stuff you are working on (LE/informers). Here are the links below:
VC on Zoom: https://zoom.us/j/614261834
Notes and agenda doc: https://docs.google.com/document/d/1-WmRYvqw1FREcD1jmZAOjC0jX6Gop8FMOzsdqXHFoT4/edit?usp=sharing