Skip to content
This repository has been archived by the owner on Oct 21, 2020. It is now read-only.

Conversation

orainxiong
Copy link

  1. Optimize leader election from per PV to per instance;
  2. Use informer cache rather than talking against api server.

2. Use informer cache rather than talking against api server.
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 2, 2018
@orainxiong orainxiong changed the title optimize leader election and optimize leader election and abuse api server Jul 2, 2018
@orainxiong
Copy link
Author

When 100 PVCs are created at same time, the CSI external-provisioner hammers the kube apiserver with requests and gets throttled, causing all sorts of issues.

more detals : kubernetes-csi/external-provisioner#68

pvName := ctrl.getProvisionedVolumeNameForClaim(claim)
volume, err := ctrl.client.CoreV1().PersistentVolumes().Get(pvName, metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should keep this. It is copied from upstream https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L1390. The idea is to deliberately bypass the cache in case a PV was already created. In this case we are trading a kube API Get to avoid an unnecessary storage backend Provision which is usually worse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I need to think more about this. Since this Get is premised on the cache being stale, and I am not sure if we have the same problem of a stale cache as upstream has.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, sorry for noise, I am okay with this change. I cannot think of a scenario where we need this, we are no longer using goroutinemap like upstream is either.

@@ -1118,10 +1035,25 @@ func (ctrl *ProvisionController) deleteVolumeOperation(volume *v1.PersistentVolu
// Our check does not have to be as sophisticated as PV controller's, we can
// trust that the PV controller has set the PV to Released/Failed and it's
// ours to delete
newVolume, err := ctrl.client.CoreV1().PersistentVolumes().Get(volume.Name, metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike above, this Get I am okay with removing.

@@ -716,75 +698,6 @@ func (ctrl *ProvisionController) shouldDelete(volume *v1.PersistentVolume) bool
return true
}

// lockProvisionClaimOperation wraps provisionClaimOperation. In case other
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue if we remove it without a replacement is we will be trading kube API abuse for storage backend API abuse, in case somebody is running more than one provisioner instance (which is way too easy given they can just e.g. change deployment # from 1 to 2). I.e. instead of racing to lock a PVC and spamming the kube API server they will race to talk to the storage backend then race to create a PV, so we need a replacement per-storageclass leader election if we want to remove this.

Also if we get rid of this leader election there is a lot more code that can be removed, which I will be happy to be rid of, but I will do it myself in a subsequent and I would appreciate a review!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, I think there are two ways to implementation leader election:

  1. Pull off leader election logic from external-storage to have external-provisioner their own implement it. It will make the logic more simple on implementation, and the disadvantage is getting to impact on current external-provisioner.
  2. Modify granularity of lock from per-PVC to per-Class to avoid the race condition.

I have no idea which one is better. BTW, with my pleasure to review if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. will be much more difficult since there are many other provisioner besides csi external-provisioner depending on external-storage and they will each be burdened with copying the same complementation, even if the implementation is simple. 2 is better I think. The more opaque all of this is to library consumers the better. I don't like the idea of our little controller depending on some configmap for maintaining leader state but I also don't want to overload storage classes for that purpose like we are overloading pvcs at the moment. I will have more time to think/work on this in the coming days

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with you. From the implementation perspective, external-storage could copy kind of leader election as existing operator does.

Any new ideas, please let me know. Many thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, in which scenario current per-PVC lock is bad? I like per-PVC lock idea, because I can simply deploy more provisioners to scale. Each provisioner can work independently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not convinced client-side throttling is even a problem anymore after we introduced PVC work queues, since Provision typically takes a lot of time. But I am never opposed to adding more construct functions.

I've opened #847 to discuss operator principles in general.

I think we need to bring this up with more people, e.g. in the next sig-storage meeting if there's time, otherwise I will ramble on indecisively forever. We should have this resolved AT LATEST before 1.12 release IMO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also write an email to sig-storage google group next week

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wongma7 any chance you can join us to discuss this issue ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladimirvivien yes, where?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wongma7
if you can, join the next CSI meeting (Wednesday 10am PST) for a quick report on the stuff you are working (LE/informers). Here is the link below:

VC on Zoom: https://zoom.us/j/614261834

Notes and agenda doc: https://docs.google.com/document/d/1-WmRYvqw1FREcD1jmZAOjC0jX6Gop8FMOzsdqXHFoT4/edit?usp=sharing

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/lib cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants