
Migrated PVs are provisioned when migration is off #340

Closed
jsafrane opened this issue Sep 12, 2019 · 7 comments · Fixed by #341
Labels: kind/bug, priority/important-soon

Comments

@jsafrane (Contributor) commented Sep 12, 2019

In an AWS cluster with all alpha features off and the AWS EBS CSI driver installed, the CSI driver also provisions PVs for the in-tree storage class.

The CSI driver should provision volumes only for its own driver name (ebs.csi.aws.com) and not for kubernetes.io/aws-ebs, because the Kubernetes in-tree volume plugin provisions a volume for that claim too.
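The expected routing can be sketched in Go as follows. This is a hedged illustration of the decision described above, not the provisioner's actual code; `shouldProvision` and its parameters are hypothetical names.

```go
package main

import "fmt"

// shouldProvision sketches the decision the external provisioner is
// expected to make. annProvisioner is the value of the PVC's
// volume.beta.kubernetes.io/storage-provisioner annotation, which the
// in-tree PV controller sets. All names here are illustrative.
func shouldProvision(annProvisioner, driverName, inTreePluginName string, migrationOn bool) bool {
	switch annProvisioner {
	case driverName:
		// The claim was explicitly pointed at the CSI driver.
		return true
	case inTreePluginName:
		// Pick up in-tree claims only when migration is enabled;
		// otherwise the in-tree plugin provisions the volume itself.
		return migrationOn
	default:
		return false
	}
}

func main() {
	// The buggy behavior reported here: migration is off and the claim is
	// annotated with the in-tree plugin name, yet the driver provisioned it.
	fmt.Println(shouldProvision("kubernetes.io/aws-ebs", "ebs.csi.aws.com", "kubernetes.io/aws-ebs", false)) // expected: false
}
```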

Storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
  name: gp2
parameters:
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: Immediate

Claim:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi

CSI provisioner log:

W0912 10:58:09.969218       1 deprecatedflags.go:53] Warning: option provisioner="ebs.csi.aws.com" is deprecated and has no effect
I0912 10:58:09.969310       1 feature_gate.go:226] feature gates: &{map[Topology:true]}
I0912 10:58:09.969333       1 csi-provisioner.go:98] Version: v1.3.0-0-g27750ab1
I0912 10:58:09.969348       1 csi-provisioner.go:112] Building kube configs for running in cluster...
I0912 10:58:09.982199       1 connection.go:151] Connecting to unix:///var/lib/csi/sockets/pluginproxy/csi.sock
I0912 10:58:09.984244       1 connection.go:261] Probing CSI driver for readiness
I0912 10:58:09.984266       1 connection.go:180] GRPC call: /csi.v1.Identity/Probe
[snip]
/csi.v1.Identity/GetPluginInfo
[snip]
I0912 10:58:09.987730       1 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginCapabilities
[snip]
I0912 10:58:09.989520       1 connection.go:180] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
[snip]
I0912 10:58:09.994484       1 csi-provisioner.go:178] Supports migration from in-tree plugin: kubernetes.io/aws-ebs
I0912 10:58:09.996177       1 controller.go:621] Using saving PVs to API server in background
I0912 10:58:09.996352       1 controller.go:769] Starting provisioner controller ebs.csi.aws.com_ebs-csi-controller-0_354c83ff-d54c-11e9-9216-0242ac110003!
I0912 10:58:09.996452       1 volume_store.go:90] Starting save volume queue
[snip informer startup]
I0912 10:58:26.002204       1 controller.go:1196] provision "default/myclaim" class "gp2": started
I0912 10:58:26.004432       1 controller.go:367] translating storage class for in-tree plugin kubernetes.io/aws-ebs to CSI
I0912 10:58:26.006641       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"myclaim", UID:"fc52e9df-f1ad-480c-8530-d8ed03a0d96e", APIVersion:"v1", ResourceVersion:"361", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/myclaim"
I0912 10:58:26.020995       1 controller.go:471] CreateVolumeRequest {Name:pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e CapacityRange:required_bytes:524288000  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:requisite:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"us-east-1d" > > preferred:<segments:<key:"topology.ebs.csi.aws.com/zone" value:"us-east-1d" > >  XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0912 10:58:26.021170       1 connection.go:180] GRPC call: /csi.v1.Controller/CreateVolume
I0912 10:58:26.021179       1 connection.go:181] GRPC request: {"accessibility_requirements":{"preferred":[{"segments":{"topology.ebs.csi.aws.com/zone":"us-east-1d"}}],"requisite":[{"segments":{"topology.ebs.csi.aws.com/zone":"us-east-1d"}}]},"capacity_range":{"required_bytes":524288000},"name":"pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e","parameters":{"type":"gp2"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
I0912 10:58:32.506774       1 connection.go:183] GRPC response: {"volume":{"accessible_topology":[{"segments":{"topology.ebs.csi.aws.com/zone":"us-east-1d"}}],"capacity_bytes":1073741824,"volume_id":"vol-0845c25209f0083ce"}}
I0912 10:58:32.507913       1 connection.go:184] GRPC error: <nil>
I0912 10:58:32.507929       1 controller.go:523] create volume rep: {CapacityBytes:1073741824 VolumeId:vol-0845c25209f0083ce VolumeContext:map[] ContentSource:<nil> AccessibleTopology:[segments:<key:"topology.ebs.csi.aws.com/zone" value:"us-east-1d" > ] XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0912 10:58:32.507999       1 controller.go:596] successfully created PV {GCEPersistentDisk:nil AWSElasticBlockStore:&AWSElasticBlockStoreVolumeSource{VolumeID:vol-0845c25209f0083ce,FSType:ext4,Partition:0,ReadOnly:false,} HostPath:nil Glusterfs:nil NFS:nil RBD:nil ISCSI:nil Cinder:nil CephFS:nil FC:nil Flocker:nil FlexVolume:nil AzureFile:nil VsphereVolume:nil Quobyte:nil AzureDisk:nil PhotonPersistentDisk:nil PortworxVolume:nil ScaleIO:nil Local:nil StorageOS:nil CSI:nil}
I0912 10:58:32.508123       1 controller.go:1278] provision "default/myclaim" class "gp2": volume "pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e" provisioned
I0912 10:58:32.508142       1 controller.go:1295] provision "default/myclaim" class "gp2": succeeded
I0912 10:58:32.508150       1 volume_store.go:147] Saving volume pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e
I0912 10:58:32.512051       1 volume_store.go:150] Volume pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e saved
I0912 10:58:32.512095       1 controller.go:979] Final error received, removing PVC fc52e9df-f1ad-480c-8530-d8ed03a0d96e from claims in progress

Provisioner version: v1.3.0-0-g27750ab1
Kubernetes: 1.16 RC1 (but IMO it's not related)

@ddebroy @msau42 @davidz627, IMO this is pretty serious.

/priority important-soon

@k8s-ci-robot added the priority/important-soon label on Sep 12, 2019
@msau42 (Collaborator) commented Sep 12, 2019

/assign @ddebroy @davidz627

I see a few ways to fix this:

  • Add a migration feature gate to the provisioner. This means the Kubernetes migration feature gate and the provisioner's feature gate must be set together.
  • Add one more annotation to the PVC that pv-controller sets to indicate migration is turned on, and have external-provisioner check that too.
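The second option could look roughly like this. A sketch only: the `annMigrationEnabled` annotation name is invented for illustration, and `pickUpInTreeClaim` is a hypothetical helper, not real provisioner code.

```go
package main

import "fmt"

const (
	annStorageProvisioner = "volume.beta.kubernetes.io/storage-provisioner"
	// Hypothetical extra annotation that pv-controller would set while
	// migration is on; the name is made up for this sketch.
	annMigrationEnabled = "volume.kubernetes.io/csi-migration-enabled"
)

// pickUpInTreeClaim sketches option 2: external-provisioner handles an
// in-tree claim only when pv-controller has also stamped the migration
// annotation on it.
func pickUpInTreeClaim(pvcAnnotations map[string]string, inTreePluginName string) bool {
	if pvcAnnotations[annStorageProvisioner] != inTreePluginName {
		return false
	}
	return pvcAnnotations[annMigrationEnabled] == "true"
}

func main() {
	pvc := map[string]string{annStorageProvisioner: "kubernetes.io/aws-ebs"}
	fmt.Println(pickUpInTreeClaim(pvc, "kubernetes.io/aws-ebs")) // no migration annotation: false
}
```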

@msau42 (Collaborator) commented Sep 12, 2019

/kind bug

@k8s-ci-robot added the kind/bug label on Sep 12, 2019
@jsafrane (Contributor, Author) commented:

> Add one more annotation to the PVC that pv-controller sets to indicate migration is turned on, and have external-provisioner check that too.

The PV controller would need to remove the annotation when the user switches the flag off. Feasible, but perhaps racy: the external provisioner may already be provisioning a PV.

@jsafrane (Contributor, Author) commented:

Hmm, what about reusing volume.beta.kubernetes.io/storage-provisioner? The PV controller already sets it when migration is on.

And the race between the in-tree and external provisioners when the user disables migration can be fixed as part of kubernetes-sigs/sig-storage-lib-external-provisioner#55 (IMO there will always be some race).

@jsafrane (Contributor, Author) commented:

When the migration is enabled in the PV controller, we already set volume.beta.kubernetes.io/storage-provisioner: <csi driver name>.

https://github.com/kubernetes/kubernetes/blob/b3c4bdea496c0e808ad761d6c387fcd6838dea99/pkg/controller/volume/persistentvolume/pv_controller.go#L1574

So why does the provisioner check for volume.beta.kubernetes.io/storage-provisioner: <in-tree plugin name>?

@davidz627 (Contributor) commented Sep 12, 2019

> So why does the provisioner check for volume.beta.kubernetes.io/storage-provisioner: <in-tree plugin name>?

I don't think it checks for that. It needs to "know" the in-tree plugin name so that it can pick up the in-tree PVCs and process them in its queue (since it always needs to look at the PVCs to check whether they are migrated or not).

The one thing that was missing, IMO, was ignoring the claim if volume.beta.kubernetes.io/storage-provisioner: <in-tree plugin name>, because that annotation is the signal from pv-controller about whether the claim was migrated on the Kubernetes side or not. I've submitted a PR to fix this.
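The shape of that missing check can be sketched as follows. The helper name is hypothetical; the actual change is in the linked PR, not reproduced here.

```go
package main

import "fmt"

const annStorageProvisioner = "volume.beta.kubernetes.io/storage-provisioner"

// shouldIgnoreClaim sketches the missing check described above: if
// pv-controller stamped the claim with the in-tree plugin name, migration
// is off on the Kubernetes side, so the CSI provisioner must skip the
// claim and let the in-tree plugin provision it. Illustrative only.
func shouldIgnoreClaim(pvcAnnotations map[string]string, inTreePluginName string) bool {
	return pvcAnnotations[annStorageProvisioner] == inTreePluginName
}

func main() {
	pvc := map[string]string{annStorageProvisioner: "kubernetes.io/aws-ebs"}
	fmt.Println(shouldIgnoreClaim(pvc, "kubernetes.io/aws-ebs")) // true: skip it
}
```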

@ddebroy (Collaborator) commented Sep 12, 2019

This bug was introduced in https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/pull/34/files. Instead of loosening the ==/!= check to knownProvisioner for both the class.provisioner field and the PVC's annStorageProvisioner annotation, I should have modified it only for the class.provisioner check. David's PR tightens that check back at the CSI provisioner level for the PVC's annStorageProvisioner, which seems ideal.
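In other words, the loosened knownProvisioner-style match is fine for the StorageClass provisioner field, but the PVC annotation must stay a strict equality check against the driver name. A sketch with hypothetical helper names, not the library's actual code:

```go
package main

import "fmt"

// classMatches sketches the check on StorageClass.provisioner, where a
// loosened match is fine: a migrating CSI driver also "knows" its
// in-tree counterpart and must watch claims of both classes.
func classMatches(classProvisioner, driverName, inTreePluginName string) bool {
	return classProvisioner == driverName || classProvisioner == inTreePluginName
}

// claimMatches sketches the check on the PVC's storage-provisioner
// annotation, which must stay strict: pv-controller sets it to the CSI
// driver name only when migration is enabled.
func claimMatches(annProvisioner, driverName string) bool {
	return annProvisioner == driverName
}

func main() {
	// Migration off: the class still matches (so the PVC is watched), but
	// the claim annotation carries the in-tree name, so it is not provisioned.
	fmt.Println(classMatches("kubernetes.io/aws-ebs", "ebs.csi.aws.com", "kubernetes.io/aws-ebs")) // true
	fmt.Println(claimMatches("kubernetes.io/aws-ebs", "ebs.csi.aws.com"))                          // false
}
```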

kbsonlong pushed a commit to kbsonlong/external-provisioner that referenced this issue Dec 29, 2023