
Remove DeleteDisk call in CreateDisk path #2009

Merged

Conversation

ConnorJC3
Contributor

Is this a bug fix or adding new feature?

Bug fix

What is this PR about? / Why do we need it?

Today, when we time out waiting for the volume to create, we delete it. This is broken for several reasons:

  • It often doesn't even run, because the sidecar will have already closed the context elsewhere
  • Whenever it does run, it will trigger "Possible race condition in EBS volume creation" (#1951) when the CO re-calls CreateVolume
  • It handles behavior that is the CO's responsibility according to the CSI spec:
  //    The CO is responsible for cleaning up volumes it provisioned
  //    that it no longer needs. If the CO is uncertain whether a volume
  //    was provisioned or not when a `CreateVolume` call fails, the CO
  //    MAY call `CreateVolume` again, with the same name, to ensure the
  //    volume exists and to retrieve the volume's `volume_id`

This PR also takes the opportunity to clean up the extremely unclear error message.

What testing is done?

N/A

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 12, 2024
Signed-off-by: Connor Catlett <conncatl@amazon.com>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 12, 2024

Code Coverage Diff

This PR does not change the code coverage

Member

@torredil left a comment


Agree with the changes made in this PR, and specifically the point that this logic handles behavior that is the CO's responsibility. My understanding is that having a DeleteDisk call in the CreateVolume path makes it a violation of the CSI spec.

Also worth highlighting that the external provisioner takes care of cleaning up orphaned volumes:

err = cleanupVolume(ctx, p, delReq, provisionerCredentials)
if err != nil {
	capErr = fmt.Errorf("%v. Cleanup of volume %s failed, volume is orphaned: %v", capErr, pvName, err)
}

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 12, 2024
@AndrewSirenko
Contributor

AndrewSirenko commented Apr 12, 2024

Also worth highlighting that the external provisioner takes care of cleaning up orphaned volumes:

err = cleanupVolume(ctx, p, delReq, provisionerCredentials)
if err != nil {
	capErr = fmt.Errorf("%v. Cleanup of volume %s failed, volume is orphaned: %v", capErr, pvName, err)
}

I'm not sure if this is true for EBS CSI Driver today, due to Kubernetes/External-Provisioner not knowing the volume-id of the orphaned volume (and therefore not knowing what EBS volume to delete).

Some other CSI Drivers store volume-id in PVC, because they can pass in a volume ID to their CreateVolume equivalent in their backend, which lets Kubernetes know the volume-id so K8s can be sure to delete volume if volume was orphaned.

Alas, because Kubernetes and the EBS CSI Driver only know the volume's idempotency token (a hash of the PVC name), and we cannot call EC2 DeleteVolume with just an idempotency token today, we do risk leaking/orphaning a volume (as an edge case).

I can draw out a diagram later.

Either way, I agree with the PR, because the edge case mentioned above is NOT solved by the code @ConnorJC3 is deleting anyway, due to the context typically being cancelled by the time we reach this DeleteVolume call. Furthermore, this code today can lead to stuck pods IF the sidecar timeout is very large, createVolume succeeds, but waitForVolume times out (due to today's 1-minute limit). So I agree it is safer to delete this code today and think of something better.

I do agree with the first half of @torredil's statement, though.

@torredil
Member

@AndrewSirenko Sorry, that last bit was poorly phrased and confusing. "Also worth highlighting that the external provisioner takes care of cleaning up volumes" is a bit clearer.

Alas, because Kubernetes and EBS CSI Driver only know volume's Idempotency token (hash of PVC name), and we cannot call EC2 DeleteVolume with just an idempotency token today, we do risk leaking/orphaning a volume (as an edge case).

Can you clarify this point - what is the specific scenario in which the driver needs to call EC2's DeleteVolume in the CreateVolume path (CSI spec violation), where a volume ID is not available? I understand this to mean the RPC is not idempotent.

The correct behavior for cleaning up volumes is as follows:

  //    The CO is responsible for cleaning up volumes it provisioned
  //    that it no longer needs. If the CO is uncertain whether a volume
  //    was provisioned or not when a `CreateVolume` call fails, the CO
  //    MAY call `CreateVolume` again, with the same name, to ensure the
  //    volume exists and to retrieve the volume's `volume_id` (unless
  //    otherwise prohibited by "CreateVolume Errors").

In practice, this means failing the request and letting the provisioner retry as opposed to attempting to delete the volume, which is what this PR steers us towards - I think we all agree on that point.

Once Kubernetes (the provisioner) has retrieved the volume ID from a CreateVolume call, it can proceed with DeleteVolume where the volume ID is passed in via the DeleteVolumeRequest created by the provisioner.

this edge case mentioned above is NOT solved via the code @ConnorJC3 is deleting anyway

Agree 👍

I would like to learn more about this edge case and look forward to the diagram.

@AndrewSirenko
Contributor

AndrewSirenko commented Apr 14, 2024

Also worth highlighting that the external provisioner takes care of cleaning up volumes is a bit clearer.

Yep, I agree with this phrasing.

I only disagreed with the idea of the external provisioner cleaning up orphaned volumes via the cleanupVolume snippet you linked to (which doesn't do anything in the aws-ebs-csi-driver case, because EBS requires a volume-id to delete today, which the external provisioner won't have access to until a successful retried CreateVolume call).

Can you clarify this point - what is the specific scenario in which the driver needs to call EC2's DeleteVolume in the CreateVolume path (CSI spec violation), where a volume ID is not available?

There is no scenario. We all agree, the driver should not call EC2 DeleteVolume in CreateVolume RPC. My sentence was not arguing against that.

I'm saying that until EBS provides a way to call EC2 DeleteVolume with the idempotency token used for EC2 CreateVolume (instead of the volume-id), there may still be an edge case of volume leaks because the cleanupVolume snippet can't work. This edge case could occur if a cluster operator deletes a PVC object that hadn't yet led to a CreateVolume RPC success (due to a context timeout), but the volume was created in AWS backend.

Sidenote: Looking more closely at cleanupVolume, it only triggers if a volume was created with less than the desired capacity, so I'm not sure it's actually relevant to the discussion even if we had a way of deleting a volume without volume-id.

@AndrewSirenko
Contributor

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndrewSirenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2024
@k8s-ci-robot k8s-ci-robot merged commit 1b242f8 into kubernetes-sigs:master Apr 15, 2024
19 checks passed