-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup snapshots before volumes #289
Cleanup snapshots before volumes #289
Conversation
[1] accidentally swapped the cleanup order which represents a deviation to the previous behavior that not all CSI drivers may be able to handle. This change restores the original order. [1]: kubernetes-csi#261
Hi @timoreimann. Thanks for your PR. I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
Should they be able to handle this? @xing-yang ? If yes, then some explicit testing of both sequences (delete snapshot first vs. delete volume first) would be useful. Having said that, the PR itself makes sense. |
I asked myself the same question. Happy to help working on a dedicated test, though I also think that might be best done as a separate, follow-up PR. |
Actually most CSI drivers won't be able to handle this unless if they add some logic in their drivers to do soft delete. This is because for most storage systems, a snapshot is local and it depends on the volume. So I don't think we should add tests in csi-sanity that may fail most CSI drivers. |
And does the CSI standard and Kubernetes take that into account, i.e. is a CSI driver that expects a certain order compliant with the spec? What is the user experience in Kubernetes around this? |
Yes, CSI standard says that the volume should not be deleted if there is a dependent snapshot. So, order of deletion should be snapshot first and then the source volume. I am referring the spec at https://github.com/container-storage-interface/spec/blob/master/spec.md CSI plugins SHOULD treat volumes independent from their snapshots. If the Controller Plugin supports deleting a volume without affecting its existing snapshots, then these snapshots MUST still be fully operational and acceptable as sources for new volumes as well as appear on ListSnapshot calls once the volume has been deleted. When a Controller Plugin does not support deleting a volume without affecting its existing snapshots, then the volume MUST NOT be altered in any way by the request and the operation must return the FAILED_PRECONDITION error code and MAY include meaningful human-readable information in the status.message field. |
So the CSI spec defines the expected behavior. For csi-sanity that means that we need to have two variants of a "delete volume before snapshot" test: if the driver supports that, the expected outcome is "success". If not, the outcome is "FAILED_PRECONDITION". But as there is no capability that describes this (right?), we have to make this a tri-state config option: driver supports it/doesn't support it/unknown (= skip test). But as mentioned before, that's just an idea for a future test. The PR itself is fine, so: |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pohly, timoreimann The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There are couple of Tests failing even after this change. Tests which consume snapshot to create a new PVC or clone are failing because snapshot deletion is attempted before cleanup of new PVC or clone using the snapshot as source. Test 1 failure signature: Test 2 failure signature: CSI spec says following in case a snapshot is used as source for creating a new PVC and the snapshot delete is attempted:
|
@apurv15 looking at the diff from my PR #261, I do notice that the previous implementation of the first failed test you mentioned deleted the volume created from the snapshot before cleaning up the snapshot, and likewise on the second test deleted the volume created from the source volume. (You might need to expand the Do you think that might be reason why things don't work for you? |
@timoreimann
Your second check-in fixed the first issue by correcting the order of deletion of volume and snapshot but still the test which creates volume using the snapshot fails as snapshot is being deleted first before the volume created using the snapshot. Generic cleanup function might not be useful for all tests as there is subtle difference in the order of deletion of volumes based on tests. You could allow passing some hints to cleanup function to delete volumes in certain order for different tests. |
The implementation still allows to delete resources explicitly, which some tests already do. It was an oversight that the tests in question regressed, which is why I wanted to confirm with you that the current (deficient) test implementation I pointed at is, in fact, the culprit here. I'll submit a PR to make the necessary changes and ping you again. I'll run it against our driver for sure, it'd be great if you could do the same for further validation once the fix proposal is out. |
@timoreimann : Yes, we will test with the diff that you provide. Thanks. |
@timoreimann : I have not tracked new changes in csi-test. Did you submit new PR? |
@apurv15 sorry, I haven't gotten to it. I'll make sure to submit something tomorrow. |
What type of PR is this?
What this PR does / why we need it:
#261 accidentally swapped the cleanup order to deleting volumes before snapshots. This represents a deviation to the previous behavior that not all CSI drivers may be able to handle. This change restores the original order.
Does this PR introduce a user-facing change?:
/assign @pohly
@apurv15 thanks for reporting