feat(zfspv): handling unmounted volume #78
Merged
Cherry-pick to v0.6.x : #76
There can be cases where the openebs namespace has been accidentally deleted. If that happens, deletion of the volume CRs will be triggered, which in turn triggers the dataset deletion process. Prior to the actual deletion of the data, the driver will attempt the following:
1. Unmount the dataset (this does not apply to a zvol, which is unmounted via NodeUnpublish). Unmounting the dataset leaves its mounted property set to "no".
2. Delete the zvol or dataset.
But since the volume is still actively consumed by a pod, the destroy fails because the volume is busy. Zvols simply continue to operate. Datasets, however, have already been unmounted, so they end up with mounted set to "no" even though the subsequent delete fails.
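For illustration, the sequence above looks roughly like this from the ZFS side (the pool and dataset names are made up and the exact error text may vary; this is a sketch of the observed behaviour, not the driver's actual code path):

```sh
# The dataset backing the PV is mounted and actively used by a pod.
$ zfs get -H -o value mounted zfspv-pool/pvc-example
yes

# The clean-up unmounts the dataset, so its "mounted" property flips to "no" ...
$ zfs umount zfspv-pool/pvc-example
$ zfs get -H -o value mounted zfspv-pool/pvc-example
no

# ... but destroying it fails because the volume is still busy,
# leaving the dataset intact but unmounted.
$ zfs destroy zfspv-pool/pvc-example
cannot destroy 'zfspv-pool/pvc-example': dataset is busy
```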
Now, there are two actions that the user can take:
(a) Continue the clean-up, so that the setup can be re-created.
(b) Reinstall openebs and re-create the volumes, pointing them at the existing underlying datasets.
In case (a), the user will have to delete the application pods and then start deleting the PVCs. However, the PV deletion will fail, because the clean-up steps find that the volume is not mounted and abort.
Now assume case (b): prior to actually reinstalling, since the volumes are still intact, applications are expected to continue accessing the data.
However, if a node restart occurs, the dataset will not be remounted because of the mount setting, so the pods will not be able to access the volume. The data written by the pods will then not persist, as it is no longer backed by persistent storage.
To recover from the partial clean-up, the following needs to be done on each of the nodes:
1. `zfs get mounted`: check whether any dataset has this property set to "no".
2. For every dataset that shows mounted as "no", run `zfs mount <dataset>`.

The above commands will remount the affected datasets.
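A minimal sketch of that recovery, assuming the PV datasets are regular ZFS filesystems (the property names are standard ZFS ones; everything else is illustrative):

```sh
# Show the "mounted" property of every filesystem dataset on the node.
zfs get -H -o name,value -t filesystem mounted

# Remount every dataset currently reported as unmounted.
# (Datasets that are intentionally unmountable, e.g. canmount=off or a
# legacy/none mountpoint, also show "no"; mount errors on those are ignored.)
zfs get -H -o name,value -t filesystem mounted |
  awk '$2 == "no" {print $1}' |
  while read -r ds; do
    zfs mount "$ds" || true
  done
```

Depending on the setup, a plain `zfs mount -a` may achieve the same result in one step.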
Here in this PR:
- automating the manual steps performed above, so that the volume can be mounted even if a manual operation left it in a non-mountable state.
- helping with case (a) via the NodeUnpublish operation, which continues the destroy in whatever state the volume may be in.
- adding helpful debug messages.
Signed-off-by: Pawan pawan@mayadata.io