The great unmount challenge #1

Open
lundman opened this issue Feb 1, 2021 · 10 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments


lundman commented Feb 1, 2021

Currently, when we try to unmount a dataset (explicitly, or as part of export), we need to ask all open files to flush and close.
However, we must be missing a step, as it turns out to be quite a challenge.

In particular, the culprit is the WPSettings.dat file, which is created very early in the mounting process.

Because we never receive an IRP_MJ_CLOSE for the opened handle, the file still has a section cache associated with it: in vnode_flushcache() its DataSectionObject is still set (non-NULL). We cannot proceed until this is released, or we will BSOD later on.
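For reference, here is a minimal sketch (a hypothetical helper, not the actual vnode_flushcache()) of the teardown that gets stuck: while the handle is never closed, the final attempt to drop the data section keeps failing and DataSectionObject stays non-NULL.

```
// Hypothetical helper (not the actual vnode_flushcache()): flush the cached
// data for a file and then try to tear down its data section.
static BOOLEAN try_release_data_section(PSECTION_OBJECT_POINTERS sop)
{
	IO_STATUS_BLOCK iosb;

	if (sop == NULL || sop->DataSectionObject == NULL)
		return TRUE;				// nothing cached; safe to proceed

	CcFlushCache(sop, NULL, 0, &iosb);		// write back dirty pages
	CcPurgeCacheSection(sop, NULL, 0, FALSE);	// drop clean cached pages

	// Ask Mm to drop the shared cache map and data section. This keeps
	// failing while references (such as the never-closed handle) remain.
	return MmForceSectionClosed(sop, FALSE);
}
```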

The winbtrfs project does call FSRTL_VOLUME_DISMOUNT, which definitely sounds interesting - could this be the missing piece? Or do they only call it for the virtual disk volumes (and not mounted datasets)?
(@maharmstone, if you happen to see this, perhaps you can confirm?)

However, if I just call it, I get an error from code like this:

		// create a stream file object for the volume and notify the FsRtl package
		root_file = IoCreateStreamFileObject(NULL, zmo->deviceObject);
		ntstatus = FsRtlNotifyVolumeEvent(root_file, FSRTL_VOLUME_DISMOUNT);
		ObDereferenceObject(root_file);

The same error also occurs if I call FSRTL_VOLUME_MOUNT at mount time. Then, by luck, I noticed that if I call (at least) these lines (which btrfs also calls):

	status = IoReportDetectedDevice(
		DriverObject,
		InterfaceTypeUndefined,
		0xFFFFFFFF, // 0
		0xFFFFFFFF, // 0
		NULL,
		NULL,
		FALSE,
		&pnpDeviceObject);
	if (IoAttachDeviceToDeviceStack(pnpDeviceObject, DeviceObject) != NULL) {
		// attached into the PnP device stack; continue with device setup
	}

Suddenly, I can send FSRTL_VOLUME_MOUNT without error (and presumably FSRTL_VOLUME_DISMOUNT too), but frustratingly, the dataset mount no longer works at all! Explorer shows it as an empty device, and if you click on it you get "invalid function" (which does not come from my dispatch).

Again, I suspect I am missing something. Sure, it works without IoReportDetectedDevice / IoAttachDeviceToDeviceStack, but I think that is a fluke, and I really should be calling them.

But once I call those functions, I also need to do something else to make Windows consider the volume mounted (and route I/O to my dispatch). Then, perhaps, DISMOUNT will work and WPSettings.dat will be closed.
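For what it is worth, here is a minimal sketch (assumed names and a bare-bones flow, not the actual OpenZFS code) of the step that normally makes Windows consider a volume mounted and route file I/O to the file system's dispatch: handling IRP_MN_MOUNT_VOLUME and pointing the VPB at a newly created volume device object.

```
// Sketch only: assumes the driver was registered with IoRegisterFileSystem()
// and that this runs from the IRP_MJ_FILE_SYSTEM_CONTROL / IRP_MN_MOUNT_VOLUME
// path. Names are placeholders.
NTSTATUS handle_mount_volume(PDRIVER_OBJECT fs_driver, PIRP Irp)
{
	PIO_STACK_LOCATION irpsp = IoGetCurrentIrpStackLocation(Irp);
	PVPB vpb = irpsp->Parameters.MountVolume.Vpb;
	PDEVICE_OBJECT volume_device;
	NTSTATUS status;

	// create the volume device object that will receive file I/O
	status = IoCreateDevice(fs_driver, 0, NULL,
	    FILE_DEVICE_DISK_FILE_SYSTEM, 0, FALSE, &volume_device);
	if (!NT_SUCCESS(status))
		return status;

	// Point the VPB at our volume device. The I/O manager marks the volume
	// mounted when this IRP completes successfully; from then on, file I/O
	// against the volume reaches this driver's dispatch routines.
	vpb->DeviceObject = volume_device;
	volume_device->StackSize = vpb->RealDevice->StackSize + 1;
	volume_device->Flags &= ~DO_DEVICE_INITIALIZING;

	return STATUS_SUCCESS;
}
```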

lundman added the bug and help wanted labels on Feb 1, 2021
@maharmstone

You should only be calling IoReportDetectedDevice once, for your bus device, not for every available volume (which, from a cursory glance, it looks like you're doing). Your bus device then reports its child PDOs through IRP_MN_QUERY_DEVICE_RELATIONS, pnpmgr calls AddDevice to create a device object for each one, mountmgr assigns a drive letter once it realizes it's a volume, and the volume gets mounted the first time someone accesses it.

Basically have a look at https://github.com/maharmstone/btrfs/blob/master/src/pnp.c. I suspect that once you've got PNP sorted out you'll be a lot closer to solving your issue.
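To make that concrete, here is a minimal sketch (assumed names, a single hard-wired child, and not the btrfs or OpenZFS code) of a bus FDO answering IRP_MN_QUERY_DEVICE_RELATIONS / BusRelations so pnpmgr can discover the volume PDOs:

```
// Sketch only: child_volume_pdo stands in for however the driver tracks the
// PDOs it has created for its volumes.
extern PDEVICE_OBJECT child_volume_pdo;

NTSTATUS bus_query_device_relations(PDEVICE_OBJECT bus_fdo, PIRP Irp)
{
	PIO_STACK_LOCATION irpsp = IoGetCurrentIrpStackLocation(Irp);
	PDEVICE_RELATIONS relations;

	if (irpsp->Parameters.QueryDeviceRelations.Type != BusRelations)
		return Irp->IoStatus.Status;	// not a BusRelations query

	// one entry; DEVICE_RELATIONS already has room for a single pointer
	relations = ExAllocatePoolWithTag(PagedPool,
	    sizeof(DEVICE_RELATIONS), 'pnpZ');
	if (relations == NULL)
		return STATUS_INSUFFICIENT_RESOURCES;

	// pnpmgr dereferences every PDO it is handed, so reference it first
	ObReferenceObject(child_volume_pdo);
	relations->Count = 1;
	relations->Objects[0] = child_volume_pdo;

	Irp->IoStatus.Information = (ULONG_PTR)relations;
	return STATUS_SUCCESS;
}
```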


lundman commented Feb 1, 2021

Ah - I thought I could avoid the PnP thing, but perhaps that's why I'm in trouble. Thanks for the reply, I'll give it a go.

@maharmstone

No worries. If it helps, it's not all that useful to talk about unmounting on Windows - you can do it, but the kernel will mount the volume again in short order. If you want a device to go away for whatever reason, you need instead to think about getting the underlying PDO removed.
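In code, "getting the PDO removed" is usually little more than dropping the child from the bus driver's list and asking pnpmgr to re-enumerate; a minimal sketch (assuming bus_pdo is the bus device's own PDO, and not actual project code):

```
// Sketch: after removing a volume from the bus driver's child list, ask
// pnpmgr to re-enumerate. The next BusRelations query no longer reports the
// volume's PDO, so pnpmgr sends it the remove IRPs and it goes away for real.
static void request_child_removal(PDEVICE_OBJECT bus_pdo)
{
	IoInvalidateDeviceRelations(bus_pdo, BusRelations);
}
```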


lundman commented Feb 2, 2021

OK, so I do set and handle PnP IRPs; it is just not in its own file - they are all handled in the big dispatcher. I had skipped the notification callbacks, assuming they just tell me about new devices coming and going, which I don't believe comes into play with the unmounting issue. But I've thrown that code in now, just in case.

The word "unmount" is just familiar terminology for me; what I mean is "whatever is needed to ask Windows to close everything so I can remove the device".

Going over btrfs and my code again, it looks like:

  • I use IOCTL_MOUNTMGR_VOLUME_ARRIVAL_NOTIFICATION, but the documentation seems to indicate that I shouldn't need to if my PnP handling is correct.
  • There is no IOCTL_MOUNTMGR_VOLUME_REMOVAL_NOTIFICATION, so I definitely need something else.
  • I never forward IRPs. In some cases, btrfs calls IoCallDriver() to send the IRP to the lower driver; I haven't, figuring it would just be sending to myself - and hopefully missing this is not involved in unmounting (see the sketch after this list).
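For reference, the usual pass-down pattern looks roughly like this (a generic sketch, assuming attached_lower_device is whatever IoAttachDeviceToDeviceStack returned earlier; not the btrfs or OpenZFS code):

```
// Forward an IRP we do not handle ourselves to the next-lower driver in the
// stack instead of completing it locally.
static NTSTATUS forward_irp_to_lower(PDEVICE_OBJECT attached_lower_device,
    PIRP Irp)
{
	// Reuse our stack location for the lower driver rather than copying
	// it, since we add nothing on the way down.
	IoSkipCurrentIrpStackLocation(Irp);
	return IoCallDriver(attached_lower_device, Irp);
}
```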

> about getting the underlying PDO to be removed.

I should focus on that. btrfs starts with FSRTL_VOLUME_DISMOUNT, which is why I was trying to get that to work. But possibly it suggests my PDO and/or VPD is not correctly configured - I am unfamiliar with what they do. Is there a utility to test/display/verify the device objects so I can confirm I have them correct, or compare them against btrfs?


imtiazdc commented Feb 5, 2021

@lundman Currently, uninstall and reboot don't delete the driver from C:\Windows\system32\drivers. Also, the controller still shows up in Device Manager. Are those issues related to this unmount issue, or should we file a separate bug to track them?


lundman commented Feb 5, 2021

Ah, let's see: if you do not create a pool, then unloading the module "should" work, and if it doesn't, we should fix it.
But if you create/import any pool, then you can't unload - due to the unmount issue.


imtiazdc commented Feb 5, 2021

A clean install, uninstall, and reboot on a physical machine still shows the OpenZFS controller in Device Manager and the driver in C:\Windows\system32\drivers. No zpool was created.


lundman commented Feb 5, 2021

There was a fix from DataCore that removes the symlink - I will double-check that it made it in.

lundman pushed a commit that referenced this issue Feb 16, 2022
`zpool_do_import()` passes `argv[0]`, (optionally) `argv[1]`, and
`pool_specified` to `import_pools()`.  If `pool_specified==FALSE`, the
`argv[]` arguments are not used.  However, these values may be off the
end of the `argv[]` array, so loading them could dereference unmapped
memory.  This error is reported by the asan build:

```
=================================================================
==6003==ERROR: AddressSanitizer: heap-buffer-overflow
READ of size 8 at 0x6030000004a8 thread T0
    #0 0x562a078b50eb in zpool_do_import zpool_main.c:3796
    #1 0x562a078858c5 in main zpool_main.c:10709
    #2 0x7f5115231bf6 in __libc_start_main
    #3 0x562a07885eb9 in _start

0x6030000004a8 is located 0 bytes to the right of 24-byte region
allocated by thread T0 here:
    #0 0x7f5116ac6b40 in __interceptor_malloc
    #1 0x562a07885770 in main zpool_main.c:10699
    #2 0x7f5115231bf6 in __libc_start_main
```

This commit passes NULL for these arguments if they are off the end
of the `argv[]` array.

Reviewed-by: George Wilson <gwilson@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#12339
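(For readers skimming the thread, a hypothetical condensation of the fix with invented names; see the commit itself for the real diff:)

```
#include <stddef.h>

/*
 * Hypothetical helper illustrating the fix described above: only read argv[]
 * slots that actually exist, and otherwise hand the callee NULL, so nothing
 * is loaded from past the end of the array when pool_specified is false.
 */
static void
pick_import_args(int argc, char **argv, int pool_specified,
    char **searchname, char **searchguid)
{
	*searchname = (pool_specified && argc > 0) ? argv[0] : NULL;
	*searchguid = (pool_specified && argc > 1) ? argv[1] : NULL;
}
```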
lundman pushed a commit that referenced this issue Oct 16, 2022
Before this patch, in zfs_domount, if zfs_root or d_make_root fails, we
leave zfsvfs != NULL. This will lead to execution of the error handling
`if` statement at the `out` label, and hence to a call to
dmu_objset_disown and zfsvfs_free.

However, zfs_umount, which we call upon failure of zfs_root and
d_make_root, already does dmu_objset_disown and zfsvfs_free.

I suppose this patch rather adds to the brittleness of this part of the
code base, but I don't want to invest more time in this right now.
To add a regression test, we'd need some kind of fault injection
facility for zfs_root or d_make_root, which doesn't exist right now.
And even then, I think that regression test would be too closely tied
to the implementation.

To repro the double-disown / double-free, do the following:
1. patch zfs_root to always return an error
2. mount a ZFS filesystem

Here's the stack trace you would see then:

  VERIFY3(ds->ds_owner == tag) failed (0000000000000000 == ffff9142361e8000)
  PANIC at dsl_dataset.c:1003:dsl_dataset_disown()
  Showing stack for process 28332
  CPU: 2 PID: 28332 Comm: zpool Tainted: G           O      5.10.103-1.nutanix.el7.x86_64 #1
  Call Trace:
   dump_stack+0x74/0x92
   spl_dumpstack+0x29/0x2b [spl]
   spl_panic+0xd4/0xfc [spl]
   dsl_dataset_disown+0xe9/0x150 [zfs]
   dmu_objset_disown+0xd6/0x150 [zfs]
   zfs_domount+0x17b/0x4b0 [zfs]
   zpl_mount+0x174/0x220 [zfs]
   legacy_get_tree+0x2b/0x50
   vfs_get_tree+0x2a/0xc0
   path_mount+0x2fa/0xa70
   do_mount+0x7c/0xa0
   __x64_sys_mount+0x8b/0xe0
   do_syscall_64+0x38/0x50
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Co-authored-by: Christian Schwarz <christian.schwarz@nutanix.com>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Closes openzfs#14025
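(For readers skimming the thread, a hypothetical condensation of the error path this message describes - assumed shape, not the actual zfs_domount() diff:)

```
/*
 * Sketch of the double-disown described above: once zfs_umount() has run it
 * has already called dmu_objset_disown() and zfsvfs_free(), so zfsvfs must
 * be cleared before reaching the shared error path.
 */
static int
zfs_domount_sketch(struct super_block *sb, zfsvfs_t *zfsvfs)
{
	struct inode *root_inode;
	int error;

	error = zfs_root(zfsvfs, &root_inode);
	if (error != 0) {
		(void) zfs_umount(sb);	/* already disowns objset, frees zfsvfs */
		zfsvfs = NULL;		/* so the error path below must not repeat it */
		goto out;
	}

	sb->s_root = d_make_root(root_inode);
	if (sb->s_root == NULL) {
		(void) zfs_umount(sb);
		zfsvfs = NULL;
		error = ENOMEM;
		goto out;
	}
	return (0);
out:
	if (zfsvfs != NULL) {
		dmu_objset_disown(zfsvfs->z_os, B_TRUE, zfsvfs);
		zfsvfs_free(zfsvfs);
	}
	return (error);
}
```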
lundman pushed a commit that referenced this issue Mar 3, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes openzfs#14501
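(As an aside, a hypothetical FreeBSD-style fragment - handler names are assumptions, not the actual zfs_ctldir.c change - showing the shape of the first fix described above: giving zfsctl_ops_snapshot open/close entries so the VOP_CLOSE() issued by vgonel() on an active snapshot vnode resolves to a real handler instead of faulting:)

```
#include <sys/param.h>
#include <sys/vnode.h>

/* Assumed trivial handlers; the real change wires up existing ctldir ones. */
static int zfsctl_sketch_open(struct vop_open_args *ap)   { return (0); }
static int zfsctl_sketch_close(struct vop_close_args *ap) { return (0); }

static struct vop_vector zfsctl_ops_snapshot_sketch = {
	.vop_default =	&default_vnodeops,
	.vop_open =	zfsctl_sketch_open,
	.vop_close =	zfsctl_sketch_close,
};
```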
@nils-trustnode

@lundman Is there any progress after all these years? At the moment, a reboot is required because of the suspended state after exporting a pool. This is not suitable for daily use or for server setups.


lundman commented Jul 1, 2024

Yes and no - a lot of work has gone into unmount, but it is still not perfect. I haven't hit the suspended state from unmounts in a while.
