
New OpenZFS port vcruntime and pool create #7

Open

dima333750 opened this issue Feb 14, 2021 · 12 comments

@dima333750

Hi,
I tested https://www.lundman.net/OpenZFSOnWindows-debug-2.0.0-15-ga96cc4a70-dirty.exe
and found a couple of problems:

  1. zpool complains at startup about the missing DLLs
    vcruntime140d.dll and ucrtbased.dll.
    I managed to work around this by taking copies of my vcruntime140.dll and ucrtbase.dll and renaming them to vcruntime140d.dll and ucrtbased.dll (see the sketch at the end of this comment).
  2. When I try to create a pool with zpool.exe create tank PHYSICALDRIVE4
    I get:

Expanded path to '\\?\PHYSICALDRIVE4'
working on dev '\\?\PHYSICALDRIVE4'
this code assume FS is on partition 1 - tell lundman
cannot create 'tank': invalid argument for this pool operation

The disk definitely exists, but zpool doesn't see it.
I also tried:

fsutil file createnew G:\poolfile.bin 200000000
zpool create daddy \\?\G:\poolfile.bin

and I get:

cannot open '\\?\G:\poolfile.bin': no such device in \\?\
must be a full path or shorthand device name
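
For reference, a minimal sketch of the DLL workaround from item 1 above. The System32 location is an assumption; adjust to wherever zpool.exe resolves its DLLs from:

REM Hypothetical workaround: expose the release CRT DLLs under the debug names
copy C:\Windows\System32\vcruntime140.dll C:\Windows\System32\vcruntime140d.dll
copy C:\Windows\System32\ucrtbase.dll C:\Windows\System32\ucrtbased.dll
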
@lundman

lundman commented Feb 14, 2021

I have already fixed vcruntime, and I'm working on the pool create fix - it broke when I corrected import. I should have it working better tomorrow, I hope.

@dima333750

Okay, I'll look forward to it.

@dima333750

dima333750 commented Feb 16, 2021

Hi,
I tested
https://www.lundman.net/OpenZFSOnWindows-debug-2.0.0-24-gdcdf6292f.exe

  1. The vcruntime dependency remains; it still requires
    vcruntime140d.dll and ucrtbased.dll.

  2. I was able to create a pool:
    zpool create -O compress=lz4 -O dedup=on -O casesensitivity=insensitive -O atime=off -o ashift=12 dedu PHYSICALDRIVE4

But when I started writing files I got a BSOD: SYSTEM_SERVICE_EXCEPTION (3b).
Interestingly, I repeated this 2-3 times, and each time the BSOD occurred after writing 400-500 MB.
I attach the report:
zfs_v2.txt

@lundman

lundman commented Feb 16, 2021

OK let's see:

 OpenZFS!zfs_vnop_lookup_impl+0x14bc [C:\src\openzfs\module\os\windows\zfs\zfs_vnops_windows.c @ 827] 

https://github.com/openzfsonwindows/openzfs/blob/windows/module/os/windows/zfs/zfs_vnops_windows.c#L827

	if (stream_name != NULL && vp != NULL) {
		// Here, we will release dvp, and attempt to open the xattr dir.
		// xattr dir will be the new dvp. Then we will look for streamname
		// in xattrdir, and assign vp.
		if (dvp_no_rele)
			VN_RELE(dvp);
		// Create the xattrdir only if we are to create a new entry
		if (error = zfs_get_xattrdir(VTOZ(vp), &dzp, cr, CreateFile ? CREATE_XATTR_DIR : 0)) {
==>			VN_RELE(vp);

Usually, "==>" points to what would have run next, so we die on the prior line. Since we don't make it into zfs_get_xattrdir(), that leaves VTOZ(vp). But the if above confirms vp is not NULL. Odd.
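
For context, VTOZ() in the OpenZFS headers is essentially a one-line pointer chase (a sketch, not the verbatim definition in zfs_znode.h):

	/* Sketch: VTOZ() just dereferences the vnode's private data, so a
	 * non-NULL vp whose v_data has already been torn down still
	 * faults here, before the call into zfs_get_xattrdir(). */
	#define VTOZ(vp)	((znode_t *)(vp)->v_data)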

vcruntime: Guess I need to find a dependency walker and see why it gets pulled in.

@lundman lundman transferred this issue from openzfsonwindows/ZFSin Feb 17, 2021
@lundman

lundman commented Feb 17, 2021

Hmm yes, it's pulling it in:

[Screenshot 2021-02-17 101807: dependency walker output showing the debug runtime DLLs being pulled in]

@lundman

lundman commented Feb 17, 2021

Heh, looks like I turned vcruntime off twice for C++ though, so, yep: 0bfe2d9
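
For context, the debug DLLs in question are exactly what MSVC links when a translation unit is built against the debug CRT; illustrative cl invocations, not the project's actual build lines:

cl /MDd foo.c   (debug CRT: binary imports vcruntime140d.dll and ucrtbased.dll)
cl /MD foo.c    (release CRT: binary imports vcruntime140.dll and ucrtbase.dll)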

@lundman

lundman commented Feb 18, 2021

If you have time, can you check the stream panic with https://www.lundman.net/OpenZFSOnWindows-debug-2.0.0-26-g2f7bbd39c.exe? It includes an attempt to fix the issue.

@dima333750

Hello,

  1. The vcruntime dependency is defeated.
  2. The BSOD remains; I attach a report, but the problem seems unchanged:
    zfs_v2_2.txt

@dima333750

Hello,
I downloaded
https://www.lundman.net/OpenZFSOnWindows-debug-2.0.0-26-g2f7bbd39c-dirty.exe
The problem remains, but there seems to be some change:
zfs_v2_3.txt

@lundman

lundman commented Feb 22, 2021

What about today's version?

@dima333750

dima333750 commented Mar 11, 2021

Hi. My computer broke down and I could not participate in testing, but I am back in service.
I downloaded https://www.lundman.net/OpenZFSOnWindows-debug-2.0.0-29-g35a90247c.exe
zfs_v2_4.txt

I got the error again, but I think I now understand how to reproduce the problem.
You need to create a pool:

zpool create -O compress=lz4 -O dedup=on -O casesensitivity=insensitive -O atime=off -o ashift=12 dedu PHYSICALDRIVE1
or
zpool create -O dedup=on -O casesensitivity=insensitive -O atime=off -o ashift=12 dedu PHYSICALDRIVE1

Then copy zip, 7z, or other compressed data to the disk (a variety of files, 100 MB+) and the problem manifests immediately (see the repro sketch after this comment).

I also noticed that when a pool is created, a folder with its name appears on the C: drive, and it is impossible to create a pool with the same name again without deleting this folder. I don't know if that's a bug or not ;)
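
A consolidated repro sketch based on the above, assuming the leftover folder is the pool's mountpoint on C:\ and that the test data sits at a hypothetical D:\testdata:

REM Remove the leftover mountpoint folder from a previous attempt, if any
rmdir /s /q C:\dedu
REM Create a dedup pool on the spare physical disk
zpool create -O dedup=on -O casesensitivity=insensitive -O atime=off -o ashift=12 dedu PHYSICALDRIVE1
REM Copy 100 MB+ of compressed files onto the pool; the BSOD follows quickly
copy D:\testdata\*.7z C:\dedu\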

lundman pushed a commit that referenced this issue Mar 3, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes openzfs#14501
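
For context, a minimal sketch of the fix the commit message describes, dropping the assertion in zfsctl_snapshot_inactive() (names follow the FreeBSD zfsctl code; this is illustrative, not the verbatim diff):

    static int
    zfsctl_snapshot_inactive(struct vop_inactive_args *ap)
    {
            vnode_t *vp = ap->a_vp;

            /* Was: VERIFY3S(vrecycle(vp), ==, 1);
             * A freshly allocated snapshot vnode that loses the
             * vfs_hash_insert() race still has usecount > 0, so
             * vrecycle() returns 0 and the assertion fired.
             * Attempt the recycle and tolerate failure instead. */
            (void) vrecycle(vp);
            return (0);
    }
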
@sskras

sskras commented Oct 31, 2023

@dima333750, are you still at it?

Maybe the BSOD got fixed in the meantime (and the issue could be closed).
