zpool import not working. Hence unable to make zdb work. #14

Closed
datacore-skumar opened this issue Apr 1, 2021 · 19 comments

@datacore-skumar

I am trying to make zdb work by porting changes from ZFSin.
It looks like zpool import is broken on Windows.
@lundman Is this a known issue, and are you working on it?

Z:\openzfs\out\install\x64-Debug-2\bin>zpool export tank1
zunmount(tank1,E:\ ) running
zunmount(tank1,E:\ ) returns 0

Z:\openzfs\out\install\x64-Debug-2\bin>zpool status
no pools available

Z:\openzfs\out\install\x64-Debug-2\bin>zpool import tank1
path '\\?\scsi#disk&ven_vbox&prod_harddisk#4&2617aeae&0&020000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 0
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 63fb000:    tag: 4    name: 'zfs-0000506a00004abb'
    part 8:  offset 63fb800:    len 4000:    tag: b    name: ''
path '\\?\scsi#disk&ven_vbox&prod_harddisk#4&2617aeae&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 4
    mbr 0: type 7 off 0x100000 len 0x22500000
    mbr 1: type 7 off 0x22600000 len 0xc5d900000
    mbr 2: type 0 off 0x0 len 0x0
    mbr 3: type 0 off 0x0 len 0x0
asking libefi to read label
path '\\?\scsi#disk&ven_vbox&prod_harddisk#4&2617aeae&0&030000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive2'
read partitions ok 2
    gpt 0: type 5872d3c0 off 0x100000 len 0xc7f600000
    gpt 1: type 5872d3c0 off 0xc7f700000 len 0x800000
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 63fb000:    tag: 4    name: 'zfs-00000f17000073da'
    part 8:  offset 63fb800:    len 4000:    tag: b    name: ''
Processing volume '\\?\Volume{9bb5203f-0000-0000-0000-100000000000}'
Processing volume '\\?\Volume{9bb5203f-0000-0000-0000-602200000000}'
Processing volume '\\?\Volume{be19aaed-3fd1-11eb-9067-806e6f6e6963}'
working on dev '#1048576#53676605440#\\?\scsi#disk&ven_vbox&prod_harddisk#4&2617aeae&0&020000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive1'
setting physpath here '#1048576#53676605440#\\?\scsi#disk&ven_vbox&prod_harddisk#4&2617aeae&0&020000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
cannot import 'tank1': one or more devices is currently unavailable

Z:\openzfs\out\install\x64-Debug-2\bin>zpool status
no pools available

@lundman

lundman commented Apr 1, 2021

I actually thought it was working, unless you have found a corner case. One thing worth checking is whether the disks are "offline": there was an issue where we left disks offline after export, and flipping them Online in Disk Management would fix it.
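
For reference, the offline flag can also be checked from code rather than Disk Management. Below is a minimal standalone sketch (not part of the OpenZFS tree; the drive number is just an example) that queries a disk's attributes with the stock Win32 IOCTL_DISK_GET_DISK_ATTRIBUTES ioctl:

#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

/* Query whether \\.\PhysicalDrive1 is marked offline or read-only. */
int
main(void)
{
    GET_DISK_ATTRIBUTES attr = { 0 };
    DWORD bytes = 0;
    HANDLE h;

    h = CreateFileA("\\\\.\\PhysicalDrive1", 0,
        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
        OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "open failed: %lu\n", GetLastError());
        return (1);
    }
    if (!DeviceIoControl(h, IOCTL_DISK_GET_DISK_ATTRIBUTES,
        NULL, 0, &attr, sizeof (attr), &bytes, NULL)) {
        fprintf(stderr, "ioctl failed: %lu\n", GetLastError());
        CloseHandle(h);
        return (1);
    }
    printf("offline: %s  read-only: %s\n",
        (attr.Attributes & DISK_ATTRIBUTE_OFFLINE) ? "yes" : "no",
        (attr.Attributes & DISK_ATTRIBUTE_READ_ONLY) ? "yes" : "no");
    CloseHandle(h);
    return (0);
}

Compile with a plain MSVC or MinGW toolchain and run it elevated if the open fails with access denied; a disk left offline after export would show "offline: yes" here.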

@datacore-skumar

Thanks for the prompt response!
We are not seeing any disk go offline after creating the zpool, since we are not creating any zvol.
A simple zpool export followed by zpool import is failing, so there is no zvol and no I/O involved.

FYI
For a zpool named tank2

Z:\openzfs\out\install\x64-Debug-2\bin>zdb

Output:

tank2:
    version: 5000
    name: 'tank2'
    state: 0
    txg: 4
    pool_guid: 70254533809252557
    errata: 0
    hostid: 2013764521
    hostname: 'Windows'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 70254533809252557
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 2443401389210331043
            path: '/dev/physicaldrive1'
            phys_path: '#1048576#53676605440#\\?\PHYSICALDRIVE1'
            whole_disk: 1
            metaslab_array: 259
            metaslab_shift: 29
            ashift: 12
            asize: 53671886848
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 257
            com.delphix:vdev_zap_top: 258
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

For zdb options like:

zdb -D tank2
zdb -h tank2
zdb -c tank2 

each one gives the same output:

zdb: can't open 'tank2': No such device or address

@lundman

lundman commented Apr 6, 2021

Sorry about the delay, the cstyle commit was a lot of work.

I don't have any issues with export and import themselves:

$ ./zpool.exe create BOOM PHYSICALDRIVE0
working on dev '#1048576#85888860160#\\?\PHYSICALDRIVE0'
setting path here '/dev/physicaldrive0'
setting physpath here '#1048576#85888860160#\\?\PHYSICALDRIVE0'
Expanded path to '\\?\PHYSICALDRIVE0'

$ ./zpool.exe export -a
zunmount(BOOM,E:\ ) running
zunmount(BOOM,E:\ ) returns 0

$ ./zpool.exe import
path '\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000100#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 3
    gpt 0: type ce17d6f0 off 0x4400 len 0xffbc00
    gpt 1: type ce17d6f0 off 0x1000000 len 0x138800000
    gpt 2: type ce17d6f0 off 0x139800000 len 0x3c6600000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
    part 1:  offset 8000:    len 9c4000:    tag: 11    name: 'Basic data partition'
    part 2:  offset 9cc000:    len 1e33000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 4
    gpt 0: type ce17d6f0 off 0x100000 len 0x21100000
    gpt 1: type ce17d6f0 off 0x21200000 len 0x6300000
    gpt 2: type ce17d6f0 off 0x27500000 len 0x1000000
    gpt 3: type ce17d6f0 off 0x28500000 len 0x13d7900000
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 9ffb000:    tag: 4    name: 'zfs-000074c200007eb3'
    part 8:  offset 9ffb800:    len 4000:    tag: b    name: ''
Processing volume '\\?\Volume{61b0b727-cc83-4c7d-8b10-7d476a617ca2}'
Processing volume '\\?\Volume{e97e5424-9cd1-43ab-8865-3adafc3951fe}'
Processing volume '\\?\Volume{f60bfa1f-6510-4b5b-8161-7d7aeb4aae4a}'
Processing volume '\\?\Volume{bb279b68-24b5-4db7-b137-cea80e83739c}'
Processing volume '\\?\Volume{5b884031-4fa1-46a6-92cf-994322d5ca7d}'
Processing volume '\\?\Volume{69ffd2f2-c511-11e9-bde6-806e6f6e6963}'
working on dev '#1048576#85888860160#\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive0'
setting physpath here '#1048576#85888860160#\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
   pool: BOOM
     id: 4770041030314952367
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        BOOM              ONLINE
          physicaldrive0  ONLINE



$ ./zdb.exe -e BOOM
(random_fd = open(random_path, O_RDONLY)) != -1
ASSERT at ..\..\..\lib\libzpool\kernel.c:737:random_init()

But we are tripping over a silly assert for random there at the end.
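
The assert shows the userland random_init() insisting that opening its random device path succeeds, which a stock Windows environment cannot satisfy. As an illustration only - this is not the code in lib/libzpool/kernel.c, and the helper name is invented - the same bytes could be sourced from the CNG system RNG:

#include <windows.h>
#include <bcrypt.h>    /* link with bcrypt.lib */
#include <stdint.h>

/* Illustrative replacement for a /dev/random-style read in userland:
 * ask the system-preferred RNG for len bytes. Returns 0 on success. */
int
random_get_bytes_win(uint8_t *buf, size_t len)
{
    if (BCryptGenRandom(NULL, buf, (ULONG)len,
        BCRYPT_USE_SYSTEM_PREFERRED_RNG) != 0)
        return (-1);
    return (0);
}

With BCRYPT_USE_SYSTEM_PREFERRED_RNG there is no provider handle to open or close, so an init/fini pair around it can stay almost empty.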

@lundman

lundman commented Apr 6, 2021

Fixed a couple of the asserts. It fails to find the vdev in userland mode; that probably needs a bit of patching.

$ ./zdb.exe -e BOOM
path '\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000100#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 3
    gpt 0: type 4dd9e470 off 0x4400 len 0xffbc00
    gpt 1: type 4dd9e470 off 0x1000000 len 0x138800000
    gpt 2: type 4dd9e470 off 0x139800000 len 0x3c6600000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
    part 1:  offset 8000:    len 9c4000:    tag: 11    name: 'Basic data partition'
    part 2:  offset 9cc000:    len 1e33000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 4
    gpt 0: type 4dd9e470 off 0x100000 len 0x21100000
    gpt 1: type 4dd9e470 off 0x21200000 len 0x6300000
    gpt 2: type 4dd9e470 off 0x27500000 len 0x1000000
    gpt 3: type 4dd9e470 off 0x28500000 len 0x13d7900000
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 9ffb000:    tag: 4    name: 'zfs-000074c200007eb3'
    part 8:  offset 9ffb800:    len 4000:    tag: b    name: ''
Processing volume '\\?\Volume{61b0b727-cc83-4c7d-8b10-7d476a617ca2}'
Processing volume '\\?\Volume{e97e5424-9cd1-43ab-8865-3adafc3951fe}'
Processing volume '\\?\Volume{f60bfa1f-6510-4b5b-8161-7d7aeb4aae4a}'
Processing volume '\\?\Volume{bb279b68-24b5-4db7-b137-cea80e83739c}'
Processing volume '\\?\Volume{5b884031-4fa1-46a6-92cf-994322d5ca7d}'
Processing volume '\\?\Volume{69ffd2f2-c511-11e9-bde6-806e6f6e6963}'
working on dev '#1048576#85888860160#\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive0'
setting physpath here '#1048576#85888860160#\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
zdb: can't open 'BOOM': value too large

Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 4770041030314952367
        name: 'BOOM'
        state: 1
        hostid: 1486273368
        hostname: 'Windows'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 4770041030314952367
            children[0]:
                type: 'disk'
                id: 0
                guid: 7707493098211889732
                whole_disk: 1
                metaslab_array: 67
                metaslab_shift: 29
                ashift: 9
                asize: 85884141568
                is_log: 0
                create_txg: 4
                phys_path: '#1048576#85888860160#\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
                path: '/dev/physicaldrive0'
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 2

ZFS_DBGMSG(zdb):
..\..\..\module\zfs\spa.c:5999:spa_import(): spa_import: importing BOOM
..\..\..\module\zfs\spa_misc.c:411:spa_load_note(): spa_load(BOOM, config trusted): LOADING
..\..\..\module\zfs\spa_misc.c:411:spa_load_note(): spa_load(BOOM, config untrusted): vdev tree has 1 missing top-level vdevs.
..\..\..\module\zfs\spa_misc.c:411:spa_load_note(): spa_load(BOOM, config untrusted): current settings allow for maximum 0 missing top-level vdevs at this stage.
..\..\..\module\zfs\spa_misc.c:396:spa_load_failed(): spa_load(BOOM, config untrusted): FAILED: unable to open vdev tree [error=132]
..\..\..\module\zfs\vdev.c:183:vdev_dbgmsg_print_tree():   vdev 0: root, guid: 4770041030314952367, path: N/A, can't open
..\..\..\module\zfs\vdev.c:183:vdev_dbgmsg_print_tree():     vdev 0: disk, guid: 7707493098211889732, path: /dev/physicaldrive0, can't open
..\..\..\module\zfs\spa_misc.c:411:spa_load_note(): spa_load(BOOM, config untrusted): UNLOADING

@datacore-skumar

It seems you are using WSL2. I tried zpool export and then import on both a physical and a virtual Windows machine, and it never worked for me. Eventually I tried the same on a WSL2 machine, and there it worked with the same binaries.
If you look at the status section below, a hostid mismatch is being reported; could that be why it fails on Windows but somehow works fine under WSL2?

raj@DESKTOP-H5NDHD2:/mnt/c/bin$ ./zpool.exe status
  pool: tank
 state: ONLINE
status: Mismatch between pool hostid and system hostid on imported pool.
        This pool was previously imported into a system with a different hostid,
        and then was verbatim imported into this system.
action: Export this pool on all systems on which it is imported.
        Then import it to correct the mismatch.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config: 

        NAME              STATE     READ WRITE CKSUM
        tank              ONLINE       0     0     0
          physicaldrive1  ONLINE       0     0     0

errors: No known data errors 

If you are using WSL2, you may want to try the same commands outside of WSL2.

@lundman

lundman commented Apr 8, 2021

I have WSL2 installed but I'm not using it. I suppose that could be changing things. I am also in "git for windows bash" - which is built on MinGW - and that could also be changing things. I could try from a vanilla CMD shell.

@imtiazdc

imtiazdc commented Apr 9, 2021

@datacore-skumar Could you attach the diff of zdb changes? It may help @lundman use/approve/review those changes and help us root cause the issues faster.

@datacore-skumar

The attached file is the diff of the zdb changes.
OpenZFS2_git_diff_zdb.txt

@imtiazdc

Hi @lundman,

Were you able to repro the issue that @datacore-skumar is running into?

@lundman

lundman commented Apr 12, 2021

I can export the pool just fine in CMD as well - unfortunately. I do not use the pool much, just create and export it, to have something to test zdb against.

I noticed the work I did for zdb here is part of OpenZFS2_git_diff_zdb.txt - and there are a few more things in there too - so really it would be better if OpenZFS2_git_diff_zdb.txt were made into a PR against this repo, so we can discuss it.

For example, all the changes to lib/libzpool/kernel.c. It would be interesting to see if just including wosix.h would get us most of the way; failing that, we might need to split the file I/O out into a separate file, so we can have a _windows.c version of it. I can bring that up with the others if that is the way to go.

But otherwise, all the other changes are right on the money, exactly how I would/did change them.
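
To illustrate the "split out the file IO" idea above - a rough sketch under assumptions, not the actual wosix.h interface - most of what the userland code needs is POSIX-style positional reads mapped onto Win32 handles, along these lines:

#include <windows.h>
#include <stdint.h>
#include <string.h>

/* pread()-style read at an explicit offset on a Win32 HANDLE.
 * ReadFile honours the OVERLAPPED offset even on handles opened
 * for synchronous I/O. Returns bytes read, or -1 on failure. */
int64_t
win_pread(HANDLE h, void *buf, size_t nbyte, uint64_t offset)
{
    OVERLAPPED ov;
    DWORD done = 0;

    memset(&ov, 0, sizeof (ov));
    ov.Offset = (DWORD)(offset & 0xffffffffULL);
    ov.OffsetHigh = (DWORD)(offset >> 32);
    if (!ReadFile(h, buf, (DWORD)nbyte, &done, &ov))
        return (-1);
    return ((int64_t)done);
}

/* Open a raw disk by the kind of name the import path prints,
 * e.g. \\?\PHYSICALDRIVE1 (an example, not a fixed API). */
HANDLE
win_open_disk(const char *path)
{
    return (CreateFileA(path, GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
        OPEN_EXISTING, 0, NULL));
}

One reason a dedicated _windows.c (or a wosix-style header) tends to be cleaner than ad-hoc #ifdefs is that raw-disk handles on Windows generally require sector-aligned offsets and lengths, which the POSIX-looking call sites should not have to know about.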

@arun-kv

arun-kv commented Apr 19, 2021

Hi @lundman, I'm also not able to import a pool using the 'zpool import' command. I'm using the latest source code.
The screenshots and messages are attached.

[screenshots omitted]

Z:\>zpool.exe create pool-1 PHYSICALDRIVE2 PHYSICALDRIVE1
Expanded path to '\\?\PHYSICALDRIVE2'
Expanded path to '\\?\PHYSICALDRIVE1'
working on dev '#1048576#21464350720#\\?\PHYSICALDRIVE2'
setting path here '/dev/physicaldrive2'
setting physpath here '#1048576#21464350720#\\?\PHYSICALDRIVE2'
working on dev '#1048576#21464350720#\\?\PHYSICALDRIVE1'
setting path here '/dev/physicaldrive1'
setting physpath here '#1048576#21464350720#\\?\PHYSICALDRIVE1'

Z:\>zpool.exe status
  pool: pool-1
 state: ONLINE
config:

        NAME              STATE     READ WRITE CKSUM
        pool-1            ONLINE       0     0     0
          physicaldrive2  ONLINE       0     0     0
          physicaldrive1  ONLINE       0     0     0

errors: No known data errors

Z:\>zpool.exe export pool-1
zunmount(pool-1,D:\ ) running
zunmount(pool-1,D:\ ) returns 0

Z:\>zpool.exe status
no pools available

Z:\>zpool.exe import
path '\\?\ide#diskvbox_harddisk___________________________1.0_____#5&394c0ad3&0&0.0.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 4
    mbr 0: type 7 off 0x100000 len 0x22500000
    mbr 1: type 7 off 0x22600000 len 0xa09400000
    mbr 2: type 0 off 0x0 len 0x0
    mbr 3: type 0 off 0x0 len 0x0
asking libefi to read label
path '\\?\ide#diskvbox_harddisk___________________________1.0_____#5&106af171&0&1.1.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive3'
read partitions ok 0
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 27fb000:    tag: 4    name: 'zfs-000061b00000234f'
    part 8:  offset 27fb800:    len 4000:    tag: b    name: ''
path '\\?\ide#diskvbox_harddisk___________________________1.0_____#5&394c0ad3&0&0.1.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 0
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 27fb000:    tag: 4    name: 'zfs-00005c6c000033f1'
    part 8:  offset 27fb800:    len 4000:    tag: b    name: ''
path '\\?\ide#diskvbox_harddisk___________________________1.0_____#5&106af171&0&1.0.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive2'
read partitions ok 0
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 27fb000:    tag: 4    name: 'zfs-000027c5000004c5'
    part 8:  offset 27fb800:    len 4000:    tag: b    name: ''
Processing volume '\\?\Volume{145b01a0-0000-0000-0000-100000000000}'
Processing volume '\\?\Volume{145b01a0-0000-0000-0000-602200000000}'
working on dev '#1048576#21464350720#\\?\ide#diskvbox_harddisk___________________________1.0_____#5&106af171&0&1.1.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive3'
setting physpath here '#1048576#21464350720#\\?\ide#diskvbox_harddisk___________________________1.0_____#5&106af171&0&1.1.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
working on dev '#1048576#21464350720#\\?\ide#diskvbox_harddisk___________________________1.0_____#5&106af171&0&1.0.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive2'
setting physpath here '#1048576#21464350720#\\?\ide#diskvbox_harddisk___________________________1.0_____#5&106af171&0&1.0.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
working on dev '#1048576#21464350720#\\?\ide#diskvbox_harddisk___________________________1.0_____#5&394c0ad3&0&0.1.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive1'
setting physpath here '#1048576#21464350720#\\?\ide#diskvbox_harddisk___________________________1.0_____#5&394c0ad3&0&0.1.0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
   pool: arun
     id: 8348721417575318707
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        arun              FAULTED  corrupted data
          physicaldrive3  FAULTED  corrupted data

   pool: pool-1
     id: 8423108592317542458
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        pool-1            FAULTED  corrupted data
          physicaldrive2  UNAVAIL  corrupted data
          physicaldrive1  UNAVAIL  corrupted data

Z:\>

@lundman

lundman commented Apr 19, 2021

Thank you, that is very interesting. It knows there is a pool, but fails to find the vdev. I will try to replicate this here. Do you know if it happens especially for mirrors?

@imtiazdc

@lundman We are seeing failures with zpool import even when there is only one vdev in the zpool.

@imtiazdc

Hi @lundman,

Please let us know if you need any specific or further info to help narrow this down; happy to collect and share. Hopefully, once we get past this issue, we will be able to get zdb to work and resubmit the PR for your review.

@lundman

lundman commented Apr 21, 2021

I'm having a hard time replicating the issue; could it be the device names?

I created two VHDs to see if those would have issues:

$  ./out/build/x64-Debug/cmd/zpool/zpool.exe create BOOM mirror PHYSICALDRIVE2 PHYSICALDRIVE3
working on dev '#1048576#1063256064#\\?\PHYSICALDRIVE2'
setting path here '/dev/physicaldrive2'
setting physpath here '#1048576#1063256064#\\?\PHYSICALDRIVE2'
working on dev '#1048576#1063256064#\\?\PHYSICALDRIVE3'
setting path here '/dev/physicaldrive3'
setting physpath here '#1048576#1063256064#\\?\PHYSICALDRIVE3'
Expanded path to '\\?\PHYSICALDRIVE2'
Expanded path to '\\?\PHYSICALDRIVE3'

$  ./out/build/x64-Debug/cmd/zpool/zpool.exe status
  pool: BOOM
 state: ONLINE
status: Mismatch between pool hostid and system hostid on imported pool.
        This pool was previously imported into a system with a different hostid,
        and then was verbatim imported into this system.
action: Export this pool on all systems on which it is imported.
        Then import it to correct the mismatch.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:

        NAME                STATE     READ WRITE CKSUM
        BOOM                ONLINE       0     0     0
          mirror-0          ONLINE       0     0     0
            physicaldrive2  ONLINE       0     0     0
            physicaldrive3  ONLINE       0     0     0

errors: No known data errors

$  ./out/build/x64-Debug/cmd/zpool/zpool.exe export BOOM
zunmount(BOOM,E:\ ) running
zunmount(BOOM,E:\ ) returns 0


$  ./out/build/x64-Debug/cmd/zpool/zpool.exe import
path '\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000100#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 3
    gpt 0: type 976fd9e0 off 0x4400 len 0xffbc00
    gpt 1: type 976fd9e0 off 0x1000000 len 0x138800000
    gpt 2: type 976fd9e0 off 0x139800000 len 0x3c6600000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
    part 1:  offset 8000:    len 9c4000:    tag: 11    name: 'Basic data partition'
    part 2:  offset 9cc000:    len 1e33000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_vmware_&prod_vmware_virtual_s#5&1ec51bf7&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 4
    gpt 0: type 976fd9e0 off 0x100000 len 0x21100000
    gpt 1: type 976fd9e0 off 0x21200000 len 0x6300000
    gpt 2: type 976fd9e0 off 0x27500000 len 0x1000000
    gpt 3: type 976fd9e0 off 0x28500000 len 0x13d7900000
asking libefi to read label
EFI read OK, max partitions 128
    part 1:  offset 109000:    len 31800:    tag: c    name: 'EFI system partition'
    part 2:  offset 13a800:    len 8000:    tag: 10    name: 'Microsoft reserved partition'
    part 3:  offset 142800:    len 9ebc800:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_msft&prod_virtual_disk#2&1f4adffe&0&000001#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive2'
read partitions ok 0
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 1fb000:    tag: 4    name: 'zfs-00002eff00001ff3'
    part 8:  offset 1fb800:    len 4000:    tag: b    name: ''
path '\\?\scsi#disk&ven_msft&prod_virtual_disk#2&1f4adffe&0&000002#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive3'
read partitions ok 0
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 1fb000:    tag: 4    name: 'zfs-0000246d00000aee'
    part 8:  offset 1fb800:    len 4000:    tag: b    name: ''
Processing volume '\\?\Volume{61b0b727-cc83-4c7d-8b10-7d476a617ca2}'
Processing volume '\\?\Volume{e97e5424-9cd1-43ab-8865-3adafc3951fe}'
Processing volume '\\?\Volume{f60bfa1f-6510-4b5b-8161-7d7aeb4aae4a}'
Processing volume '\\?\Volume{bb279b68-24b5-4db7-b137-cea80e83739c}'
Processing volume '\\?\Volume{5b884031-4fa1-46a6-92cf-994322d5ca7d}'
Processing volume '\\?\Volume{69ffd2f2-c511-11e9-bde6-806e6f6e6963}'
working on dev '#1048576#1063256064#\\?\scsi#disk&ven_msft&prod_virtual_disk#2&1f4adffe&0&000001#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive2'
setting physpath here '#1048576#1063256064#\\?\scsi#disk&ven_msft&prod_virtual_disk#2&1f4adffe&0&000001#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
working on dev '#1048576#1063256064#\\?\scsi#disk&ven_msft&prod_virtual_disk#2&1f4adffe&0&000002#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive3'
setting physpath here '#1048576#1063256064#\\?\scsi#disk&ven_msft&prod_virtual_disk#2&1f4adffe&0&000002#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
   pool: BOOM
     id: 3634247400435395891
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        BOOM                ONLINE
          mirror-0          ONLINE
            physicaldrive2  ONLINE
            physicaldrive3  ONLINE

So no issues there.

@imtiazdc

Interesting. Thanks for trying, @lundman. Could you upload a zip file with all your binaries and the driver? We will see if we can repro the issue on our machines with your binaries.

@lundman

lundman commented Apr 23, 2021

https://www.lundman.net/OpenZFSOnWindows-debug-2.0.0-31-g3ee1b5689-dirty.exe

Not signed, so you need TestMode on.

@imtiazdc

Thanks @lundman. We were able to repro the issues with your binaries too. Now that zdb is fixed, we will focus on investigating the zpool import issue.

@datacore-rm

Fixed in commit 2eb920d.

lundman pushed a commit that referenced this issue Mar 3, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes openzfs#14501