WIP: Zvol trim #16

Closed
wants to merge 93 commits into from

@lundman lundman commented Apr 14, 2021

Handle incoming SCSIOP_UNMAP and place it into the same WrkRtn as IO, to queue and handle cancel. Then fan out to call ScsiOpUnmap_impl() instead of IO calls.

Parse the struct with the unmap ranges, calling zvol_os_unmap() for each.
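
As a rough sketch of the parsing step described above: the SCSI UNMAP parameter list is an 8-byte header followed by 16-byte block descriptors, all fields big-endian. The names below (unmap_parse, unmap_cb_t, unmap_log_cb) are hypothetical and not taken from this PR. The callback only stands in for zvol_os_unmap(), whose real signature is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define	UNMAP_HDR_LEN	8	/* parameter list header */
#define	UNMAP_DESC_LEN	16	/* one block descriptor */

/* Big-endian readers; SCSI parameter data is big-endian. */
static uint32_t
be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

static uint64_t
be64(const uint8_t *p)
{
	return (((uint64_t)be32(p) << 32) | be32(p + 4));
}

/* Hypothetical callback standing in for zvol_os_unmap(). */
typedef int (*unmap_cb_t)(void *arg, uint64_t offset, uint64_t length);

/*
 * Walk every descriptor and hand the byte range to cb.
 * Returns the number of ranges handled, or -1 on a malformed list.
 */
static int
unmap_parse(const uint8_t *buf, size_t buflen, uint32_t blocksize,
    unmap_cb_t cb, void *arg)
{
	if (buf == NULL || blocksize == 0 || buflen < UNMAP_HDR_LEN)
		return (-1);

	/* Header bytes 2-3: length of the descriptors that follow. */
	size_t desc_bytes = ((size_t)buf[2] << 8) | buf[3];
	if (desc_bytes > buflen - UNMAP_HDR_LEN)
		return (-1);

	int handled = 0;
	for (size_t off = 0; off + UNMAP_DESC_LEN <= desc_bytes;
	    off += UNMAP_DESC_LEN) {
		const uint8_t *d = buf + UNMAP_HDR_LEN + off;
		uint64_t lba = be64(d);		/* descriptor bytes 0-7 */
		uint32_t nblocks = be32(d + 8);	/* descriptor bytes 8-11 */

		if (nblocks == 0)
			continue;		/* empty range, nothing to do */
		if (lba > UINT64_MAX / blocksize)
			return (-1);		/* byte offset would overflow */
		if (cb(arg, lba * blocksize,
		    (uint64_t)nblocks * blocksize) != 0)
			return (-1);
		handled++;
	}
	return (handled);
}

/* Example callback: record what would have been unmapped. */
struct unmap_log {
	int count;
	uint64_t first_offset;
	uint64_t total_bytes;
};

static int
unmap_log_cb(void *arg, uint64_t offset, uint64_t length)
{
	struct unmap_log *log = arg;

	if (log->count == 0)
		log->first_offset = offset;
	log->count++;
	log->total_bytes += length;
	return (0);
}
```

A recording callback like unmap_log_cb may also help with the testing problem mentioned below, since a hand-built parameter list can be fed to the parser without a filesystem issuing the UNMAP.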

Find a way to test it, as ntfs didn't "just enable" it when I tested in VM :)

cstyle still needs to be applied
Unknown how other platforms get there
Not using vfork() and execve()
Amusingly, usleep() is more portable than poll() as a timing function.
These should probably rely on HAVE_UIO_H etc
Should probably rely on HAVE_UIO_H
lundman and others added 23 commits February 12, 2021 15:13
We'll just leak for now.

Also clear out the fileobjects
As we set S_IFBLK in stat_blk() so we can detect if something is a disk
or file, and Windows stat.h S_IFMT isn't wide enough to contain it.

We now use internal stat.h values only, and we will have to convert
any calls to Windows _stat() into our values. However, we do not
use _stat() with posix.c, but rather simulate them anyway.
But uses stat.st_mode to test for BLK or REG.

Using the example of zfs_resolve_shortname("Harddisk1Partition2");

***************

	if (arg[0] == '/') {
		/* ... */
	} else {
		err = is_shorthand_path(arg, path, sizeof (path),
		    &statbuf, &wholedisk);
		/* ... */
	}

	/*
	 * Determine whether this is a device or a file.
	 */
	if (wholedisk || S_ISBLK(statbuf.st_mode)) {

*****************

Possibly the idea is that is_shorthand_path() is expected to call stat(), as it is
passed the struct.

The path will expand into "\\?\Harddisk1Partition2" in zfs_resolve_shortname(), which can be opened and
stat()ed. However:

error = zfs_resolve_shortname(arg, path, path_size);
if (error == 0) {
	*wholedisk = zfs_dev_is_whole_disk(path);
	if (*wholedisk || (stat64(path, statbuf) == 0))

It correctly determines that wholedisk should be FALSE (it is just a
partition) but because of that, the short-circuit "||" will not call
stat64(), and the statbuf is used uninitialised in the S_ISBLK() test.

At least on Windows, we always want to call stat64(), after the
zfs_resolve_shortname() comes back (and it should be available at
this point).
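
A minimal sketch of the "always call stat" idea, assuming a POSIX-style stat(). In C the right-hand operand of "||" is evaluated only when the left-hand one is false, so a stat call placed on the right-hand side can be skipped. Calling it unconditionally and zeroing the buffer first means the later S_ISBLK() test never reads garbage. path_status() and its wholedisk_hint parameter are hypothetical; the hint stands in for the result of zfs_dev_is_whole_disk().

```c
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not the function from the source tree.
 * Returns true if the path is usable (whole disk or stat succeeded).
 */
static bool
path_status(const char *path, bool wholedisk_hint, struct stat *sb,
    bool *wholedisk)
{
	/* Zero first so a failed stat() leaves a defined st_mode of 0. */
	memset(sb, 0, sizeof (*sb));
	*wholedisk = wholedisk_hint;

	/* Unconditional call: no short-circuit can skip it. */
	bool have_stat = (stat(path, sb) == 0);

	return (*wholedisk || have_stat);
}
```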
Take partition names, in the format "HarddiskXPartitionY".

$  ./zpool.exe create DOOM Harddisk1Partition3
Expanded path to '\\?\Harddisk1Partition3'

$  ./zpool.exe list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
DOOM    15G  99.5K  15.0G        -         -     0%     0%  1.00x    ONLINE  -

$  ./zpool.exe status
  pool: DOOM
 state: ONLINE
status: Mismatch between pool hostid and system hostid on imported pool.
        This pool was previously imported into a system with a different hostid,
        and then was verbatim imported into this system.
action: Export this pool on all systems on which it is imported.
        Then import it to correct the mismatch.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
config:
        NAME                   STATE     READ WRITE CKSUM
        DOOM                   ONLINE       0     0     0
          Harddisk1Partition3  ONLINE       0     0     0

Also, only Offline wholedisks - or we would offline the disk containing
our partition.
Unsure why it doesn't work as advertised. Have to hardcode path
to correct library for now. FindOpenSSL picks MD versions and not
MT. Path ends up all wrong with "optimizedC:" strings and so on.
This is a large commit, but it should have no functional changes, just cstyle corrections.


* fix for cstyle.

Had to guess some things. Moving the indent in made some places nicer.

* Correct typos in cstyle

Thanks for the mindless work

Co-authored-by: Kajerik Lundman <kajerik@lundman.net>
How long has the test for vp==NULL VN_RELE been there, and how has it not
died sooner?
Random has no need for fd in Windows.

It was tempting to change pthread.h to return 0 instead of ENOTSUP, but
it is more correct to signal we don't support it.
Since zdb will open "disks" with vdev_file in userland, it needs to
understand the partition hack.
So we need to add some more API calls to handle it.

There is still change needed in abd.c

It might in fact be easier to not call aligned_malloc, and handle
alignment internally.
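
One common way to "handle alignment internally" is to over-allocate with plain malloc(), round the pointer up, and stash the original pointer just below the aligned address so it can be freed later. The sketch below is only an illustration of that technique. The function names are hypothetical and it is not the change that abd.c needs.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate size bytes aligned to align, which
 * must be a non-zero power of two. Returns NULL on bad input or OOM.
 */
static void *
aligned_alloc_internal(size_t size, size_t align)
{
	if (align == 0 || (align & (align - 1)) != 0)
		return (NULL);
	/* Room for worst-case padding plus the saved raw pointer. */
	if (size > SIZE_MAX - align - sizeof (void *))
		return (NULL);

	void *raw = malloc(size + align + sizeof (void *));
	if (raw == NULL)
		return (NULL);

	/* Skip one pointer slot, then round up to the alignment. */
	uintptr_t start = (uintptr_t)raw + sizeof (void *);
	uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

	/* Remember the real allocation just below the aligned block. */
	((void **)aligned)[-1] = raw;
	return ((void *)aligned);
}

/* Free a block returned by aligned_alloc_internal(). */
static void
aligned_free_internal(void *ptr)
{
	if (ptr != NULL)
		free(((void **)ptr)[-1]);
}
```

The cost is up to align + sizeof (void *) extra bytes per allocation, in exchange for not depending on a platform-specific aligned allocator and its matching free call.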
@@ -343,6 +343,11 @@ ScsiExecuteMain(
status = ScsiOpReportLuns(pHBAExt, pSrb);
break;

case SCSIOP_UNMAP:
DbgBreakPoint();

obvs remove before merge

lundman commented Jul 2, 2021

PR messed up, merged

@lundman lundman closed this Jul 2, 2021
lundman pushed a commit that referenced this pull request Mar 3, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes openzfs#14501
lundman pushed a commit that referenced this pull request Sep 4, 2024
If a zvol has more than 15 partitions, the minor device number exhausts
the slot count reserved for partitions next to the zvol itself. As a
result, the minor number cannot be used to determine the partition
number for the higher partition, and doing so results in wrong named
symlinks being generated by udev.

Since the partition number is encoded in the block device name anyway,
let's just extract it from there instead.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes openzfs#15904
Closes openzfs#15970
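
To illustrate the approach in the commit message above, here is a hedged sketch of pulling the partition number out of a block device name. It assumes names of the form "zd<N>" for the zvol itself and "zd<N>p<M>" for partition M. zvol_name_to_partition() is a hypothetical helper, not the code from the referenced commit.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper. Returns the partition number encoded in the
 * device name, 0 for the zvol itself, or -1 if the name does not
 * look like "zd<N>" or "zd<N>p<M>".
 */
static int
zvol_name_to_partition(const char *name)
{
	/* Accept either "zd16p17" or "/dev/zd16p17". */
	const char *base = strrchr(name, '/');
	base = (base != NULL) ? base + 1 : name;

	if (strncmp(base, "zd", 2) != 0 || !isdigit((unsigned char)base[2]))
		return (-1);

	/* Skip the zvol's own number. */
	const char *p = base + 2;
	while (isdigit((unsigned char)*p))
		p++;

	if (*p == '\0')
		return (0);		/* no partition suffix */
	if (*p != 'p' || !isdigit((unsigned char)p[1]))
		return (-1);

	char *end;
	long part = strtol(p + 1, &end, 10);
	if (*end != '\0' || part <= 0 || part > INT_MAX)
		return (-1);
	return ((int)part);
}
```

Because the number is read from the name, it does not depend on how many minor numbers are reserved per zvol, which is the limitation the commit message describes for partitions above 15.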