
xattr compat cleanup #16

Merged · 9 commits merged into truenas/zfs-2.1-release from xattr-compat-cleanup · Jul 15, 2021
Conversation

@ghost ghost commented Jul 13, 2021

Address feedback on xattr compat implementation.

Ryan Moeller added 4 commits July 13, 2021 10:40
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
We don't need to check if the xattr exists before attempting to delete
it.  This was a carryover from an implementation detail on Linux.

With this simplified, we can also reuse zfs_deleteextattr_impl in
zfs_setextattr_impl.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
We want to be able to configure these checks to be skipped, ideally on a
per-dataset basis.  I haven't decided how to implement this.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
@ghost ghost added the WIP Work In Progress label Jul 13, 2021
@ghost ghost requested a review from amotin July 13, 2021 15:33
Ryan Moeller added 2 commits July 13, 2021 12:34
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
This avoids activating feature@xattr_compat by default in TrueNAS while
the feature is still waiting to be upstreamed.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
@ghost ghost removed the WIP Work In Progress label Jul 14, 2021
ghost commented Jul 14, 2021

The last commit adds an xattr_fallback property that can be turned off to disable fallback accesses for performance. It defaults to on for safety at the moment, but we can explore ways of implementing an "auto" mode that automatically avoids the fallbacks when we can know for certain it is OK to do so.

@ghost ghost force-pushed the xattr-compat-cleanup branch from 67f3aaf to 79c095c on July 14, 2021 20:33
@ghost ghost added the WIP Work In Progress label Jul 14, 2021
ghost commented Jul 14, 2021

Investigating the test failures...

@ghost ghost force-pushed the xattr-compat-cleanup branch from 79c095c to 8ec06fe on July 14, 2021 21:36
@ghost ghost removed the WIP Work In Progress label Jul 14, 2021
This property defaults to "on" so that by default we do not miss xattrs in
datasets received from pools for which the encoding cannot be known, or
which may even contain a mix of both.  It can be turned "off" for a
performance boost when it is known that only the configured xattr_compat
format is relevant.

Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
@ghost ghost force-pushed the xattr-compat-cleanup branch from 8ec06fe to 133096f on July 14, 2021 21:49
ghost commented Jul 14, 2021

  • Fixed xattr removal on Linux

@ghost ghost force-pushed the xattr-compat-cleanup branch from a28919f to d657724 on July 15, 2021 15:04
Ryan Moeller added 2 commits July 15, 2021 11:54
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Moeller <ryan@iXsystems.com>
@ghost ghost force-pushed the xattr-compat-cleanup branch from d657724 to 40b0ef2 on July 15, 2021 15:54
ghost commented Jul 15, 2021

At this point I'm just stubbornly trying to get the ABI checks to pass.

@ghost ghost requested a review from amotin July 15, 2021 15:58
@amotin amotin left a comment

Looks good with some comments.

```diff
-on both FreeBSD and Linux.
+on both
+.Fx
+and Linux.
```
Collaborator:
This does not look upstreamable. Why do we need it?

Author:
This already has been done upstream in f84fe3f

Author:
I'm not sure how we lost this change here, because I did merge that commit as c650ceb.

Collaborator:
I was just thinking .Fx may not be present on other platforms. But if it is, I am OK with it.

Author:
I think because the man page was moved and we had cherry-picked the xattr=sa change before the merge, I probably resolved the conflict incorrectly.

man/man8/zfs-send.8
```diff
 error = zfs_getextattr_impl(ap, compat);
-if (error == ENOENT && ap->a_attrnamespace == EXTATTR_NAMESPACE_USER)
+if (fallback && error == ENOENT &&
```
Collaborator:
I don't see much sense in the fallback variable, but as you like.

.Sy xattr_compat ,
when set to
.Sy on
(the default), a second attempt will be made using the other xattr name format.
Collaborator:
While adding the property looks good to me, have you thought about some way to set it automatically only when needed, for example when xattr_compat is changed? Again, I'd like to see a long-term but not permanent migration plan.

Author:
I'm still thinking about it.

I know for sure there are cases where we can't know the optimal thing to do, such as when we've received a dataset without the property or when we've imported a pool without the property. There's no way I can think of to determine a priori which way the xattrs are named in these cases, and in fact there could be both formats present in the same dataset. Being able to read both by default ensures the user does not unwittingly lose their xattrs. xattr_fallback won't cause the xattr_compat feature to become active.

I'm not very familiar with the DSL part of ZFS yet, which is where I think I'll need to be working to figure out if we can safely disable fallback automatically, or rather, to be able to know when we need to force xattr_fallback=on so we can change the default to xattr_fallback=off. I'm still getting up to speed in this area.

@ghost ghost merged commit 25f4e4a into truenas/zfs-2.1-release Jul 15, 2021
@ghost ghost deleted the xattr-compat-cleanup branch July 15, 2021 19:57
ixhamza pushed a commit that referenced this pull request Mar 20, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes openzfs#14501
ixhamza pushed a commit that referenced this pull request Jun 12, 2023
ixhamza pushed a commit that referenced this pull request Apr 1, 2024
If a zvol has more than 15 partitions, the minor device number exhausts
the slot count reserved for partitions next to the zvol itself. As a
result, the minor number cannot be used to determine the partition
number for the higher partition, and doing so results in wrong named
symlinks being generated by udev.

Since the partition number is encoded in the block device name anyway,
let's just extract it from there instead.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Closes openzfs#15904
Closes openzfs#15970
ixhamza pushed a commit that referenced this pull request May 6, 2024