Fix panics when truncating/deleting files #15983
I don't suppose you have a nice reproducer for this?
Would you be OK with downloading a tarball with the container? I don't have a better idea of how to reproduce it. It was pretty easy to hit with a fresh copy: I copied it onto ext4 and verified it works, then tossed away the old pool and created a new one a few times, always rsyncing it back from ext4 onto ZFS.
The intent was "do you have a test case to shove in the test suite", really, so a whole container bootup for a test case is not ideal.
If you can describe the sequencing that trips it, we might be able to reduce it to a test program (I've done this a couple of times for other bugs). Otherwise I can try the container reproducer and try to work it out from there. Not that I doubt the fix; it sounds legit (I haven't read it yet, I will soon). But if we can prove it, even better!
The LXC container can be found here; it's an xz'd tarball, packed 7.7G, unpacked ~35G. You can unpack it with:
There are two main things in the tarball. The container is with Root password: a. To start the container once it's in place, as root (using raw LXC, not LXD/incus):
There's a union in dbuf_dirty_record_t; dr_brtwrite could evaluate to B_TRUE if the dirty record is of another type than dl. Adding a more explicit dr type check before trying to access dr_brtwrite.

Fixes two similar panics:

  [ 1373.806119] VERIFY0(db->db_level) failed (0 == 1)
  [ 1373.807232] PANIC at dbuf.c:2549:dbuf_undirty()
  [ 1373.814979] dump_stack_lvl+0x71/0x90
  [ 1373.815799] spl_panic+0xd3/0x100 [spl]
  [ 1373.827709] dbuf_undirty+0x62a/0x970 [zfs]
  [ 1373.829204] dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
  [ 1373.831010] dnode_free_range+0x532/0x1220 [zfs]
  [ 1373.833922] dmu_free_long_range+0x4e0/0x930 [zfs]
  [ 1373.835277] zfs_trunc+0x75/0x1e0 [zfs]
  [ 1373.837958] zfs_freesp+0x9b/0x470 [zfs]
  [ 1373.847236] zfs_setattr+0x161a/0x3500 [zfs]
  [ 1373.855267] zpl_setattr+0x125/0x320 [zfs]
  [ 1373.856725] notify_change+0x1ee/0x4a0
  [ 1373.859207] do_truncate+0x7f/0xd0
  [ 1373.859968] do_sys_ftruncate+0x28e/0x2e0
  [ 1373.860962] do_syscall_64+0x38/0x90
  [ 1373.861751] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

  [ 1822.381337] VERIFY0(db->db_level) failed (0 == 1)
  [ 1822.382376] PANIC at dbuf.c:2549:dbuf_undirty()
  [ 1822.389232] dump_stack_lvl+0x71/0x90
  [ 1822.389920] spl_panic+0xd3/0x100 [spl]
  [ 1822.399567] dbuf_undirty+0x62a/0x970 [zfs]
  [ 1822.400583] dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
  [ 1822.401752] dnode_free_range+0x532/0x1220 [zfs]
  [ 1822.402841] dmu_object_free+0x74/0x120 [zfs]
  [ 1822.403869] zfs_znode_delete+0x75/0x120 [zfs]
  [ 1822.404906] zfs_rmnode+0x3f6/0x7f0 [zfs]
  [ 1822.405870] zfs_inactive+0xa3/0x610 [zfs]
  [ 1822.407803] zpl_evict_inode+0x3e/0x90 [zfs]
  [ 1822.408831] evict+0xc1/0x1c0
  [ 1822.409387] do_unlinkat+0x147/0x300
  [ 1822.410060] __x64_sys_unlinkat+0x33/0x60
  [ 1822.410802] do_syscall_64+0x38/0x90
  [ 1822.411458] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Possible candidate for zfs-2.2.4-staging?
There's a union in dbuf_dirty_record_t; dr_brtwrite could evaluate to B_TRUE if the dirty record is of another type than dl. Adding a more explicit dr type check before trying to access dr_brtwrite.

Fixes two similar panics:

  [ 1373.806119] VERIFY0(db->db_level) failed (0 == 1)
  [ 1373.807232] PANIC at dbuf.c:2549:dbuf_undirty()
  [ 1373.814979] dump_stack_lvl+0x71/0x90
  [ 1373.815799] spl_panic+0xd3/0x100 [spl]
  [ 1373.827709] dbuf_undirty+0x62a/0x970 [zfs]
  [ 1373.829204] dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
  [ 1373.831010] dnode_free_range+0x532/0x1220 [zfs]
  [ 1373.833922] dmu_free_long_range+0x4e0/0x930 [zfs]
  [ 1373.835277] zfs_trunc+0x75/0x1e0 [zfs]
  [ 1373.837958] zfs_freesp+0x9b/0x470 [zfs]
  [ 1373.847236] zfs_setattr+0x161a/0x3500 [zfs]
  [ 1373.855267] zpl_setattr+0x125/0x320 [zfs]
  [ 1373.856725] notify_change+0x1ee/0x4a0
  [ 1373.859207] do_truncate+0x7f/0xd0
  [ 1373.859968] do_sys_ftruncate+0x28e/0x2e0
  [ 1373.860962] do_syscall_64+0x38/0x90
  [ 1373.861751] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

  [ 1822.381337] VERIFY0(db->db_level) failed (0 == 1)
  [ 1822.382376] PANIC at dbuf.c:2549:dbuf_undirty()
  [ 1822.389232] dump_stack_lvl+0x71/0x90
  [ 1822.389920] spl_panic+0xd3/0x100 [spl]
  [ 1822.399567] dbuf_undirty+0x62a/0x970 [zfs]
  [ 1822.400583] dmu_buf_will_dirty_impl+0x1e9/0x5b0 [zfs]
  [ 1822.401752] dnode_free_range+0x532/0x1220 [zfs]
  [ 1822.402841] dmu_object_free+0x74/0x120 [zfs]
  [ 1822.403869] zfs_znode_delete+0x75/0x120 [zfs]
  [ 1822.404906] zfs_rmnode+0x3f6/0x7f0 [zfs]
  [ 1822.405870] zfs_inactive+0xa3/0x610 [zfs]
  [ 1822.407803] zpl_evict_inode+0x3e/0x90 [zfs]
  [ 1822.408831] evict+0xc1/0x1c0
  [ 1822.409387] do_unlinkat+0x147/0x300
  [ 1822.410060] __x64_sys_unlinkat+0x33/0x60
  [ 1822.410802] do_syscall_64+0x38/0x90
  [ 1822.411458] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Closes #15983
Motivation and Context
Fixes two similar panics:
Description
There's a union in dbuf_dirty_record_t; dr_brtwrite could evaluate to B_TRUE if the dirty record is of another type than dl. Adding a more explicit dr type check before trying to access dr_brtwrite.
How Has This Been Tested?
The panics no longer reproduce. They were easily reproducible when I rsync'd a heavier LXC container (with gitlab, qemu, and other weird stuff in it) onto a new pool. feature@block_cloning=(disabled|enabled) doesn't play a role.
Types of changes
Checklist: