Reduce Stack Usage #15

behlendorf · 2010-05-26T18:26:30Z

The following is the worst-case stack usage I have observed so far on an i686 FC12 box. 32-bit systems are going to be our worst case because they can be configured with 4k stacks, and I'd like to fit in that if at all possible. The worst offender here is zpios_ioctl_cmd(), which is easy low-hanging fruit, but we should look at the other big functions and see what can be done.

        Depth    Size   Location    (78 entries)
        -----    ----   --------
  0)     4484      20   jiffies_to_timeval+0x23/0x3a
  1)     4464      48   acct_update_integrals+0x48/0xd3
  2)     4416      40   account_system_time+0x12d/0x135
  3)     4376      16   account_process_tick+0x5d/0x6f
  4)     4360      20   update_process_times+0x24/0x48
  5)     4340      24   tick_sched_timer+0x72/0xa0
  6)     4316      32   __run_hrtimer+0xa0/0xef
  7)     4284      60   hrtimer_interrupt+0xe0/0x1d5
  8)     4224       8   timer_interrupt+0x3b/0x42
  9)     4216      32   handle_IRQ_event+0x57/0xfd
 10)     4184      20   handle_level_irq+0x5e/0xb1
 11)     4164      20   handle_irq+0x40/0x4d
 12)     4144      24   do_IRQ+0x46/0x9a
 13)     4120      68   common_interrupt+0x30/0x38
 14)     4052      20   __queue_work+0x2f/0x34
 15)     4032       8   delayed_work_timer_fn+0x32/0x34
 16)     4024      60   run_timer_softirq+0x16d/0x1f0
 17)     3964      44   __do_softirq+0xb1/0x157
 18)     3920      12   do_softirq+0x36/0x41
 19)     3908       8   irq_exit+0x2e/0x61
 20)     3900      24   do_IRQ+0x86/0x9a
 21)     3876     104   common_interrupt+0x30/0x38
 22)     3772      12   __blk_run_queue+0x36/0x60
 23)     3760      68   cfq_insert_request+0x347/0x4e2
 24)     3692      28   elv_insert+0x11d/0x1b3
 25)     3664      28   __elv_add_request+0x92/0x99
 26)     3636      60   __make_request+0x312/0x386
 27)     3576      92   generic_make_request+0x21e/0x268
 28)     3484      60   submit_bio+0xc7/0xe0
 29)     3424      12   mpage_bio_submit+0x21/0x26
 30)     3412     176   do_mpage_readpage+0x527/0x58a
 31)     3236     128   mpage_readpages+0x88/0xbe
 32)     3108      12   ext3_readpages+0x19/0x1b
 33)     3096      56   __do_page_cache_readahead+0x102/0x160
 34)     3040      20   ra_submit+0x1c/0x21
 35)     3020      40   ondemand_readahead+0x181/0x18d
 36)     2980      28   page_cache_sync_readahead+0x38/0x42
 37)     2952     116   generic_file_aio_read+0x200/0x4e4
 38)     2836     168   do_sync_read+0xae/0xe9
 39)     2668      28   vfs_read+0x82/0xe1
 40)     2640      32   vn_rdwr+0x62/0x9e [spl]
 41)     2608      56   vdev_file_io_start+0xb1/0xe1 [zfs]
 42)     2552      44   zio_vdev_io_start+0x207/0x20f [zfs]
 43)     2508      32   zio_execute+0x91/0xaf [zfs]
 44)     2476      12   zio_nowait+0x31/0x34 [zfs]
 45)     2464     116   vdev_cache_read+0x2be/0x2d9 [zfs]
 46)     2348      44   zio_vdev_io_start+0x1b9/0x20f [zfs]
 47)     2304      32   zio_execute+0x91/0xaf [zfs]
 48)     2272      12   zio_nowait+0x31/0x34 [zfs]
 49)     2260     176   vdev_raidz_io_start+0x828/0x848 [zfs]
 50)     2084      44   zio_vdev_io_start+0x207/0x20f [zfs]
 51)     2040      32   zio_execute+0x91/0xaf [zfs]
 52)     2008      12   zio_nowait+0x31/0x34 [zfs]
 53)     1996      56   vdev_mirror_io_start+0x33e/0x358 [zfs]
 54)     1940      44   zio_vdev_io_start+0x42/0x20f [zfs]
 55)     1896      32   zio_execute+0x91/0xaf [zfs]
 56)     1864      12   zio_nowait+0x31/0x34 [zfs]
 57)     1852      76   arc_read_nolock+0x643/0x64d [zfs]
 58)     1776      48   arc_read+0x41/0x71 [zfs]
 59)     1728     108   dbuf_read+0x33b/0x458 [zfs]
 60)     1620      32   dbuf_findbp+0xd0/0x145 [zfs]
 61)     1588      48   dbuf_hold_impl+0x5b/0x1c5 [zfs]
 62)     1540      52   dbuf_findbp+0xb9/0x145 [zfs]
 63)     1488      48   dbuf_hold_impl+0x5b/0x1c5 [zfs]
 64)     1440      52   dbuf_findbp+0xb9/0x145 [zfs]
 65)     1388      48   dbuf_hold_impl+0x5b/0x1c5 [zfs]
 66)     1340      52   dbuf_findbp+0xb9/0x145 [zfs]
 67)     1288      48   dbuf_hold_impl+0x5b/0x1c5 [zfs]
 68)     1240     120   dnode_next_offset_level+0xd4/0x3d2 [zfs]
 69)     1120      72   dnode_next_offset+0xa8/0x153 [zfs]
 70)     1048     104   dmu_object_alloc+0xe6/0x1c8 [zfs]
 71)      944      52   zpios_dmu_object_create+0xa9/0x13e [zpios]
 72)      892     596   zpios_ioctl_cmd+0x5f3/0x1433 [zpios]
 73)      296      48   zpios_ioctl+0x1ed/0x235 [zpios]
 74)      248      32   vfs_ioctl+0x5c/0x76
 75)      216     104   do_vfs_ioctl+0x493/0x4d1
 76)      112      32   sys_ioctl+0x46/0x66
 77)       80      80   syscall_call+0x7/0xb
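To put the trace in numbers, here is a minimal C sketch of the budget arithmetic. It assumes the whole 4k stack (4096 bytes) is available, which ignores any bookkeeping the kernel keeps on the stack, and the helper names are made up for illustration.

```c
#include <assert.h>

/* Assumed budget: a 4k kernel stack with nothing else stored on it. */
#define SKETCH_STACK_LIMIT 4096u

/* Bytes by which a call chain of the given depth exceeds the limit (0 if it fits). */
static unsigned stack_overrun(unsigned depth, unsigned limit)
{
	return depth > limit ? depth - limit : 0;
}

/* Depth of the same call chain after one frame shrinks from old_size to new_size bytes. */
static unsigned depth_after_trim(unsigned depth, unsigned old_size, unsigned new_size)
{
	assert(new_size <= old_size && old_size <= depth);
	return depth - (old_size - new_size);
}
```

With the figures from the trace, stack_overrun(4484, SKETCH_STACK_LIMIT) gives 388 bytes over budget. If the 596-byte zpios_ioctl_cmd() frame could be cut to, say, 100 bytes (a hypothetical target, not a measured one), depth_after_trim(4484, 596, 100) gives 3988, which would fit with 108 bytes to spare.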

behlendorf · 2010-05-26T18:40:51Z

...
 64)      604     332   dsl_dir_open_spa+0x81/0x2ea [zfs]
...

This looks like low-hanging fruit as well: just move 'char buf[MAXNAMELEN];' off the stack.
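As a rough illustration of what moving the buffer off the stack means, here is a userspace C sketch. It is not the actual dsl_dir_open_spa() change: the function first_component_len() is hypothetical, SKETCH_MAXNAMELEN assumes MAXNAMELEN is 256, and malloc()/free() stand in for what in-kernel code would presumably do with kmem_alloc(MAXNAMELEN, KM_SLEEP) and kmem_free(buf, MAXNAMELEN).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value of MAXNAMELEN; used only for this sketch. */
#define SKETCH_MAXNAMELEN 256

/*
 * Hypothetical helper: copy the first '/'-separated component of path
 * into a scratch buffer and return its length, or -1 on failure.
 * The scratch buffer lives on the heap, so the frame holds one pointer
 * instead of a 256-byte array.
 */
static long first_component_len(const char *path)
{
	char *buf = malloc(SKETCH_MAXNAMELEN);
	size_t n;
	long len;

	if (buf == NULL)
		return -1;

	n = strcspn(path, "/");
	if (n >= SKETCH_MAXNAMELEN) {
		free(buf);	/* every exit path must release the buffer */
		return -1;
	}

	memcpy(buf, path, n);
	buf[n] = '\0';
	len = (long)strlen(buf);

	free(buf);
	return len;
}
```

The cost of this change is an allocation on each call and an extra cleanup obligation on every return path, which is usually an acceptable trade when the stack budget is this tight.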

behlendorf · 2010-07-01T18:33:52Z

Much of this was addressed in the onnv_141 update, and there are now quite a few topic branches which reduce stack usage. I'll see if we can get as much of this as is reasonable pushed upstream. Regardless, this seems to be under control again, so I'm closing this bug; we can open new ones on an as-needed basis.



This introduces a dependency in the zfstest package on `attr`, the lack of which causes failures in the following tests:

  features/large_dnode/large_dnode_002_pos
  projectquota/projectid_003_pos
  rsend/send_realloc_dnode_size
  slog/slog_replay_fs
  xattr/xattr_001_pos
  xattr/xattr_003_neg
  xattr/xattr_004_pos
  xattr/xattr_005_pos
  xattr/xattr_006_pos
  xattr/xattr_007_neg
  xattr/xattr_011_pos
  xattr/xattr_013_pos

Using zfs with Lustre, an arc_read can trigger kernel memory allocation that in turn leads to a memory reclaim callback and a deadlock within a single zfs process. This change uses spl_fstrans_mark and spl_fstrans_unmark to prevent the reclaim attempt and the deadlock (https://zfsonlinux.topicbox.com/groups/zfs-devel/T4db2c705ec1804ba).

The stack trace observed is:

 #0 [ffffc9002b98adc8] __schedule at ffffffff81610f2e
 #1 [ffffc9002b98ae68] schedule at ffffffff81611558
 #2 [ffffc9002b98ae70] schedule_preempt_disabled at ffffffff8161184a
 #3 [ffffc9002b98ae78] __mutex_lock at ffffffff816131e8
 #4 [ffffc9002b98af18] arc_buf_destroy at ffffffffa0bf37d7 [zfs]
 #5 [ffffc9002b98af48] dbuf_destroy at ffffffffa0bfa6fe [zfs]
 #6 [ffffc9002b98af88] dbuf_evict_one at ffffffffa0bfaa96 [zfs]
 #7 [ffffc9002b98afa0] dbuf_rele_and_unlock at ffffffffa0bfa561 [zfs]
 #8 [ffffc9002b98b050] dbuf_rele_and_unlock at ffffffffa0bfa32b [zfs]
 #9 [ffffc9002b98b100] osd_object_delete at ffffffffa0b64ecc [osd_zfs]
#10 [ffffc9002b98b118] lu_object_free at ffffffffa06d6a74 [obdclass]
#11 [ffffc9002b98b178] lu_site_purge_objects at ffffffffa06d7fc1 [obdclass]
#12 [ffffc9002b98b220] lu_cache_shrink_scan at ffffffffa06d81b8 [obdclass]
#13 [ffffc9002b98b278] shrink_slab at ffffffff811ca9d8
#14 [ffffc9002b98b338] shrink_node at ffffffff811cfd94
#15 [ffffc9002b98b3b8] do_try_to_free_pages at ffffffff811cfe63
#16 [ffffc9002b98b408] try_to_free_pages at ffffffff811d01c4
#17 [ffffc9002b98b488] __alloc_pages_slowpath at ffffffff811be7f2
#18 [ffffc9002b98b580] __alloc_pages_nodemask at ffffffff811bf3ed
#19 [ffffc9002b98b5e0] new_slab at ffffffff81226304
#20 [ffffc9002b98b638] ___slab_alloc at ffffffff812272ab
#21 [ffffc9002b98b6f8] __slab_alloc at ffffffff8122740c
#22 [ffffc9002b98b708] kmem_cache_alloc at ffffffff81227578
#23 [ffffc9002b98b740] spl_kmem_cache_alloc at ffffffffa048a1fd [spl]
#24 [ffffc9002b98b780] arc_buf_alloc_impl at ffffffffa0befba2 [zfs]
#25 [ffffc9002b98b7b0] arc_read at ffffffffa0bf0924 [zfs]
#26 [ffffc9002b98b858] dbuf_read at ffffffffa0bf9083 [zfs]
#27 [ffffc9002b98b900] dmu_buf_hold_by_dnode at ffffffffa0c04869 [zfs]

Signed-off-by: Mark Roper <markroper@gmail.com>
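The mark/unmark pattern described in that commit message can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: every sketch_* name is a hypothetical stand-in, the per-thread flag imitates the per-task state that spl_fstrans_mark() is understood to set, and the counter stands in for the allocator calling back into filesystem reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the cookie returned by spl_fstrans_mark(). */
typedef bool sketch_cookie_t;

/* Per-thread "inside a filesystem transaction" flag (assumed model). */
static _Thread_local bool in_fstrans;

/* Counts how often the allocator re-entered (simulated) reclaim. */
static int reclaim_entries;

/* Set the flag and return the previous state so nested sections restore correctly. */
static sketch_cookie_t sketch_fstrans_mark(void)
{
	sketch_cookie_t old = in_fstrans;
	in_fstrans = true;
	return old;
}

/* Restore the state saved by the matching sketch_fstrans_mark() call. */
static void sketch_fstrans_unmark(sketch_cookie_t cookie)
{
	in_fstrans = cookie;
}

/* Allocator model: it only calls back into reclaim when the flag is clear. */
static void *sketch_alloc(size_t size)
{
	if (!in_fstrans)
		reclaim_entries++;	/* reclaim could take locks the caller already holds */
	return malloc(size);
}

/* Read-path model: bracket the allocation so it cannot recurse into reclaim. */
static void *sketch_marked_alloc(size_t size)
{
	sketch_cookie_t cookie = sketch_fstrans_mark();
	void *buf = sketch_alloc(size);

	sketch_fstrans_unmark(cookie);
	return buf;
}
```

An unmarked sketch_alloc() call bumps reclaim_entries, while sketch_marked_alloc() leaves it unchanged, which is the property the commit relies on to avoid the self-deadlock in the trace.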


Under certain loads, the following panic is hit:

panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
cpuid = 17
KDB: stack backtrace:
#0 0xffffffff805e29c5 at kdb_backtrace+0x65
#1 0xffffffff8059620f at vpanic+0x17f
#2 0xffffffff81a27f4a at spl_panic+0x3a
#3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
#4 0xffffffff8066fdee at vinactivef+0xde
#5 0xffffffff80670b8a at vgonel+0x1ea
#6 0xffffffff806711e1 at vgone+0x31
#7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
#8 0xffffffff81a39069 at sfs_vgetx+0x149
#9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#10 0xffffffff80661c2c at lookup+0x45c
#11 0xffffffff80660e59 at namei+0x259
#12 0xffffffff8067e3d3 at kern_statat+0xf3
#13 0xffffffff8067eacf at sys_fstatat+0x2f
#14 0xffffffff808b5ecc at amd64_syscall+0x10c
#15 0xffffffff8088f07b at fast_syscall_common+0xf8

A race condition can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion. Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: rsync.net
Sponsored-by: Klara, Inc.


Under certain loads, the following panic is hit:

panic: page fault
KDB: stack backtrace:
#0 0xffffffff805db025 at kdb_backtrace+0x65
#1 0xffffffff8058e86f at vpanic+0x17f
#2 0xffffffff8058e6e3 at panic+0x43
#3 0xffffffff808adc15 at trap_fatal+0x385
#4 0xffffffff808adc6f at trap_pfault+0x4f
#5 0xffffffff80886da8 at calltrap+0x8
#6 0xffffffff80669186 at vgonel+0x186
#7 0xffffffff80669841 at vgone+0x31
#8 0xffffffff8065806d at vfs_hash_insert+0x26d
#9 0xffffffff81a39069 at sfs_vgetx+0x149
#10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#11 0xffffffff8065a28c at lookup+0x45c
#12 0xffffffff806594b9 at namei+0x259
#13 0xffffffff80676a33 at kern_statat+0xf3
#14 0xffffffff8067712f at sys_fstatat+0x2f
#15 0xffffffff808ae50c at amd64_syscall+0x10c
#16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following panic:

panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
cpuid = 17
KDB: stack backtrace:
#0 0xffffffff805e29c5 at kdb_backtrace+0x65
#1 0xffffffff8059620f at vpanic+0x17f
#2 0xffffffff81a27f4a at spl_panic+0x3a
#3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
#4 0xffffffff8066fdee at vinactivef+0xde
#5 0xffffffff80670b8a at vgonel+0x1ea
#6 0xffffffff806711e1 at vgone+0x31
#7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
#8 0xffffffff81a39069 at sfs_vgetx+0x149
#9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#10 0xffffffff80661c2c at lookup+0x45c
#11 0xffffffff80660e59 at namei+0x259
#12 0xffffffff8067e3d3 at kern_statat+0xf3
#13 0xffffffff8067eacf at sys_fstatat+0x2f
#14 0xffffffff808b5ecc at amd64_syscall+0x10c
#15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion. Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes #14501


pyavdr mentioned this issue Mar 28, 2012

access to .zfs directory via samba causes smbd NULL deref #626

Closed

prometheanfire mentioned this issue Apr 4, 2013

cannot use zfs send recursivly #1388

Closed

dward mentioned this issue Aug 26, 2013

kernel panic on zfs send #1678

Closed

olw2005 mentioned this issue Oct 17, 2013

Kernel panic during "zfs recv" of multiple snapshot stream #1795

Closed

Xaseron mentioned this issue Jan 10, 2014

ztest: pthread_mutex_destroy() fails with EBUSY #2027

Closed

clefru mentioned this issue Mar 2, 2014

zdb -dddd dies with kernel.c:301 assertion #2154

Closed

inevity mentioned this issue Oct 15, 2014

sa_find_idx_tab() ASSERTION failed ，spl panic，lead that ls hung #2801

Closed

Mehul1313 mentioned this issue Jan 14, 2015

zfs 0.3.6-1 kernel crash #3016

Closed

dswartz mentioned this issue Apr 22, 2015

Hang importing pool in 0.6.4-1 with zvol present #3330

Closed

bprotopopov mentioned this issue Aug 11, 2015

lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() #3681

Closed

BenEstrabaud mentioned this issue Nov 27, 2015

ddt_object_remove (from ddt_sync_entry()) panics. #4051

Closed

arturpzol mentioned this issue Feb 8, 2016

ZFS 0.6.5.3, NULL pointer dereference in dmu_objset_sync #4116

Closed

zp841126 mentioned this issue Apr 22, 2016

zpool import is very slow #4550

Closed

azuretek mentioned this issue May 10, 2016

Kernel Panic, CentOS7 not syncing: bad overwrite z_wr_int_7 #4610

Closed

baukus mentioned this issue Feb 7, 2017

Directories corrupted, commands hanging indefineltly #5346

Closed

bprotopopov mentioned this issue May 10, 2017

Adding ZVol as log device of seprate pool causes deadlock. #6065

Closed

sanjeevbagewadi mentioned this issue Jan 4, 2018

With "casesensitivity=mixed" hitting an assert in ZAP code #7011

Closed

jkryl referenced this issue in mayadata-io/cstor Feb 27, 2018

Merge pull request #15 from cloudbytestorage/opt_travis

1e1137a

Reduce travis build time

KernelKyupCom mentioned this issue May 9, 2018

OpenZFS - 6363 Add UNMAP/TRIM functionality (v2) #7363

Closed

13 tasks

BonganiMalalane mentioned this issue Aug 3, 2018

Panic in NFS filehandle decoding for snapshots [code issue identified and reproducible] #7764

Closed

mattmacy mentioned this issue Sep 22, 2020

Fix ZFS_DEBUG_MODIFY assert in arc_buf_try_copy_decompressed_data #10943

Closed

12 tasks

justinpryzby mentioned this issue Mar 16, 2021

stuck in futex #11749

Closed

bu7cher mentioned this issue Jun 11, 2021

Possible circular locking dependency in dmu_buf_will_dirty_impl vs free_children #10583

Open

sdimitro pushed a commit to sdimitro/zfs that referenced this issue Dec 7, 2021

DOSE-786 agent_set_feature and object_store_stop_agent fail to resume (…

e342dde

…openzfs#15) Signed-off-by: Paul Dagnelie <pcd@delphix.com>

rkojedzinszky pushed a commit to rkojedzinszky/zfs that referenced this issue Jan 14, 2022

Merge pull request openzfs#15 from truenas/merge-2.0.5

30fb32e

Merge zfs-2.0.5

tongfw mentioned this issue Feb 6, 2023

kernel panic on write #14463

Closed

ggzengel mentioned this issue Apr 29, 2024

zpool expand destroys whole pool if old zpool signature exists #16144

Open

asomers mentioned this issue Jan 24, 2025

panic: VERIFY0(spa_open(dsname, &spa, FTAG)) failed (0 == 2) #16988

Open

This issue was closed.
