
Double zio_free #2827

Closed · ryao opened this issue Oct 24, 2014 · 9 comments

ryao commented Oct 24, 2014

I hit something that looks like #2523 twice in the past 24 hours, but it is subtly different:

[29585.947854] general protection fault: 0000 [#1] SMP 
[29585.947872] Modules linked in: tun bridge stp llc w83627ehf hwmon_vid snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_usb_audio snd_hda_controller x86_pkg_temp_thermal snd_hda_codec i915 snd_usbmidi_lib snd_rawmidi snd_seq_device snd_hwdep coretemp snd_pcm kvm_intel kvm iTCO_wdt iTCO_vendor_support snd_timer drm_kms_helper e1000e crc32c_intel snd drm firewire_ohci ghash_clmulni_intel cryptd lpc_ich microcode soundcore serio_raw pcspkr mfd_core video xts gf128mul cbc sha256_generic iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vmxnet3 virtio_net virtio_ring virtio tg3 sky2 r8169 pcnet32 mii igb ptp pps_core dca i2c_algo_bit i2c_core e1000 bnx2 atl1c fuse zfs(POE) ztaskq(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) xfs libcrc32c nfs lockd sunrpc
[29585.948101]  fscache jfs reiserfs btrfs ext4 jbd2 ext3 jbd ext2 mbcache linear raid0 dm_raid raid456 raid1 raid10 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod firewire_core crc_itu_t sl811_hcd xhci_hcd usb_storage mpt2sas raid_class aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm aacraid sx8 hpsa cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase imm parport sym53c8xx initio arcmsr aic7xxx aic79xx scsi_transport_spi sr_mod cdrom sg sd_mod crc_t10dif crct10dif_common pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_via
[29585.948330]  pata_jmicron pata_marvell pata_sis pata_netcell pata_pdc202xx_old pata_atiixp pata_amd pata_ali pata_it8213 pata_pcmcia pata_serverworks pata_oldpiix pata_artop pata_it821x pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_sil680 pata_pdc2027x
[29585.948400] CPU: 2 PID: 2090 Comm: z_fr_iss_7 Tainted: P           OE 3.16.1 #6
[29585.948418] Hardware name: Supermicro X10SAE/X10SAE, BIOS 2.00 04/21/2014
[29585.948434] task: ffff8807e5ea8190 ti: ffff8807e5eac000 task.ti: ffff8807e5eac000
[29585.948452] RIP: 0010:[<ffffffff8157a5e9>]  [<ffffffff8157a5e9>] __mutex_unlock_slowpath+0x29/0x40
[29585.948477] RSP: 0018:ffff8807e5eafd30  EFLAGS: 00010203
[29585.948490] RAX: 7a7a7a7a7a7a7a7a RBX: ffff8803c6334c98 RCX: dead000000200200
[29585.948507] RDX: ffff8803c6334ca0 RSI: 0000000000000246 RDI: ffff8803c6334c9c
[29585.948524] RBP: ffff8807e5eafd38 R08: ffff88076824fa20 R09: 0000000000000001
[29585.948540] R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000
[29585.948556] R13: 0000000000000000 R14: ffff8803c6334c98 R15: 0000000000200000
[29585.948572] FS:  0000000000000000(0000) GS:ffff88081fa80000(0000) knlGS:0000000000000000
[29585.948591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[29585.948604] CR2: 00007f40dba1c7d8 CR3: 000000059e50a000 CR4: 00000000001427e0
[29585.948620] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[29585.948636] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[29585.948652] Stack:
[29585.948658]  ffff8803c6334978 ffff8807e5eafd48 ffffffff8157a61b ffff8807e5eafdb8
[29585.948677]  ffffffffa0aeec6a ffffffff8157a842 ffff8803c6334978 ffff8803c6334c98
[29585.948697]  ffffffffa0aedd9b ffffffff8157a842 ffff8803c6334978 ffff8807e5eafdb8
[29585.948717] Call Trace:
[29585.948725]  [<ffffffff8157a61b>] mutex_unlock+0x1b/0x20
[29585.948768]  [<ffffffffa0aeec6a>] zio_done.part.6+0x8ea/0x1880 [zfs]
[29585.948784]  [<ffffffff8157a842>] ? mutex_lock+0x12/0x2f
[29585.948816]  [<ffffffffa0aedd9b>] ? zio_ready+0x1ab/0x790 [zfs]
[29585.948836]  [<ffffffff8157a842>] ? mutex_lock+0x12/0x2f
[29585.948867]  [<ffffffffa0aefc72>] zio_done+0x72/0x80 [zfs]
[29585.948898]  [<ffffffffa0ae7186>] zio_execute+0x156/0x2f0 [zfs]
[29585.948914]  [<ffffffffa07cee6c>] taskq_thread+0x27c/0x720 [ztaskq]
[29585.948931]  [<ffffffffa07cebf0>] ? taskq_thread_wait.isra.2+0x130/0x130 [ztaskq]
[29585.948952]  [<ffffffffa06bce9f>] thread_generic_wrapper+0x5f/0xa0 [spl]
[29585.948973]  [<ffffffffa06bce40>] ? __thread_exit+0x20/0x20 [spl]
[29585.948992]  [<ffffffff8106c7c9>] kthread+0xc9/0xe0
[29585.949006]  [<ffffffff8106c700>] ? kthread_create_on_node+0x170/0x170
[29585.949026]  [<ffffffff8157c3ec>] ret_from_fork+0x7c/0xb0
[29585.949043]  [<ffffffff8106c700>] ? kthread_create_on_node+0x170/0x170
[29585.949058] Code: 1f 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb c7 07 01 00 00 00 48 8d 7f 04 e8 04 17 00 00 48 8b 43 08 48 8d 53 08 48 39 d0 74 09 <48> 8b 78 10 e8 6e 13 b0 ff 66 83 43 04 01 5b 5d c3 66 0f 1f 44 
[29585.949144] RIP  [<ffffffff8157a5e9>] __mutex_unlock_slowpath+0x29/0x40
[29585.949161]  RSP <ffff8807e5eafd30>
[29585.956056] ---[ end trace 626b7246afda87d7 ]---

This occurred on my local workstation, where I have the early ztaskq import code. Anyway, it occurs to me that this could happen if the underlying hardware or block layer reports the IO completion twice. That makes it potentially different from #2523, because it is not immediately clear how an IO completion being reported twice would damage a spinlock used in condensing metaslabs.
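As a rough sketch of that hypothesis (the handler below and its internals are hypothetical, not the actual ZFS completion path; only zio->io_lock and zio->io_cv are real zio fields):

/*
 * Hypothetical sketch: if the block layer delivers the same completion
 * twice, the second invocation runs against a zio that the first
 * completion already allowed to be freed back to the slab.
 */
static void
example_zio_completion(zio_t *zio)
{
	mutex_enter(&zio->io_lock);
	/* ... mark the zio done ... */
	cv_broadcast(&zio->io_cv);	/* the waiter in zio_wait() may now free the zio */
	mutex_exit(&zio->io_lock);
}

/*
 * First delivery:  normal completion; the waiter wakes and destroys the zio.
 * Second delivery: zio points at freed (or reallocated) slab memory, so the
 * mutex_enter()/mutex_exit() above operate on garbage -- a GPF like the one
 * captured here, or a hang on a lock that now belongs to an unrelated zio.
 */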

ryao commented Oct 24, 2014

This might be illumos 5244:

https://www.illumos.org/projects/illumos-gate//issues/5244

behlendorf commented:

@ryao I've definitely seen this failure mode before in 0.6.3 and I've been treating it as a variant of #2523. If the IO were completed twice, the first completion would free the zio back to the slab. The second completion could then cause at least these two failure modes:

  1. A bad paging request / GPF as zio_done() attempts to access an invalid address, just like this one.
  2. A hang in __mutex_unlock_slowpath() if the original zio memory was reallocated for a new zio. This would be likely since zios are backed by a slab, just like the damaged zio->io_lock in #2523.

What I haven't been able to clearly show is how this can happen.

behlendorf added this to the 0.6.4 milestone on Oct 24, 2014

ryao commented Oct 24, 2014

I just hit this twice during the same boot, within a span of about 30 minutes, with #2828 applied. I am running 10 concurrent instances of ztest plus a VM on a zvol, which seems to make this more likely to trigger.

[ 2006.642102] general protection fault: 0000 [#1] SMP 
[ 2006.642121] Modules linked in: tun bridge stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_usb_audio x86_pkg_temp_thermal snd_hda_controller coretemp snd_hda_codec kvm_intel kvm snd_usbmidi_lib snd_rawmidi snd_hwdep snd_seq_device drm_kms_helper crc32c_intel snd_pcm ghash_clmulni_intel drm cryptd e1000e snd_timer snd firewire_ohci microcode iTCO_wdt iTCO_vendor_support serio_raw pcspkr soundcore lpc_ich mfd_core video xts gf128mul cbc sha256_generic iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vmxnet3 virtio_net virtio_ring virtio tg3 sky2 r8169 pcnet32 mii igb ptp pps_core dca i2c_algo_bit i2c_core e1000 bnx2 atl1c fuse zfs(POE) ztaskq(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) xfs libcrc32c nfs lockd sunrpc
[ 2006.642348]  fscache jfs reiserfs btrfs ext4 jbd2 ext3 jbd ext2 mbcache linear raid0 dm_raid raid456 raid1 raid10 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod firewire_core crc_itu_t sl811_hcd xhci_hcd usb_storage mpt2sas raid_class aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm aacraid sx8 hpsa cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase imm parport sym53c8xx initio arcmsr aic7xxx aic79xx scsi_transport_spi sr_mod cdrom sg sd_mod crc_t10dif crct10dif_common pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_via
[ 2006.642577]  pata_jmicron pata_marvell pata_sis pata_netcell pata_pdc202xx_old pata_atiixp pata_amd pata_ali pata_it8213 pata_pcmcia pata_serverworks pata_oldpiix pata_artop pata_it821x pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_sil680 pata_pdc2027x
[ 2006.642649] CPU: 7 PID: 2077 Comm: z_fr_iss_4 Tainted: P           OE 3.16.1 #6
[ 2006.642667] Hardware name: Supermicro X10SAE/X10SAE, BIOS 2.00 04/21/2014
[ 2006.642685] task: ffff8807e5422110 ti: ffff8807e5428000 task.ti: ffff8807e5428000
[ 2006.642702] RIP: 0010:[<ffffffff8157a5e9>]  [<ffffffff8157a5e9>] __mutex_unlock_slowpath+0x29/0x40
[ 2006.642728] RSP: 0018:ffff8807e542bd30  EFLAGS: 00010203
[ 2006.642741] RAX: 7a7a7a7a7a7a7a7a RBX: ffff8801aeab0c98 RCX: dead000000200200
[ 2006.642757] RDX: ffff8801aeab0ca0 RSI: 0000000000000246 RDI: ffff8801aeab0c9c
[ 2006.642773] RBP: ffff8807e542bd38 R08: ffff8800356efa18 R09: 0000000000000001
[ 2006.642789] R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000
[ 2006.642806] R13: 0000000000000000 R14: ffff8801aeab0c98 R15: ffff8807e5422110
[ 2006.642822] FS:  0000000000000000(0000) GS:ffff88081fbc0000(0000) knlGS:0000000000000000
[ 2006.642841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2006.642854] CR2: 00007f0f3c0cd048 CR3: 00000005d8589000 CR4: 00000000001427e0
[ 2006.642870] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2006.642887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2006.642903] Stack:
[ 2006.642908]  ffff8801aeab0978 ffff8807e542bd48 ffffffff8157a61b ffff8807e542bdb8
[ 2006.642928]  ffffffffa0aeeeca ffffffff8157a842 ffff8801aeab0c98 ffff8801aeab0c98
[ 2006.642948]  ffffffffa0aedff9 ffffffff8157a842 ffff8801aeab0978 ffff8807e542bdb8
[ 2006.642967] Call Trace:
[ 2006.642976]  [<ffffffff8157a61b>] mutex_unlock+0x1b/0x20
[ 2006.643018]  [<ffffffffa0aeeeca>] zio_done.part.6+0x8ea/0x18b0 [zfs]
[ 2006.643034]  [<ffffffff8157a842>] ? mutex_lock+0x12/0x2f
[ 2006.643065]  [<ffffffffa0aedff9>] ? zio_ready+0x1c9/0x7b0 [zfs]
[ 2006.643089]  [<ffffffff8157a842>] ? mutex_lock+0x12/0x2f
[ 2006.643121]  [<ffffffffa0aeff02>] zio_done+0x72/0x80 [zfs]
[ 2006.643152]  [<ffffffffa0ae73db>] zio_execute+0x14b/0x2f0 [zfs]
[ 2006.643168]  [<ffffffffa07ceea3>] taskq_thread+0x293/0x730 [ztaskq]
[ 2006.643185]  [<ffffffffa07cec10>] ? taskq_thread_wait.isra.2+0x130/0x130 [ztaskq]
[ 2006.643207]  [<ffffffffa06bcb0f>] thread_generic_wrapper+0x5f/0xa0 [spl]
[ 2006.643229]  [<ffffffffa06bcab0>] ? __thread_exit+0x20/0x20 [spl]
[ 2006.643246]  [<ffffffff8106c7c9>] kthread+0xc9/0xe0
[ 2006.643258]  [<ffffffff8106c700>] ? kthread_create_on_node+0x170/0x170
[ 2006.643274]  [<ffffffff8157c3ec>] ret_from_fork+0x7c/0xb0
[ 2006.643288]  [<ffffffff8106c700>] ? kthread_create_on_node+0x170/0x170
[ 2006.643303] Code: 1f 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb c7 07 01 00 00 00 48 8d 7f 04 e8 04 17 00 00 48 8b 43 08 48 8d 53 08 48 39 d0 74 09 <48> 8b 78 10 e8 6e 13 b0 ff 66 83 43 04 01 5b 5d c3 66 0f 1f 44 
[ 2006.643389] RIP  [<ffffffff8157a5e9>] __mutex_unlock_slowpath+0x29/0x40
[ 2006.643405]  RSP <ffff8807e542bd30>
[ 2006.650187] ---[ end trace 7a53cd7d38f6accb ]---
[ 3375.144877] general protection fault: 0000 [#2] SMP 
[ 3375.144895] Modules linked in: tun bridge stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_usb_audio x86_pkg_temp_thermal snd_hda_controller coretemp snd_hda_codec kvm_intel kvm snd_usbmidi_lib snd_rawmidi snd_hwdep snd_seq_device drm_kms_helper crc32c_intel snd_pcm ghash_clmulni_intel drm cryptd e1000e snd_timer snd firewire_ohci microcode iTCO_wdt iTCO_vendor_support serio_raw pcspkr soundcore lpc_ich mfd_core video xts gf128mul cbc sha256_generic iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vmxnet3 virtio_net virtio_ring virtio tg3 sky2 r8169 pcnet32 mii igb ptp pps_core dca i2c_algo_bit i2c_core e1000 bnx2 atl1c fuse zfs(POE) ztaskq(POE) zunicode(POE) zavl(POE) zcommon(POE) znvpair(POE) spl(OE) xfs libcrc32c nfs lockd sunrpc
[ 3375.145123]  fscache jfs reiserfs btrfs ext4 jbd2 ext3 jbd ext2 mbcache linear raid0 dm_raid raid456 raid1 raid10 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod firewire_core crc_itu_t sl811_hcd xhci_hcd usb_storage mpt2sas raid_class aic94xx libsas lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm aacraid sx8 hpsa cciss 3w_9xxx 3w_xxxx mptsas scsi_transport_sas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase imm parport sym53c8xx initio arcmsr aic7xxx aic79xx scsi_transport_spi sr_mod cdrom sg sd_mod crc_t10dif crct10dif_common pdc_adma sata_inic162x sata_mv ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_via
[ 3375.145352]  pata_jmicron pata_marvell pata_sis pata_netcell pata_pdc202xx_old pata_atiixp pata_amd pata_ali pata_it8213 pata_pcmcia pata_serverworks pata_oldpiix pata_artop pata_it821x pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_sil680 pata_pdc2027x
[ 3375.145422] CPU: 2 PID: 2080 Comm: z_fr_iss_4 Tainted: P      D    OE 3.16.1 #6
[ 3375.145440] Hardware name: Supermicro X10SAE/X10SAE, BIOS 2.00 04/21/2014
[ 3375.145456] task: ffff8807e543c1d0 ti: ffff8807e5440000 task.ti: ffff8807e5440000
[ 3375.145474] RIP: 0010:[<ffffffff8157a5e9>]  [<ffffffff8157a5e9>] __mutex_unlock_slowpath+0x29/0x40
[ 3375.145499] RSP: 0018:ffff8807e5443d30  EFLAGS: 00010203
[ 3375.145512] RAX: 7a7a7a7a7a7a7a7a RBX: ffff8803aa55b588 RCX: dead000000200200
[ 3375.145528] RDX: ffff8803aa55b590 RSI: 0000000000000246 RDI: ffff8803aa55b58c
[ 3375.145545] RBP: ffff8807e5443d38 R08: ffff8800356efad8 R09: 0000000000000001
[ 3375.145561] R10: 0000000000000000 R11: 0000000000000010 R12: 0000000000000000
[ 3375.145577] R13: 0000000000000000 R14: ffff8803aa55b588 R15: ffff8807e543c1d0
[ 3375.145593] FS:  0000000000000000(0000) GS:ffff88081fa80000(0000) knlGS:0000000000000000
[ 3375.145612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3375.145625] CR2: 00007f0f3c0bb000 CR3: 00000005d8589000 CR4: 00000000001427f0
[ 3375.145641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3375.145657] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3375.145673] Stack:
[ 3375.145678]  ffff8803aa55b268 ffff8807e5443d48 ffffffff8157a61b ffff8807e5443db8
[ 3375.145698]  ffffffffa0aeeeca ffffffff8157a842 ffff8803aa55b588 ffff8803aa55b588
[ 3375.145718]  ffffffffa0aedff9 ffffffff8157a842 ffff8803aa55b268 ffff8807e5443db8
[ 3375.147532] Call Trace:
[ 3375.148953]  [<ffffffff8157a61b>] mutex_unlock+0x1b/0x20
[ 3375.150357]  [<ffffffffa0aeeeca>] zio_done.part.6+0x8ea/0x18b0 [zfs]
[ 3375.152017]  [<ffffffff8157a842>] ? mutex_lock+0x12/0x2f
[ 3375.153564]  [<ffffffffa0aedff9>] ? zio_ready+0x1c9/0x7b0 [zfs]
[ 3375.154875]  [<ffffffff8157a842>] ? mutex_lock+0x12/0x2f
[ 3375.156178]  [<ffffffffa0aeff02>] zio_done+0x72/0x80 [zfs]
[ 3375.157574]  [<ffffffffa0ae73db>] zio_execute+0x14b/0x2f0 [zfs]
[ 3375.158741]  [<ffffffffa07ceea3>] taskq_thread+0x293/0x730 [ztaskq]
[ 3375.160091]  [<ffffffffa07cec10>] ? taskq_thread_wait.isra.2+0x130/0x130 [ztaskq]
[ 3375.161414]  [<ffffffffa06bcb0f>] thread_generic_wrapper+0x5f/0xa0 [spl]
[ 3375.162599]  [<ffffffffa06bcab0>] ? __thread_exit+0x20/0x20 [spl]
[ 3375.163736]  [<ffffffff8106c7c9>] kthread+0xc9/0xe0
[ 3375.164904]  [<ffffffff8106c700>] ? kthread_create_on_node+0x170/0x170
[ 3375.166075]  [<ffffffff8157c3ec>] ret_from_fork+0x7c/0xb0
[ 3375.167237]  [<ffffffff8106c700>] ? kthread_create_on_node+0x170/0x170
[ 3375.168475] Code: 1f 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb c7 07 01 00 00 00 48 8d 7f 04 e8 04 17 00 00 48 8b 43 08 48 8d 53 08 48 39 d0 74 09 <48> 8b 78 10 e8 6e 13 b0 ff 66 83 43 04 01 5b 5d c3 66 0f 1f 44 
[ 3375.169730] RIP  [<ffffffff8157a5e9>] __mutex_unlock_slowpath+0x29/0x40
[ 3375.170954]  RSP <ffff8807e5443d30>
[ 3375.172338] ---[ end trace 7a53cd7d38f6accc ]---
[ 3773.931646] perf interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

jpecar commented Nov 26, 2014

I'm hitting this bug (or something very similar) on a ZoL server that is serving bioinformatics workloads over NFS. I'm hitting it all too frequently (within hours). Since this is a production machine, I'm moving this load elsewhere. A crash kdump is available.

[ 3327.860492] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 3327.860522] IP: [] __mutex_unlock_slowpath+0x2d/0x50
[ 3327.860548] PGD 7d19664067 PUD 7d1aeaa067 PMD 0
[ 3327.860564] Oops: 0000 [#1] SMP
[ 3327.860577] Modules linked in: binfmt_misc ip6t_rpfilter ip6t_REJECT 8021q garp mrp ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge
stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table
_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptab
le_filter ip_tables sg ext4 mbcache jbd2 iTCO_wdt iTCO_vendor_support dm_round_robin ib_ipoib ses enclosure coretemp kvm crct10dif_pclmul crc32_p
clmul crc32c_intel ghash_clmulni_intel mei_me mei i2c_i801 aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr lpc_ich ioatdma mfd_cor
e shpchp rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ipmi_si ipmi_msghandler ib_cm iw_cm
[ 3327.860846] ib_sa acpi_cpufreq wmi mperf acpi_pad acpi_power_meter nfsd auth_rpcgss nfs_acl lockd sunrpc dm_multipath xfs libcrc32c raid1 sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt ib_qib i2c_algo_bit drm_kms_helper ttm ib_mad drm ixgbe mpt2sas ib_core nvme ahci libahci ib_addr mdio libata raid_class ptp myri10ge i2c_core pps_core scsi_transport_sas dca dm_mirror dm_region_hash dm_log dm_mod zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) zlib_deflate
[ 3327.861022] CPU: 2 PID: 7938 Comm: z_fr_iss_0/0 Tainted: PF IO-------------- 3.10.0-123.9.3.el7.x86_64 #1
[ 3327.861050] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0a 08/26/2014
[ 3327.861070] task: ffff883eb28ccfa0 ti: ffff883eaf59c000 task.ti: ffff883eaf59c000
[ 3327.861092] RIP: 0010:[] [] __mutex_unlock_slowpath+0x2d/0x50
[ 3327.861119] RSP: 0018:ffff883eaf59dd48 EFLAGS: 00010213
[ 3327.861135] RAX: 0000000000000000 RBX: ffff886ad5c54f08 RCX: dead000000200200
[ 3327.861155] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff886ad5c54f04
[ 3327.861174] RBP: ffff883eaf59dd58 R08: ffff883e9dd6b768 R09: 0000000000000001
[ 3327.861196] R10: 0000000000015ab9 R11: 000000000000000f R12: ffff886ad5c54f04
[ 3327.861216] R13: 0000000000000000 R14: ffff883eb28ccfa0 R15: ffff886ad5c54bf0
[ 3327.861235] FS: 0000000000000000(0000) GS:ffff883f7fc80000(0000) knlGS:0000000000000000
[ 3327.861257] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3327.861275] CR2: 0000000000000010 CR3: 0000007d21904000 CR4: 00000000001407e0
[ 3327.861295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3327.863663] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3327.865879] Stack:
[ 3327.867505] ffff886ad5c54f00 ffff886ad5c54bf0 ffff883eaf59dd68 ffffffff815e5e8b
[ 3327.869168] ffff883eaf59ddc8 ffffffffa0161e39 ffffffff815e5ee2 ffff886ad5c54bf0
[ 3327.870837] ffff886ad5c54f00 ffff886ad5c54bf0 ffff883eaf59ddc8 ffff886ad5c54bf0
[ 3327.872486] Call Trace:
[ 3327.874082] [] mutex_unlock+0x1b/0x20
[ 3327.875697] [] zio_done.part.8+0x589/0xd50 [zfs]
[ 3327.877257] [] ? mutex_lock+0x12/0x2f
[ 3327.878830] [] zio_done+0x72/0x80 [zfs]
[ 3327.880420] [] zio_execute+0xb6/0x160 [zfs]
[ 3327.881913] [] taskq_thread+0x1be/0x360 [spl]
[ 3327.883379] [] ? wake_up_state+0x20/0x20
[ 3327.884838] [] ? taskq_cancel_id+0x140/0x140 [spl]
[ 3327.886259] [] kthread+0xcf/0xe0
[ 3327.887645] [] ? kthread_create_on_node+0x140/0x140
[ 3327.889056] [] ret_from_fork+0x7c/0xb0
[ 3327.890409] [] ? kthread_create_on_node+0x140/0x140
[ 3327.891744] Code: 44 00 00 55 48 89 e5 41 54 4c 8d 67 04 53 48 89 fb c7 07 01 00 00 00 4c 89 e7 48 83 c3 08 e8 9b 30 00 00 48 8b 03 48 39 d8 74 09 <48> 8b 78 10 e8 da 0d ab ff 4c 89 e7 e8 b2 2f 00 00 5b 41 5c 5d
[ 3327.894443] RIP [] __mutex_unlock_slowpath+0x2d/0x50
[ 3327.895747] RSP
[ 3327.897028] CR2: 0000000000000010

behlendorf commented:

@jpecar Thanks for providing the extra debugging; we're working on it. The odd thing about your stack is the address that was dereferenced: 0x0000000000000010. This suggests that a NULL pointer was passed to mutex_exit(), and it's still not quite clear how that could happen. Regardless, we've now got a few reproducers, which is helpful. Thanks.
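For what it's worth, a fault address of 0x10 is what you get when a member sixteen bytes into a structure is read through a NULL base pointer; the layout below is purely illustrative, not the actual kernel structure:

/*
 * Illustration only: a load through a NULL base pointer faults at the byte
 * offset of the member being read, so a member at offset 0x10 produces
 * exactly the 0x0000000000000010 seen in CR2 above.
 */
struct example {
	void	*a;	/* offset 0x00 */
	void	*b;	/* offset 0x08 */
	void	*c;	/* offset 0x10: reading this through a NULL
			 * pointer faults at address 0x10 */
};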

jpecar commented Dec 15, 2014

I just managed to reproduce this exact same crash with a single zvol exported over iSCSI.

Can you recommend any settings that would stop this from happening? I need this host to stay up and running.

jpecar commented Dec 18, 2014

I got this crash again under very light load (less than 200 kB/s). It makes me want to go back to classic md RAID + LVM.

behlendorf commented:

A fix for this issue has been proposed in openzfs/spl#421 and is under review.

behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 19, 2014
It is known that mutexes in Linux are not safe when they are used to
synchronize the freeing of the object in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race
condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing
  between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on
mutex_exit() with a spin lock, making the mutex safe by sacrificing
a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio
pipeline caused by a process spinning on the damaged mutex.  Similar
deadlocks have been reported for the dbuf->db_mtx mutex.  And it can
also cause a NULL dereference or bad paging request under the right
circumstances.

This issue and many like it are linked off the openzfs/zfs#2523
issue.  Specifically this fix resolves at least the following
outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #421
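The serialization the commit message describes can be sketched roughly as follows. This is a minimal illustration of the idea, not the actual SPL macros; the kmutex_init()/kmutex_enter()/kmutex_exit() wrappers and the m_lock field name are assumptions made for the example.

#include <linux/mutex.h>
#include <linux/spinlock.h>

/* A mutex paired with a spinlock that serializes mutex_exit(). */
typedef struct {
	struct mutex	m_mutex;
	spinlock_t	m_lock;		/* taken only around mutex_unlock() */
} kmutex_t;

static inline void
kmutex_init(kmutex_t *mp)
{
	mutex_init(&mp->m_mutex);
	spin_lock_init(&mp->m_lock);
}

static inline void
kmutex_enter(kmutex_t *mp)
{
	mutex_lock(&mp->m_mutex);
}

static inline void
kmutex_exit(kmutex_t *mp)
{
	/*
	 * Hold m_lock across mutex_unlock().  A waiter woken inside
	 * mutex_unlock() can acquire m_mutex and run on before this CPU has
	 * finished walking the mutex wait list.  Because that waiter must
	 * itself call kmutex_exit() before it frees the object that embeds
	 * this kmutex (the zio_wait()/zio_done() pattern), its spin_lock()
	 * below blocks until this critical section completes, so the
	 * embedding object is never freed out from under us.
	 */
	spin_lock(&mp->m_lock);
	mutex_unlock(&mp->m_mutex);
	spin_unlock(&mp->m_lock);
}

The trade-off mentioned in the commit message is visible here: every mutex grows by a spinlock, and every unlock pays for an extra lock/unlock pair.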
behlendorf commented:

This issue, which is a duplicate of #2523, was resolved by the following commit. Full details can be found in the commit message and the related LWN article.

openzfs/spl@a3c1eb7 mutex: force serialization on mutex_exit() to fix races

dajhorn added a commit to zfsonlinux/pkg-spl that referenced this issue Dec 20, 2014
Commit: openzfs/zfs@a3c1eb7
From: Chunwei Chen <tuxoko@gmail.com>
Date: Fri, 19 Dec 2014 11:31:59 +0800
Subject: mutex: force serialization on mutex_exit() to fix races

(Same commit message as the openzfs/spl commit above.)

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Backported-by: Darik Horn <dajhorn@vanadac.com>
Closes #421

Conflicts:
        include/sys/mutex.h
behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 23, 2014
(Same commit message as above: mutex: force serialization on mutex_exit() to fix races. Closes #421.)
ryao pushed a commit to ryao/spl that referenced this issue Feb 19, 2015
(Same commit message as above: mutex: force serialization on mutex_exit() to fix races. Closes openzfs#421.)

Conflicts:
	include/sys/mutex.h