CPU lock up and ZFS crash #2891

Closed
lintonv opened this issue Nov 12, 2014 · 6 comments

@lintonv
lintonv commented Nov 12, 2014

I am seeing the following CPU lockup and ZFS crash after about 4 hours of IO testing. I am running code from Git master as of September 27th; before that I was running the stable 0.6.3 release (tar.gz) and did not see these crashes.

This is the backtrace:

Nov 12 21:44:31 ssn-0-5-38 kernel: BUG: soft lockup - CPU#4 stuck for 67s! [ggaoed:4467]
Nov 12 21:44:31 ssn-0-5-38 kernel: Modules linked in: 8021q garp stp llc zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate aoe(U) ipv6 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000(U) sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: aoe]
Nov 12 21:44:31 ssn-0-5-38 kernel: CPU 4
Nov 12 21:44:31 ssn-0-5-38 kernel: Modules linked in: 8021q garp stp llc zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate aoe(U) ipv6 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000(U) sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: aoe]
Nov 12 21:44:31 ssn-0-5-38 kernel:
Nov 12 21:44:31 ssn-0-5-38 kernel: Pid: 4467, comm: ggaoed Tainted: P --------------- 2.6.32-279.el6.x86_64 #1 SeaMicro Sabre2/Type2 - Board Product Name1
Nov 12 21:44:31 ssn-0-5-38 kernel: RIP: 0010:[] [] _spin_lock+0x21/0x30
Nov 12 21:44:31 ssn-0-5-38 kernel: RSP: 0018:ffff8808130a78b8 EFLAGS: 00000202
Nov 12 21:44:31 ssn-0-5-38 kernel: RAX: 0000000000000000 RBX: ffff8808130a78b8 RCX: ffff880812dae430
Nov 12 21:44:31 ssn-0-5-38 kernel: RDX: 0000000000000002 RSI: ffff880662568ec0 RDI: ffff8806c470d40c
Nov 12 21:44:31 ssn-0-5-38 kernel: RBP: ffffffff8100bc0e R08: ffff88069acb8e30 R09: 0000000000000001
Nov 12 21:44:31 ssn-0-5-38 kernel: R10: dead000000200200 R11: dead000000100100 R12: ffffffff814fed1e
Nov 12 21:44:31 ssn-0-5-38 kernel: R13: ffff8808130a7838 R14: 0000000000008030 R15: ffff8808130a7848
Nov 12 21:44:31 ssn-0-5-38 kernel: FS: 00007f27ea3c9700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
Nov 12 21:44:31 ssn-0-5-38 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 12 21:44:31 ssn-0-5-38 kernel: CR2: 00007f4367dfcdd0 CR3: 0000000817e6e000 CR4: 00000000000406e0
Nov 12 21:44:31 ssn-0-5-38 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 12 21:44:31 ssn-0-5-38 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 12 21:44:31 ssn-0-5-38 kernel: Process ggaoed (pid: 4467, threadinfo ffff8808130a6000, task ffff880813069500)
Nov 12 21:44:31 ssn-0-5-38 kernel: Stack:
Nov 12 21:44:31 ssn-0-5-38 kernel: ffff8808130a78d8 ffffffff814fea85 ffff8806c470d3b0 ffff88069acb8dc0
Nov 12 21:44:31 ssn-0-5-38 kernel: ffff8808130a78e8 ffffffff814fea5b ffff8808130a7968 ffffffffa04ac7ca
Nov 12 21:44:31 ssn-0-5-38 kernel: ffff8808130a7938 ffff880724d3cea0 ffff88069acb8e00 0000000112dae330
Nov 12 21:44:31 ssn-0-5-38 kernel: Call Trace:
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? mutex_unlock+0x1b/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dbuf_dirty+0x43a/0x820 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_buf_will_dirty+0xaf/0x100 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_write_uio_dnode+0x70/0x140 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_write_uio_dbuf+0x46/0x60 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zfs_write+0xb95/0xcf0 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __switch_to+0x1ac/0x320
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? kmem_alloc_debug+0x13a/0x4c0 [spl]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zpl_aio_write+0xe2/0x130 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zpl_aio_write+0x0/0x130 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? do_sync_readv_writev+0xfb/0x140
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? autoremove_wake_function+0x0/0x40
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? ep_poll_callback+0xbb/0xf0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? security_file_permission+0x16/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? do_readv_writev+0xd6/0x1f0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __wake_up_locked_key+0x18/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? eventfd_write+0x193/0x1d0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? vfs_writev+0x46/0x60
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? sys_pwritev+0xa2/0xc0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? system_call_fastpath+0x16/0x1b
Nov 12 21:44:31 ssn-0-5-38 kernel: Code: 01 74 05 e8 e2 e3 d7 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f
Nov 12 21:44:31 ssn-0-5-38 kernel: Call Trace:
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? mutex_unlock+0x1b/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dbuf_dirty+0x43a/0x820 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_buf_will_dirty+0xaf/0x100 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_write_uio_dnode+0x70/0x140 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_write_uio_dbuf+0x46/0x60 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zfs_write+0xb95/0xcf0 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __switch_to+0x1ac/0x320
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? kmem_alloc_debug+0x13a/0x4c0 [spl]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zpl_aio_write+0xe2/0x130 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zpl_aio_write+0x0/0x130 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? do_sync_readv_writev+0xfb/0x140
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? autoremove_wake_function+0x0/0x40
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? ep_poll_callback+0xbb/0xf0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? security_file_permission+0x16/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? do_readv_writev+0xd6/0x1f0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __wake_up_locked_key+0x18/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? eventfd_write+0x193/0x1d0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? vfs_writev+0x46/0x60
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? sys_pwritev+0xa2/0xc0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? system_call_fastpath+0x16/0x1b

This is the output of /proc/slabinfo (columns: name, active_objs, num_objs, objsize, objperslab, pagesperslab : tunables limit, batchcount, sharedfactor : slabdata active_slabs, num_slabs, sharedavail):

zil_lwb_cache 0 0 200 19 1 : tunables 120 60 8 : slabdata 0 0 0
l2arc_buf_hdr_t 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0
arc_buf_t 779383 779630 72 53 1 : tunables 120 60 8 : slabdata 14710 14710 0
arc_buf_hdr_t 779375 779441 288 13 1 : tunables 54 27 8 : slabdata 59957 59957 0
dmu_buf_impl_t 779418 779506 280 14 1 : tunables 54 27 8 : slabdata 55679 55679 0
dnode_t 1311 1325 816 5 1 : tunables 54 27 8 : slabdata 265 265 0
sa_cache 191 240 80 48 1 : tunables 120 60 8 : slabdata 5 5 0
lz4_cache 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_16384 258042 258042 16384 1 4 : tunables 8 4 0 : slabdata 258042 258042 0
zio_data_buf_14336 0 0 14336 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_14336 0 0 14336 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_12288 0 0 12288 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_12288 0 0 12288 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_10240 0 0 10240 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_10240 0 0 10240 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_8192 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_8192 7 7 8192 1 2 : tunables 8 4 0 : slabdata 7 7 0
zio_data_buf_7168 0 0 7168 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_7168 0 0 7168 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_6144 0 0 6144 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_6144 0 0 6144 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_5120 0 0 5120 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_5120 4 4 5120 1 2 : tunables 8 4 0 : slabdata 4 4 0
zio_data_buf_4096 458962 458962 4096 1 1 : tunables 24 12 8 : slabdata 458962 458962 0
zio_buf_4096 62202 62202 4096 1 1 : tunables 24 12 8 : slabdata 62202 62202 0
zio_data_buf_3584 0 0 3584 1 1 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_3584 0 0 3584 1 1 : tunables 24 12 8 : slabdata 0 0 0
zio_data_buf_3072 0 0 3072 2 2 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_3072 0 0 3072 2 2 : tunables 24 12 8 : slabdata 0 0 0
zio_data_buf_2560 0 0 2560 3 2 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_2560 0 0 2560 3 2 : tunables 24 12 8 : slabdata 0 0 0
zio_data_buf_2048 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_2048 82 94 2048 2 1 : tunables 24 12 8 : slabdata 47 47 0
zio_data_buf_1536 0 0 1536 5 2 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_1536 65 115 1536 5 2 : tunables 24 12 8 : slabdata 23 23 0
zio_data_buf_1024 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
zio_buf_1024 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
zio_data_buf_512 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
zio_buf_512 732 1000 512 8 1 : tunables 54 27 8 : slabdata 125 125 0
zio_link_cache 208 6006 48 77 1 : tunables 120 60 8 : slabdata 78 78 0
zio_cache 202 343 1072 7 2 : tunables 24 12 8 : slabdata 49 49 0
ddt_entry_cache 0 0 448 8 1 : tunables 54 27 8 : slabdata 0 0 0
range_seg_cache 301740 305443 64 59 1 : tunables 120 60 8 : slabdata 5177 5177 0

@behlendorf
Contributor

This is a duplicate of #2523. It's something we absolutely need to address before the next tag. Can you describe your test setup and what workload you're using to reproduce this?

@lintonv
Author

lintonv commented Nov 13, 2014

@behlendorf Thanks.

Here is the test environment:
128K blocks
Random IO
20 Threads
50 Volumes
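
For reference, a workload of this shape can be approximated with a small multi-threaded random-write program. This is a hypothetical sketch, not the actual test harness (in this report the IO was driven through ggaoed over AoE), and the file path and region size below are made up; it does exercise the same pwritev() path visible in the backtrace:

    /* Hypothetical reproducer sketch: THREADS writers issuing random
     * 128K pwritev() writes forever.  Build with: gcc -pthread. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #define BLOCK   (128 * 1024)            /* 128K blocks */
    #define SPAN    (1024LL * 1024 * 1024)  /* 1 GiB region (made up) */
    #define THREADS 20

    static int fd;

    static void *writer(void *arg)
    {
        char *buf = malloc(BLOCK);
        struct iovec iov = { .iov_base = buf, .iov_len = BLOCK };

        memset(buf, 0xab, BLOCK);
        for (;;) {                          /* random IO, forever */
            off_t off = (random() % (SPAN / BLOCK)) * BLOCK;
            if (pwritev(fd, &iov, 1, off) < 0)
                break;
        }
        free(buf);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[THREADS];

        /* "/tank/vol0/testfile" is a placeholder on a ZFS dataset. */
        fd = open("/tank/vol0/testfile", O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return 1;
        for (int i = 0; i < THREADS; i++)
            pthread_create(&tid[i], NULL, writer, NULL);
        for (int i = 0; i < THREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }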

Looking at the code, could this be caused by zpl_aio_write() in zpl_file.c? I don't quite understand the locking mechanism in that code; it could be the root cause.

@behlendorf
Contributor

@lintonv This issue actually goes pretty far back and predates the AIO code. But for some reason it has become much more common in the latest code.

behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 19, 2014
It is known that mutexes in Linux are not safe when using them to
synchronize the freeing of objects in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race
condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing
  between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on
mutex_exit() with a spin lock, making the mutex safe by sacrificing
a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio
pipeline caused by a process spinning on the damaged mutex.  Similar
deadlocks have been reported for the dbuf->db_mtx mutex.  And it can
also cause a NULL dereference or bad paging request under the right
circumstances.

This issue and many like it are linked off the openzfs/zfs#2523
issue.  Specifically this fix resolves at least the following
outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #421
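
To make the race concrete, here is a userspace analogue of the zio_wait()/zio_done() pattern the commit describes, written with pthreads (an illustrative sketch, not actual ZFS code): the waiter can return from its wait and free the object while the completer is still executing inside pthread_mutex_unlock() on the embedded mutex.

    #include <pthread.h>
    #include <stdlib.h>

    /* Userspace analogue of the race described above (illustrative
     * only).  The object embeds the mutex that guards its own
     * completion flag. */
    struct obj {
        pthread_mutex_t lock;
        pthread_cond_t  cv;
        int             done;
    };

    /* Plays the role of zio_done(): mark completion, wake the waiter. */
    static void *completer(void *arg)
    {
        struct obj *o = arg;

        pthread_mutex_lock(&o->lock);
        o->done = 1;
        pthread_cond_broadcast(&o->cv);
        pthread_mutex_unlock(&o->lock);   /* may still be touching
                                           * o->lock internals after
                                           * the waiter has woken up */
        return NULL;
    }

    /* Plays the role of zio_wait(): wait for completion, then free. */
    static void waiter(struct obj *o)
    {
        pthread_mutex_lock(&o->lock);
        while (!o->done)
            pthread_cond_wait(&o->cv, &o->lock);
        pthread_mutex_unlock(&o->lock);
        free(o);                          /* races with the completer's
                                           * unlock above: use-after-free */
    }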
@behlendorf
Contributor

This issue, which is a duplicate of #2523, was resolved by the following commit. Full details can be found in the commit message and the related LWN article.

openzfs/spl@a3c1eb7 mutex: force serialization on mutex_exit() to fix races
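
The shape of that serialization, sketched from the commit description (field and macro names here are illustrative, not the exact patch): every kmutex_t gains a spin lock that is held across the underlying mutex_unlock(), so no thread can observe the mutex as free while another thread is still inside the unlock path.

    /* Sketch of the idea behind openzfs/spl@a3c1eb7; see the actual
     * commit for the real implementation. */
    typedef struct {
            struct mutex    m_mutex;
            spinlock_t      m_exit_lock;    /* serializes mutex_exit() */
            /* ... */
    } kmutex_t;

    #define mutex_exit(mp)                          \
    do {                                            \
            spin_lock(&(mp)->m_exit_lock);          \
            mutex_unlock(&(mp)->m_mutex);           \
            spin_unlock(&(mp)->m_exit_lock);        \
    } while (0)

With this in place, a thread that has just been granted the mutex cannot finish its own mutex_exit() (and go on to free the object) until the previous owner's mutex_unlock() has fully completed, which is the small performance and memory cost the commit message mentions.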

behlendorf added this to the 0.6.4 milestone Dec 19, 2014
dajhorn added a commit to zfsonlinux/pkg-spl that referenced this issue Dec 20, 2014
Commit: openzfs/spl@a3c1eb7
From: Chunwei Chen <tuxoko@gmail.com>
Date: Fri, 19 Dec 2014 11:31:59 +0800
Subject: mutex: force serialization on mutex_exit() to fix races

(The commit message is identical to the openzfs/spl commit quoted above.)
Backported-by: Darik Horn <dajhorn@vanadac.com>
Closes #421

Conflicts:
        include/sys/mutex.h
behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 23, 2014
(The commit message is identical to the openzfs/spl commit quoted above; Closes #421.)
@ryao
Contributor

ryao commented Feb 10, 2015

@behlendorf This issue is not a duplicate of #2523. Instead, it is a duplicate of #3091, provided that it occurred on a multi-socket system.

@lintonv
Author

lintonv commented Feb 11, 2015

@ryao

We have been unable to reproduce our crash after applying the patch in #2523. We still need to test more (i.e., deploy it on more systems), but as of now the fix seems to work.

ryao pushed a commit to ryao/spl that referenced this issue Feb 19, 2015
(The commit message is identical to the openzfs/spl commit quoted above; Closes openzfs#421. Conflicts: include/sys/mutex.h)