CPU lock up and ZFS crash #2891

Closed
lintonv opened this issue Nov 12, 2014 · 6 comments

@lintonv
lintonv commented Nov 12, 2014

I am seeing the following CPU lockup and ZFS crash after about 4 hours of IO testing. I am running code from Git master as of September 27th; before that I was running the stable 0.6.3 release (tar.gz) and did not see these crashes.

This is the backtrace:

Nov 12 21:44:31 ssn-0-5-38 kernel: BUG: soft lockup - CPU#4 stuck for 67s! [ggaoed:4467]
Nov 12 21:44:31 ssn-0-5-38 kernel: Modules linked in: 8021q garp stp llc zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate aoe(U) ipv6 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000(U) sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: aoe]
Nov 12 21:44:31 ssn-0-5-38 kernel: CPU 4
Nov 12 21:44:31 ssn-0-5-38 kernel: Modules linked in: 8021q garp stp llc zfs(P)(U) zcommon(P)(U) znvpair(P)(U) zavl(P)(U) zunicode(P)(U) spl(U) zlib_deflate aoe(U) ipv6 i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000(U) sg ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: aoe]
Nov 12 21:44:31 ssn-0-5-38 kernel:
Nov 12 21:44:31 ssn-0-5-38 kernel: Pid: 4467, comm: ggaoed Tainted: P --------------- 2.6.32-279.el6.x86_64 #1 SeaMicro Sabre2/Type2 - Board Product Name1
Nov 12 21:44:31 ssn-0-5-38 kernel: RIP: 0010:[] [] _spin_lock+0x21/0x30
Nov 12 21:44:31 ssn-0-5-38 kernel: RSP: 0018:ffff8808130a78b8 EFLAGS: 00000202
Nov 12 21:44:31 ssn-0-5-38 kernel: RAX: 0000000000000000 RBX: ffff8808130a78b8 RCX: ffff880812dae430
Nov 12 21:44:31 ssn-0-5-38 kernel: RDX: 0000000000000002 RSI: ffff880662568ec0 RDI: ffff8806c470d40c
Nov 12 21:44:31 ssn-0-5-38 kernel: RBP: ffffffff8100bc0e R08: ffff88069acb8e30 R09: 0000000000000001
Nov 12 21:44:31 ssn-0-5-38 kernel: R10: dead000000200200 R11: dead000000100100 R12: ffffffff814fed1e
Nov 12 21:44:31 ssn-0-5-38 kernel: R13: ffff8808130a7838 R14: 0000000000008030 R15: ffff8808130a7848
Nov 12 21:44:31 ssn-0-5-38 kernel: FS: 00007f27ea3c9700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
Nov 12 21:44:31 ssn-0-5-38 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Nov 12 21:44:31 ssn-0-5-38 kernel: CR2: 00007f4367dfcdd0 CR3: 0000000817e6e000 CR4: 00000000000406e0
Nov 12 21:44:31 ssn-0-5-38 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 12 21:44:31 ssn-0-5-38 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 12 21:44:31 ssn-0-5-38 kernel: Process ggaoed (pid: 4467, threadinfo ffff8808130a6000, task ffff880813069500)
Nov 12 21:44:31 ssn-0-5-38 kernel: Stack:
Nov 12 21:44:31 ssn-0-5-38 kernel: ffff8808130a78d8 ffffffff814fea85 ffff8806c470d3b0 ffff88069acb8dc0
Nov 12 21:44:31 ssn-0-5-38 kernel: ffff8808130a78e8 ffffffff814fea5b ffff8808130a7968 ffffffffa04ac7ca
Nov 12 21:44:31 ssn-0-5-38 kernel: ffff8808130a7938 ffff880724d3cea0 ffff88069acb8e00 0000000112dae330
Nov 12 21:44:31 ssn-0-5-38 kernel: Call Trace:
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? mutex_unlock+0x1b/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dbuf_dirty+0x43a/0x820 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_buf_will_dirty+0xaf/0x100 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_write_uio_dnode+0x70/0x140 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_write_uio_dbuf+0x46/0x60 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zfs_write+0xb95/0xcf0 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __switch_to+0x1ac/0x320
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? kmem_alloc_debug+0x13a/0x4c0 [spl]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zpl_aio_write+0xe2/0x130 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zpl_aio_write+0x0/0x130 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? do_sync_readv_writev+0xfb/0x140
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? autoremove_wake_function+0x0/0x40
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? ep_poll_callback+0xbb/0xf0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? security_file_permission+0x16/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? do_readv_writev+0xd6/0x1f0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __wake_up_locked_key+0x18/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? eventfd_write+0x193/0x1d0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? vfs_writev+0x46/0x60
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? sys_pwritev+0xa2/0xc0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? system_call_fastpath+0x16/0x1b
Nov 12 21:44:31 ssn-0-5-38 kernel: Code: 01 74 05 e8 e2 e3 d7 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f
Nov 12 21:44:31 ssn-0-5-38 kernel: Call Trace:
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __mutex_unlock_slowpath+0x25/0x60
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? mutex_unlock+0x1b/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dbuf_dirty+0x43a/0x820 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_buf_will_dirty+0xaf/0x100 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_write_uio_dnode+0x70/0x140 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? dmu_write_uio_dbuf+0x46/0x60 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zfs_write+0xb95/0xcf0 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __switch_to+0x1ac/0x320
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? kmem_alloc_debug+0x13a/0x4c0 [spl]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zpl_aio_write+0xe2/0x130 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? zpl_aio_write+0x0/0x130 [zfs]
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? do_sync_readv_writev+0xfb/0x140
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? autoremove_wake_function+0x0/0x40
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? ep_poll_callback+0xbb/0xf0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? security_file_permission+0x16/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? do_readv_writev+0xd6/0x1f0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? __wake_up_locked_key+0x18/0x20
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? eventfd_write+0x193/0x1d0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? vfs_writev+0x46/0x60
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? sys_pwritev+0xa2/0xc0
Nov 12 21:44:31 ssn-0-5-38 kernel: [] ? system_call_fastpath+0x16/0x1b

This is the output of /proc/slabinfo (columns: name, active_objs, num_objs, objsize, objperslab, pagesperslab : tunables limit, batchcount, sharedfactor : slabdata active_slabs, num_slabs, sharedavail):

zil_lwb_cache 0 0 200 19 1 : tunables 120 60 8 : slabdata 0 0 0
l2arc_buf_hdr_t 0 0 40 92 1 : tunables 120 60 8 : slabdata 0 0 0
arc_buf_t 779383 779630 72 53 1 : tunables 120 60 8 : slabdata 14710 14710 0
arc_buf_hdr_t 779375 779441 288 13 1 : tunables 54 27 8 : slabdata 59957 59957 0
dmu_buf_impl_t 779418 779506 280 14 1 : tunables 54 27 8 : slabdata 55679 55679 0
dnode_t 1311 1325 816 5 1 : tunables 54 27 8 : slabdata 265 265 0
sa_cache 191 240 80 48 1 : tunables 120 60 8 : slabdata 5 5 0
lz4_cache 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_16384 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_16384 258042 258042 16384 1 4 : tunables 8 4 0 : slabdata 258042 258042 0
zio_data_buf_14336 0 0 14336 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_14336 0 0 14336 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_12288 0 0 12288 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_12288 0 0 12288 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_10240 0 0 10240 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_10240 0 0 10240 1 4 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_8192 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_8192 7 7 8192 1 2 : tunables 8 4 0 : slabdata 7 7 0
zio_data_buf_7168 0 0 7168 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_7168 0 0 7168 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_6144 0 0 6144 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_6144 0 0 6144 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_data_buf_5120 0 0 5120 1 2 : tunables 8 4 0 : slabdata 0 0 0
zio_buf_5120 4 4 5120 1 2 : tunables 8 4 0 : slabdata 4 4 0
zio_data_buf_4096 458962 458962 4096 1 1 : tunables 24 12 8 : slabdata 458962 458962 0
zio_buf_4096 62202 62202 4096 1 1 : tunables 24 12 8 : slabdata 62202 62202 0
zio_data_buf_3584 0 0 3584 1 1 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_3584 0 0 3584 1 1 : tunables 24 12 8 : slabdata 0 0 0
zio_data_buf_3072 0 0 3072 2 2 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_3072 0 0 3072 2 2 : tunables 24 12 8 : slabdata 0 0 0
zio_data_buf_2560 0 0 2560 3 2 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_2560 0 0 2560 3 2 : tunables 24 12 8 : slabdata 0 0 0
zio_data_buf_2048 0 0 2048 2 1 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_2048 82 94 2048 2 1 : tunables 24 12 8 : slabdata 47 47 0
zio_data_buf_1536 0 0 1536 5 2 : tunables 24 12 8 : slabdata 0 0 0
zio_buf_1536 65 115 1536 5 2 : tunables 24 12 8 : slabdata 23 23 0
zio_data_buf_1024 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
zio_buf_1024 0 0 1024 4 1 : tunables 54 27 8 : slabdata 0 0 0
zio_data_buf_512 0 0 512 8 1 : tunables 54 27 8 : slabdata 0 0 0
zio_buf_512 732 1000 512 8 1 : tunables 54 27 8 : slabdata 125 125 0
zio_link_cache 208 6006 48 77 1 : tunables 120 60 8 : slabdata 78 78 0
zio_cache 202 343 1072 7 2 : tunables 24 12 8 : slabdata 49 49 0
ddt_entry_cache 0 0 448 8 1 : tunables 54 27 8 : slabdata 0 0 0
range_seg_cache 301740 305443 64 59 1 : tunables 120 60 8 : slabdata 5177 5177 0

@behlendorf
Contributor

This is a duplicate of #2523. It's something we absolutely need to address before the next tag. Can you describe your test setup and what workload you're using to reproduce this?

@lintonv
Author

lintonv commented Nov 13, 2014

@behlendorf Thanks.

Here is the test environment:
128K blocks
Random IO
20 Threads
50 Volumes
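
For reference, a workload of this shape can be approximated with a small multi-threaded random-write program. This is a hypothetical sketch, not the actual test harness (in this report the IO was driven through ggaoed over AoE), and the file path and region size below are made up; it does exercise the same pwritev() path visible in the backtrace:

    /* Hypothetical reproducer sketch: THREADS writers issuing random
     * 128K pwritev() writes forever.  Build with: gcc -pthread. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #define BLOCK   (128 * 1024)            /* 128K blocks */
    #define SPAN    (1024LL * 1024 * 1024)  /* 1 GiB region (made up) */
    #define THREADS 20

    static int fd;

    static void *writer(void *arg)
    {
        char *buf = malloc(BLOCK);
        struct iovec iov = { .iov_base = buf, .iov_len = BLOCK };

        memset(buf, 0xab, BLOCK);
        for (;;) {                          /* random IO, forever */
            off_t off = (random() % (SPAN / BLOCK)) * BLOCK;
            if (pwritev(fd, &iov, 1, off) < 0)
                break;
        }
        free(buf);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[THREADS];

        /* "/tank/vol0/testfile" is a placeholder on a ZFS dataset. */
        fd = open("/tank/vol0/testfile", O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return 1;
        for (int i = 0; i < THREADS; i++)
            pthread_create(&tid[i], NULL, writer, NULL);
        for (int i = 0; i < THREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }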

Looking at the code, could this be caused by zpl_aio_write() in zpl_file.c? I don't quite understand the locking mechanism in that code; it could be the root cause.

@behlendorf
Contributor

@lintonv This issue actually goes pretty far back and predates the AIO code. But for some reason it has become much more common in the latest code.

behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 19, 2014
It is known that mutexes in Linux are not safe when using them to
synchronize the freeing of objects in which the mutex is embedded:

http://lwn.net/Articles/575477/

The known places in ZFS which are suspected to suffer from the race
condition are zio->io_lock and dbuf->db_mtx.

* zio uses zio->io_lock and zio->io_cv to synchronize freeing
  between zio_wait() and zio_done().
* dbuf uses dbuf->db_mtx to protect reference counting.

This patch fixes this kind of race by forcing serialization on
mutex_exit() with a spin lock, making the mutex safe by sacrificing
a bit of performance and memory overhead.

This issue most commonly manifests itself as a deadlock in the zio
pipeline caused by a process spinning on the damaged mutex.  Similar
deadlocks have been reported for the dbuf->db_mtx mutex.  And it can
also cause a NULL dereference or bad paging request under the right
circumstances.

This issue and many like it are linked off the openzfs/zfs#2523
issue.  Specifically this fix resolves at least the following
outstanding issues:

openzfs/zfs#401
openzfs/zfs#2523
openzfs/zfs#2679
openzfs/zfs#2684
openzfs/zfs#2704
openzfs/zfs#2708
openzfs/zfs#2517
openzfs/zfs#2827
openzfs/zfs#2850
openzfs/zfs#2891
openzfs/zfs#2897
openzfs/zfs#2247
openzfs/zfs#2939

Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Closes #421
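
To make the race concrete, here is a userspace analogue of the zio_wait()/zio_done() pattern the commit describes, written with pthreads (an illustrative sketch, not actual ZFS code): the waiter can return from its wait and free the object while the completer is still executing inside pthread_mutex_unlock() on the embedded mutex.

    #include <pthread.h>
    #include <stdlib.h>

    /* Userspace analogue of the race described above (illustrative
     * only).  The object embeds the mutex that guards its own
     * completion flag. */
    struct obj {
        pthread_mutex_t lock;
        pthread_cond_t  cv;
        int             done;
    };

    /* Plays the role of zio_done(): mark completion, wake the waiter. */
    static void *completer(void *arg)
    {
        struct obj *o = arg;

        pthread_mutex_lock(&o->lock);
        o->done = 1;
        pthread_cond_broadcast(&o->cv);
        pthread_mutex_unlock(&o->lock);   /* may still be touching
                                           * o->lock internals after
                                           * the waiter has woken up */
        return NULL;
    }

    /* Plays the role of zio_wait(): wait for completion, then free. */
    static void waiter(struct obj *o)
    {
        pthread_mutex_lock(&o->lock);
        while (!o->done)
            pthread_cond_wait(&o->cv, &o->lock);
        pthread_mutex_unlock(&o->lock);
        free(o);                          /* races with the completer's
                                           * unlock above: use-after-free */
    }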
@behlendorf
Contributor

This issue, which is a duplicate of #2523, was resolved by the following commit. Full details can be found in the commit message and the related LWN article.

openzfs/spl@a3c1eb7 mutex: force serialization on mutex_exit() to fix races
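
The shape of that serialization, sketched from the commit description (field and macro names here are illustrative, not the exact patch): every kmutex_t gains a spin lock that is held across the underlying mutex_unlock(), so no thread can observe the mutex as free while another thread is still inside the unlock path.

    /* Sketch of the idea behind openzfs/spl@a3c1eb7; see the actual
     * commit for the real implementation. */
    typedef struct {
            struct mutex    m_mutex;
            spinlock_t      m_exit_lock;    /* serializes mutex_exit() */
            /* ... */
    } kmutex_t;

    #define mutex_exit(mp)                          \
    do {                                            \
            spin_lock(&(mp)->m_exit_lock);          \
            mutex_unlock(&(mp)->m_mutex);           \
            spin_unlock(&(mp)->m_exit_lock);        \
    } while (0)

With this in place, a thread that has just been granted the mutex cannot finish its own mutex_exit() (and go on to free the object) until the previous owner's mutex_unlock() has fully completed, which is the small performance and memory cost the commit message mentions.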

behlendorf added this to the 0.6.4 milestone Dec 19, 2014
dajhorn added a commit to zfsonlinux/pkg-spl that referenced this issue Dec 20, 2014
Commit: openzfs/spl@a3c1eb7
From: Chunwei Chen <tuxoko@gmail.com>
Date: Fri, 19 Dec 2014 11:31:59 +0800
Subject: mutex: force serialization on mutex_exit() to fix races

(The commit message is identical to the openzfs/spl commit quoted above.)
Backported-by: Darik Horn <dajhorn@vanadac.com>
Closes #421

Conflicts:
        include/sys/mutex.h
behlendorf pushed a commit to openzfs/spl that referenced this issue Dec 23, 2014
(The commit message is identical to the openzfs/spl commit quoted above; Closes #421.)
@ryao
Contributor

ryao commented Feb 10, 2015

@behlendorf This issue is not a duplicate of #2523. Instead, it is a duplicate of #3091, provided that it occurred on a multi-socket system.

@lintonv
Author

lintonv commented Feb 11, 2015

@ryao

We have been unable to reproduce our crash after applying the patch in #2523. We still need to test more (i.e., deploy it on more systems), but as of now the fix seems to work.

ryao pushed a commit to ryao/spl that referenced this issue Feb 19, 2015
(The commit message is identical to the openzfs/spl commit quoted above; Closes openzfs#421. Conflicts: include/sys/mutex.h)