zdb -dddd dies with kernel.c:301 assertion #2154

Closed
clefru opened this issue Mar 2, 2014 · 12 comments

@clefru
Contributor

clefru commented Mar 2, 2014

Running the command suggested in ryao@8825c49
triggers an assertion in my debug build:

[root@alia clemens]# zdb -dddd z 1
zdb: ../../lib/libzpool/kernel.c:301: Assertion `pthread_mutex_destroy(&(mp)->m_lock) == 0 (0x10 == 0x0)' failed.
Aborted (core dumped)

Here is more from gdb, unfortunately without proper symbols for some reason:

Program received signal SIGABRT, Aborted.
0x00007ffff6eaf369 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff6eaf369 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff6eb0768 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff6ea8456 in __assert_fail_base () from /usr/lib/libc.so.6
#3  0x00007ffff6ea8502 in __assert_fail () from /usr/lib/libc.so.6
#4  0x00007ffff6ea857b in __assert () from /usr/lib/libc.so.6
#5  0x00007ffff76a7271 in mutex_destroy () from /usr/lib/libzpool.so.2
#6  0x00007ffff76b043d in ?? () from /usr/lib/libzpool.so.2
#7  0x00007ffff76ae53f in ?? () from /usr/lib/libzpool.so.2
#8  0x00007ffff76aea63 in ?? () from /usr/lib/libzpool.so.2
#9  0x00007ffff76b1fe6 in arc_flush () from /usr/lib/libzpool.so.2
#10 0x00007ffff76dfbca in dsl_pool_close () from /usr/lib/libzpool.so.2
#11 0x00007ffff76f8f32 in ?? () from /usr/lib/libzpool.so.2
#12 0x00007ffff76fc69c in ?? () from /usr/lib/libzpool.so.2
#13 0x00007ffff76fd422 in ?? () from /usr/lib/libzpool.so.2
#14 0x00007ffff76fd761 in ?? () from /usr/lib/libzpool.so.2
#15 0x0000000000404c80 in ?? ()
#16 0x00007ffff6e9bb05 in __libc_start_main () from /usr/lib/libc.so.6
#17 0x00000000004061dc in ?? ()

I am running:

local/zfs-dkms-therp-git 0.6.2_r162.g3566d5c-1 (archzfs)
local/spl-dkms-therp-git 0.6.2_r23.g4c99541-1 (archzfs)

on Linux alia 3.10.28-1.1-lts #1 SMP Fri Jan 31 16:04:37 UTC 2014 x86_64 GNU/Linux. I have not yet updated because I want to know whether I am affected.

(I am sorry for the absolutely crappy symbol resolution in the stack trace above. I could not get GDB to properly resolve the dynamic symbols in /usr/lib/libzpool.so.2, even though GDB clearly supports .dynsym.)

@behlendorf
Contributor

People have reported this assert sporadically but it's never been reproducible.

@clefru
Contributor Author

clefru commented Mar 3, 2014

My issue is quite easily reproducible. I hit it every time. I guess that my pool is just special somehow.

@ryao
Contributor

ryao commented Mar 4, 2014

@clefru Run echo 1 > /proc/sys/vm/drop_caches and try again. If it makes the problem go away, this will be fixed when @behlendorf merges #2150.

@dweeezil
Contributor

dweeezil commented Mar 5, 2014

@clefru Do you get any output before the assert? What if you use zdb -A? Since it looks like one of the hash locks might be involved (based on taking a flyer at your stack trace), it might be worth trying the patch referred to above in #2162.

@behlendorf
Contributor

This is likely a duplicate of #2027.

@clefru
Contributor Author

clefru commented Mar 8, 2014

@ryao: Negative unfortunately. The problem persists.
@dweeezil: No output before the assert. zdb -A works.

I'll try a plain old reboot now.

@clefru
Contributor Author

clefru commented Mar 8, 2014

Some more datapoints:

  • Reboot on that system above didn't change anything.
  • zdb -S is affected just like zdb -dddd. Most likely we have the same root cause as in ztest: pthread_mutex_destroy() fails with EBUSY #2027.
  • The pool had an L2ARC before, which is now removed. I am mentioning this because the patch that @dweeezil references was in the vicinity of l2hdrs.
  • The above might not be relevant at all, because a freshly created pool is always affected:
    [root@alia ~]# dd if=/dev/zero of=/tmp/testpool bs=4M count=200
    [root@alia ~]# losetup /dev/loop0 /tmp/testpool; zpool create testpool /dev/loop0
    [root@alia ~]# zdb -dddd testpool 1
    zdb: ../../lib/libzpool/kernel.c:301: Assertion `pthread_mutex_destroy(&(mp)->m_lock) == 0 (0x10 == 0x0)' failed.
  • I am running a debug build, SPL/ZFS_DKMS_ENABLE_DEBUG=y.
  • On another archlinux system with a slightly different kernel/gcc version, the 0.6.2 release does not exhibit the problems (zdb -dddd/-S) on a fresh pool. Also upgrading to the latest git spl-0.6.2-r23.g4c99541, zfs-0.6.2_r205.ge744001, doesn't make the problem reproducible.

I'll work to get both systems to the same set of package versions, and see if the problems go away.

@clefru
Contributor Author

clefru commented Mar 8, 2014

Now I have two systems with identical packages, one exhibiting the problem and one without. So I assume that whatever the problem is, it is fixed in the most recent git. I will just bite the bullet and upgrade without knowing whether my system suffers from ryao@8825c49.

@clefru
Contributor Author

clefru commented Apr 14, 2014

After updating to 4d8c78c and cleaning up DEGRADED devices, I still hit the Assertion.

[root@alia zfs-0.6.2]# ltrace zdb -dddd z 
__libc_start_main(0x404860, 3, 0x7fffe9ea59b8, 0x40cac0 <unfinished ...>
setrlimit64(7, 0x7fffe9ea5800, 0x7fffe9ea59d8, 0)                                                 = 0
dprintf_setup(0x7fffe9ea57cc, 0x7fffe9ea59b8, 0x7fffe9ea59d8, -1)                                 = 0
getenv("SPA_CONFIG_PATH")                                                                         = nil
getopt(3, 0x7fffe9ea59b8, "bcdhilmM:suCDRSAFLXevp:t:U:P")                                         = 100
getopt(3, 0x7fffe9ea59b8, "bcdhilmM:suCDRSAFLXevp:t:U:P")                                         = 100
getopt(3, 0x7fffe9ea59b8, "bcdhilmM:suCDRSAFLXevp:t:U:P")                                         = 100
getopt(3, 0x7fffe9ea59b8, "bcdhilmM:suCDRSAFLXevp:t:U:P")                                         = 100
getopt(3, 0x7fffe9ea59b8, "bcdhilmM:suCDRSAFLXevp:t:U:P")                                         = -1
kernel_init(1, 0x7fffe9ea59b8, -1, 0)                                                             = 0
libzfs_init(0x7f95e2168f08, 0x7f95e1ea7d40, 0x7f95e1bf22e8, 1)                                    = 0x1d0d060
nvlist_alloc(0x7fffe9ea57e8, 2, 0, 0)                                                             = 0
nvlist_add_uint64(0x1d0dea0, 0x40f1a2, -1, 0x7f95e19d4620)                                        = 0
nvlist_add_uint32(0x1d0dea0, 0x40f1b5, 2, 1)                                                      = 0
spa_open_rewind(0x7fffe9ea5dcc, 0x7fffe9ea57d8, 0x40f5f5, 0x1d0dea0zdb: ../../lib/libzpool/kernel.c:301: Assertion `pthread_mutex_destroy(&(mp)->m_lock) == 0 (0x10 == 0x0)' failed.
 <no return ...>
--- SIGABRT (Aborted) ---
+++ killed by SIGABRT +++

strace is here: http://derp.co.uk/1e5d4
zdb always dies while reading the same disk segment. I am still running a debug build.

@cwedgwood
Contributor

@behlendorf I think we should merge this into #2027 and edit that to the effect of:

Assertion `pthread_mutex_destroy(&(mp)->m_lock) == 0 (0x10 == 0x0)' failed.

@behlendorf
Contributor

Agreed, closing as duplicate of #2027.

@behlendorf behlendorf modified the milestone: 0.6.6 Nov 8, 2014
6 participants
@behlendorf @cwedgwood @clefru @ryao @dweeezil and others