zvol/kworker/jbd2 possible deadlock (hung tasks) #2221
Reading over issue 1909 led me to suspect that someone might ask about the Arch kernel's PREEMPT settings. My overall kernel config is 4,832 non-comment lines (!) so I'll paste a section here:
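To reproduce a non-comment line count like that one, or to pull out just the preemption-related part of a config, a small script is enough. This is a minimal Python sketch under stated assumptions, not how the count above was necessarily produced. It assumes the running kernel exposes its config as /proc/config.gz (which requires CONFIG_IKCONFIG_PROC), and it treats "non-comment" as any non-blank line not starting with `#`. The PREEMPT filter deliberately keeps `# CONFIG_... is not set` lines, because those record which preemption models are disabled. `read_config` and `summarize_config` are hypothetical helper names.

```python
import gzip
import re

# Matches "CONFIG_...PREEMPT...=value" settings as well as the
# "# CONFIG_...PREEMPT... is not set" comments that record disabled options.
PREEMPT_RE = re.compile(r"^(# )?CONFIG_[A-Z0-9_]*PREEMPT[A-Z0-9_]*[= ]")


def read_config(path="/proc/config.gz"):
    """Read a kernel config as text, decompressing it if the path ends in .gz (assumed default path)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return f.read()


def summarize_config(text):
    """Return (count of non-blank, non-comment lines, list of PREEMPT-related lines)."""
    lines = text.splitlines()
    settings = [l for l in lines if l.strip() and not l.lstrip().startswith("#")]
    preempt = [l for l in lines if PREEMPT_RE.match(l)]
    return len(settings), preempt
```

Usage: `count, preempt = summarize_config(read_config())`, then print `count` and each line of `preempt`. Pass a different path (for example a saved `.config`) if /proc/config.gz is not available.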
@cbiffle Something went wrong with locking. It is not clear to me what that might be. Was there any more dmesg output or were these all threads? Did you let the system sit for 4 minutes to ensure that all hung threads triggered the hangcheck timer? Are you able to reproduce this issue or was this a one time thing?
@cbiffle When I looked at this two days ago, I was speed reading through reports to see if I could find issues linked to #2484. I had mistaken this for a deadlock. That being said, I think I see at least part of what happened here to cause the "hitch" that you described:
At the same time, #2484 would probably help here, but that remains to be seen. Still, we should look into a possible contention issue.
Closing as out of date.
I'm running a newish install of Arch Linux (~2 weeks, specific versions below). I've configured a single raidz of five SATA disks; it is not my root filesystem, though I do store /var/log there. To handle a portion of my data that I want encrypted at rest, I have a zvol with a LUKS container, and an ext4 filesystem within that.
Read/write performance to the filesystem on the zvol is unimpressive but acceptable. But I do notice occasional "hitches" where all I/O seems to pause. Today in the kernel output I noticed a report of several hung kernel threads. The system does appear to have recovered -- I'm messing around in the ext4 filesystem right now.
I've found a thread on the mailing list that describes issues in a very similar situation, but that was four years ago. I've also seen issue 21 and issue 1985; 21 seems closer than 1985 to my circumstances, but it was closed and believed to be fixed some time ago.
I'm curious if there's a generally accepted workaround for these issues -- the mailing list thread linked above suggested limiting the ARC, which does seem to get rather large on my 16GiB machine.
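Limiting the ARC, as that thread suggests, is normally done through the `zfs_arc_max` module parameter, whose value is given in bytes. The minimal Python sketch below only computes that byte value and the matching line for a modprobe configuration file such as /etc/modprobe.d/zfs.conf, which is read when the module is loaded. The 4 GiB cap in the example is purely illustrative, not a value recommended anywhere in this thread, and whether the limit can also be changed at runtime depends on the module version. `arc_max_option` is a hypothetical helper name.

```python
GIB = 1024 ** 3  # bytes per GiB


def arc_max_option(cap_gib):
    """Return a modprobe.d line capping the ZFS ARC at cap_gib GiB (value in bytes)."""
    if cap_gib <= 0:
        raise ValueError("ARC cap must be positive")
    return f"options zfs zfs_arc_max={int(cap_gib * GIB)}"


# Illustrative only: cap the ARC at 4 GiB on a 16 GiB machine.
print(arc_max_option(4))  # options zfs zfs_arc_max=4294967296
```

Working in GiB and converting to bytes in one place avoids hand-typing long byte counts, which is where a misplaced digit most easily slips into the config.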
Anyway. Data dump begins here.
System Configuration
Linux metis 3.13.6-1-ARCH #1 SMP PREEMPT Fri Mar 7 22:47:48 CET 2014 x86_64 GNU/Linux
zfs-0.6.2 and spl-0.6.2, both built from the AUR recipe.
ASRock C2750D4I motherboard (Intel Avoton)
Disks are spread across two Marvell SATA controllers -- two on an 88SE9172, the other three on an 88SE9230. There is a separate kernel bug affecting the 88SE9172 but I don't think it's related (it causes disks to vanish, which isn't happening here).
dmesg output
zpool status
Notice the checksum error on the fifth disk. It was preexisting -- the aforementioned Marvell bug caused the disk to go offline and I had to scrub to clean up after it.
zfs get all tank0
zfs get all tank0/enc0
This is the zvol.
Contents of /sys/module/zfs/parameters
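A dump like this can be produced by reading every file in the parameters directory and printing it as `name=value`. This is a minimal Python sketch, assuming each entry is a small text value (as sysfs module parameters are) and that some entries may be unreadable without root. `dump_parameters` and `format_parameters` are hypothetical helper names.

```python
import os


def dump_parameters(directory="/sys/module/zfs/parameters"):
    """Return {name: value} for every regular file in the directory, sorted by name."""
    params = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                params[name] = f.read().strip()
        except OSError:
            # Some parameters may be write-only or unreadable for non-root users.
            params[name] = "<unreadable>"
    return params


def format_parameters(params):
    """Render the parameters one per line as name=value."""
    return "\n".join(f"{name}={value}" for name, value in params.items())
```

Usage: `print(format_parameters(dump_parameters()))`. Sorting by name keeps two dumps taken at different times easy to diff.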
Please let me know if there's anything I should try, or any other data I ought to provide. Thanks!