zfs sync hang #7038
Same situation with me too.
Same issue. No reproduction steps identified; it seems "random". It most commonly appears during long-running disk-intensive tasks (in my case caused by the BURP backup software), but I'm getting the same blocked stack traces at generic_write_sync and kworker at 100%. Here is the output from 'echo l > /proc/sysrq-trigger'
and another shortly after
Same problem here (HP Data Protector backup software; vbda process). While a full backup runs on a snapshot, zfs receive does not make progress. Attached is the output of perf record -a and perf sched. I'm using the deadline scheduler on the block devices to try to work around this problem; the same problem occurs with the noop scheduler.
The problem occurs very often. I've attached more data.
Hi all, I experienced this problem as well (multiple times). Then I found this thread and tried switching from the noop scheduler to the deadline scheduler, and still experienced the problem. I am running a 4.16.8 Gentoo kernel, with musl libc, and a fairly recent ZFS from the master branch (zfs commit 670d74b, spl commit 1149b62). This tends to happen after doing a lot of compilation. I merged about 1500 packages with the deadline scheduler before this hit; the noop scheduler would do about 500-1500 packages before quitting. My data set is probably too small for any real comparison, and I didn't bother writing down specifics (the workflow can be complicated to begin with...). Here's my dmesg just after the hang: Note that the system has 56 threads to play with and would spend some fraction of its time with a load average around 60; perhaps that is plenty of opportunity for resource contention and deadlocks? HTH.
I am also experiencing this issue. I don't have as many helpful logs or diagnostics, but I can say I am similarly doing a lot of heavy compilation and small I/O. I'm also using ZFS as a Docker storage driver, but the lockups happen with or without Docker. I'm on an older 0.6.5.6-0ubuntu18. A reboot temporarily fixes the issue.
Another time (4.4.0-116-generic #140-Ubuntu, zfs 0.6.5.6-0ubuntu16), while running a full backup.
Hit a similar issue; it happened on Gentoo during
NMI backtrace:
As a troubleshooting step, I recommend pulling the drives and doing some read/write sector tests to make sure there are no controller or bad-sector issues. I've hit this bug on a system where the drive seemed fine, except for the odd ZFS hang, until the drive became completely unresponsive a few months later and failed.
Just chiming in to say "me too". Definitely seems to be a bug; the disks are healthy.
Another time sync and other processes locked up during zfs receive and a full backup (4.4.0-135-generic #161-Ubuntu).
I have this issue too, and the I/O is not very heavy. If I use Docker and compile something inside a Docker container, the system hangs within ~12 hours (X11 freezes, the mouse cursor does not move). The HDD LED blinks a few times immediately afterwards and then stops. A hardware reset fixes it, but sometimes a scrub then finds errors, and three times I lost my pool (import fails with various kernel stack traces). Also, maybe it is important: I have root on this ZFS pool. I tried changing my hardware to an AMD desktop and an old Intel server without any difference, so I think this is a software issue. I am on Debian:
Another time sync and other processes locked up during zfs receive and a full backup.
It happened another time. Please help!
@bud4 Currently most developers already have their hands full, so if you want them to take this problem on, they first need to be able to reproduce it easily on their end in an automated test environment. It would also be preferable if you could take a spare machine, upgrade it to 0.7.12, and confirm that you can still reproduce the issue.
@AndCycle This issue may not appear for two months and then begin to appear twice per day. Perhaps we need to turn on some additional logging and read it over RS-232? Tell us how, in simple words; we are not all developers here and do not know how to hunt this bug.
@AndCycle
hopeless
I also have this issue, and I am running zfs on root. I also run several dockers. When the issue occurs, I can always trace it back to a period of high I/O. The type of disk activity doesn't seem to matter because sometimes it will happen during a routine scrub, before which the disks had been fairly idle for hours. Not all periods of high I/O cause the problem though. Seemingly random problems like this, to me, point to a hardware issue (likely disk or storage controller), but badblocks and SMART tests have been coming up clean, so I'm not sure what to think at this point. I'll keep an eye on this and try to help out with logs or test data where I can.
Same issue for us on multiple servers with high-frequency create/destroy/clone. Using Debian 9.7 and ZoL 0.7.12. Here are some logs:
Let me know if you need anything else.
Happens to me too on my Manjaro 4.19 desktop system with zol 0.7.12-11
Also, the process is still stuck in the D state even after waiting far longer:
My 32 GB of memory is still only 40% used, the S.M.A.R.T. data of all disks is fine, and the unmount seems to be stuck on a different disk each time. My pool is also fine:
My pool consists of four 10 TB LUKS-encrypted disks and one M.2 NVMe drive as cache. Here is my iostat -mx output:
Shutdown log:
SMART data for sdb/sdx_crypt6:
Please, this is so damn annoying. Is there any data I can collect to help get to the bottom of this? Why is the kernel not even able to reboot?
Jonathon from the Manjaro team had me check the I/O schedulers. I did, and it turned out that while ZFS did set the I/O scheduler of the dm cryptsetup devices to none, the underlying disks were still on their default scheduler. I have now changed all of them to none as well, and so far the problem has not reappeared. It is far from long enough to be sure, but a notable amount of time has already passed and my hopes are up. So if you have this issue and you use LUKS devices, check that the I/O scheduler is set to none on both layers, and if not, change it. Update: it can also be called "noop" on your system.
Hmmm, me too! Is there anybody here who does NOT use LUKS but still faces this issue? @CySlider do you use a ZFS pool on LUKS, or a LUKS vdev on ZFS?
The scheduler for HDDs is called "noop" here, not "none". Maybe this is important. (So I changed it to "noop" and am keeping my fingers crossed.)
On my system it is "none", but you are right, it might also be called "noop". You can check the available modes with: cat /sys/block/<device>/queue/scheduler (the active one is shown in brackets).
@denizzzka I am NOT using LUKS, but I am using ZFS native encryption. The scheduler for all (12) devices is set to |
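If it helps anyone checking this across many devices (both the dm-crypt layer and the underlying disks), here is a small illustrative C sketch, not part of ZFS, that just prints the scheduler line from sysfs for each block device named on the command line; the active scheduler is the bracketed entry, e.g. [none] or [noop]. The device names are placeholders for whatever your system uses.

```c
/* Illustrative sketch (not ZFS code): print the I/O scheduler line for
 * each block device given on the command line, e.g.
 *   ./iosched sda sdb dm-0 nvme0n1
 * The active scheduler is the entry shown in brackets, e.g. "[none]".
 * Device names are placeholders; adjust them for your system. */
#include <stdio.h>

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        char path[256];
        char line[256];

        snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", argv[i]);
        FILE *f = fopen(path, "r");
        if (f == NULL) {
            perror(path);
            continue;
        }
        if (fgets(line, sizeof(line), f) != NULL)
            printf("%-12s %s", argv[i], line);   /* sysfs line already ends with '\n' */
        fclose(f);
    }
    return 0;
}
```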
Knock on wood, we have not had a hang or corruption issue since moving our backup servers to Ubuntu Eoan.
I've just had this with 0.8.2 on a 5.2.16 kernel. I was running a world update on Gentoo when the process stopped doing anything and
It seems to me that this is somehow related to swap. It happens when swap becomes heavily used but not all disks have the same I/O speed.
I don't have swap enabled.
I would suggest that you provide scripts to use in these cases, so that it is possible to gather all the information you need.
Does anyone suffering from this issue make use of FUSE, or AppImages mounted via FUSE? In #9430 we've found that two of us have that in common.
Additionally, what processors are you all using? In #9430 we also found that we were both using AMD processors.
I'm not using FUSE or AppImages (unless Ubuntu Server uses FUSE by default for something). I've seen this on an EPYC 7601, FWIW.
No to both. The system has two processors, both Intel Xeon E5-2660 @ 2.20GHz.
OK, but maybe you (like me) use a pool with different HDDs where not all disks have the same I/O speed? I use one slow "green" WD drive and two other, faster HDDs. If I switch the slow partition to offline, the hangs become less frequent.
At least my HDDs are all the same, and some people have experienced this with all-NVMe setups as well.
On one of my pools the SSDs used for ZIL/metadata caching are different drives, but the sync problem happens on my main rpool, which has identical NVMe drives in it.
We are using a Dell R720 with 2x Xeon E5-2670 and 512 GB RAM. The HBA is an LSI 9207 with 4 KINGSTON SEDC500M960G drives in a striped mirror configuration. No L2ARC, no SLOG, and no swap either. The problem happens when rolling back LXC containers hosting Postgres DBs, which use their own dataset for their disk. zfs send is also used often on the ZFS dataset being rolled back. Could it be a race condition? ZFS 0.8.2.
@kpande It's the same issue as in this thread. I posted my specs to maybe help pinpoint some similarities in the hardware between everyone experiencing this issue.
I had the same issue, but after long debugging, when I disabled the cache and log devices and set sync back to standard, the problem went away. (Cache and log were both SSD drives.)
I moved swap to another SSD and the problem went away.
Once again in production; the situation is truly critical. ZFS version 0.8.1-1ubuntu14.4. sync stack: process in state D. dmesg: [ 243.613937] INFO: task zpool:6002 blocked for more than 120 seconds. Update 👍
There are three locks at play when handling ZIL records. The first is zl_issuer_lock, which is supposed to be held whenever we are issuing a new record to the ZIL. Then there is zl_lock, which actually protects the fields in the zilog struct. Finally there is zcw_lock, which is used to protect the lists of ZIL call-back waiters. The locking order is supposed to be zl_issuer_lock -> zcw_lock -> zl_lock, as implied by their usage in the zil_alloc_lwb and zil_commit_waiter_timeout functions. The function zil_commit_waiter_link_lwb goes against this: it expects to be entered with zl_lock already held and only then takes zcw_lock. This patch straightens out the locking in zil_commit_waiter_link_lwb so that it takes zl_lock on its own, correcting the locking order.

Following is the attached message from the Linux lockdep mechanism describing the potential deadlock; in our production it was a matter of a few hours to hit this one. (Redacted to fit 72 columns.)

systemd-journal/5561 is trying to acquire lock:
 (&zilog->zl_lock){+.+.}-{4:4}, at: zil_alloc_lwb+0x1df/0x3e0
but task is already holding lock:
 (&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850
which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&zcw->zcw_lock){+.+.}-{4:4}:
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_commit_waiter_link_lwb+0x51/0x1a0 [zfs]
       zil_commit_impl+0x12b0/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

-> #0 (&zilog->zl_lock){+.+.}-{4:4}:
       __lock_acquire+0x12b6/0x2480
       lock_acquire+0xab/0x380
       __mutex_lock+0xac/0x9e0
       mutex_lock_nested+0x1b/0x20
       zil_alloc_lwb+0x1df/0x3e0 [zfs]
       zil_lwb_write_issue+0x265/0x3f0 [zfs]
       zil_commit_impl+0x577/0x1850 [zfs]
       zil_commit+0x43/0x60 [zfs]
       zpl_writepages+0xf8/0x1a0 [zfs]
       do_writepages+0x43/0xf0
       __filemap_fdatawrite_range+0xd5/0x110
       filemap_write_and_wait_range+0x4b/0xb0
       zpl_fsync+0x4d/0xb0 [zfs]
       vfs_fsync_range+0x49/0x80
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x14/0x20
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&zcw->zcw_lock);
                               lock(&zilog->zl_lock);
                               lock(&zcw->zcw_lock);
  lock(&zilog->zl_lock);

 *** DEADLOCK ***

2 locks held by systemd-journal/5561:
 #0: (&zilog->zl_issuer_lock){+.+.}-{4:4}, at: zil_commit_impl+0x4dc...
 #1: (&zcw->zcw_lock){+.+.}-{4:4}, at: zil_commit_impl+0x52b/0x1850

Fixes openzfs#7038
Fixes openzfs#10440
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
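To make the lock-order inversion above easier to picture, here is a minimal stand-alone pthreads sketch. This is not ZFS code; the mutex names merely mirror zcw_lock and zl_lock. With ORDERED set to 1 both threads take the locks in the zcw_lock -> zl_lock order and the program exits; setting it to 0 has the second thread take them in the opposite order, which is the AB-BA pattern lockdep reports above and can hang the program, just as the pre-patch zil_commit_waiter_link_lwb path could hang the ZIL.

```c
/* Illustrative only, not ZFS code: two mutexes standing in for
 * zcw_lock and zl_lock, and two threads that each take both.
 * ORDERED=1: both threads honor zcw_lock -> zl_lock and the program
 * finishes. ORDERED=0: the second thread takes zl_lock first,
 * reproducing the AB-BA inversion lockdep reports above, which can
 * deadlock. Build with: cc -pthread lockorder.c -o lockorder */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define ORDERED 1

static pthread_mutex_t zcw_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t zl_lock  = PTHREAD_MUTEX_INITIALIZER;

static void *issue_path(void *arg)      /* like zil_lwb_write_issue: zcw, then zl */
{
    (void)arg;
    pthread_mutex_lock(&zcw_lock);
    usleep(1000);                       /* widen the race window */
    pthread_mutex_lock(&zl_lock);
    puts("issue path: zcw_lock -> zl_lock");
    pthread_mutex_unlock(&zl_lock);
    pthread_mutex_unlock(&zcw_lock);
    return NULL;
}

static void *link_path(void *arg)       /* like zil_commit_waiter_link_lwb */
{
    (void)arg;
#if ORDERED
    pthread_mutex_lock(&zcw_lock);      /* consistent order: zcw before zl */
    pthread_mutex_lock(&zl_lock);
#else
    pthread_mutex_lock(&zl_lock);       /* inverted order: deadlock risk */
    usleep(1000);
    pthread_mutex_lock(&zcw_lock);
#endif
    puts("link path: done");
    pthread_mutex_unlock(&zl_lock);
    pthread_mutex_unlock(&zcw_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, issue_path, NULL);
    pthread_create(&b, NULL, link_path, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

The fix described above applies the same principle: zil_commit_waiter_link_lwb now takes zl_lock on its own, so every path follows the zl_issuer_lock -> zcw_lock -> zl_lock order.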
CONGRATULATIONS! You fixed THE BUG. I was able to confirm that the bug was exterminated on my relevant system as of (or before) commit 35ac0ed on 2021-01-26. I did this by reproducing the kworker 100% hang just before updating ZFS. Then I attempted to reproduce it after updating ZFS, and was unable to. The affected system had 56 threads and could easily trigger this bug, assuming one has an hour or so to wait around for it. So the certainty is quite high. For methodology, here's a little story, with screenshots: That should provide some (hopefully replicable) experimental evidence of this bug's demise, which might give the internet a bit more closure. More importantly, I wanted to thank you for doing this. Thank you! I am grateful and ricing at full power with no fear :)
Hurrah! I wonder when it'll end up in a stable release. I noticed portage was a good way to trigger it too.
In case it helps, I just updated my ebuild repo, so anyone (using Gentoo) who wants the working version I am using can skip some fidgety ebuild-writing steps by using these: Given that the other contents of the repo probably aren't of interest to most people, you'd probably be best served by just copy-pasting those folders or ebuilds into your own local repository. ZFS has continuous integration and the devs seem to be very thoughtful in their work, so I feel pretty safe just using any commit that passes most-or-all CI tests, though I do read some of the preceding commits just to spot-check for anything alarming. So far I haven't seen any commits that I wouldn't want. IMO totally worth it, especially when the alternative involves enduring a bug like this one. Of course, when working for a business, I could understand still wanting to wait for an official release due to political/CYA reasons; but if there's some bug like this that actively impedes resource utilization or productivity, then one might be able to negotiate for a practical exception to any of those pesky bureaucratic rules and expectations :3
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions. |
Closing. Fixed in 2.0 and later releases.
System information
Describe the problem you're observing
sync hangs, mysql hangs, kworker CPU at 100%
Describe how to reproduce the problem
High-frequency create/destroy/clone operations.
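For what it's worth, here is a rough reproducer sketch of that workload, not an official test case: it assumes a throwaway dataset (tank/stress is a placeholder; create it first with zfs create tank/stress), that the zfs CLI is on PATH, and simply loops snapshot/clone/destroy at high frequency.

```c
/* Rough reproducer sketch for a high-frequency create/destroy/clone
 * workload. "tank/stress" is a placeholder dataset -- create it first
 * and do NOT point this at data you care about. Build: cc stress.c */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char cmd[512];

    for (int i = 0; i < 100000; i++) {
        /* snapshot -> clone -> destroy clone -> destroy snapshot */
        snprintf(cmd, sizeof(cmd),
                 "zfs snapshot tank/stress@s%d && "
                 "zfs clone tank/stress@s%d tank/stress_clone%d && "
                 "zfs destroy tank/stress_clone%d && "
                 "zfs destroy tank/stress@s%d",
                 i, i, i, i, i);
        if (system(cmd) != 0) {
            fprintf(stderr, "iteration %d failed (or a command hung)\n", i);
            return 1;
        }
    }
    return 0;
}
```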
Include any warning/errors/backtraces from the system logs
dmesg:
[ 189.990968] Adjusting tsc more than 11% (8039035 vs 7759471)
[ 2522.644734] INFO: task mysqld:6608 blocked for more than 120 seconds.
[ 2522.644790] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2522.644854] mysqld D 0000000000000000 0 6608 5602 0x00000000
[ 2522.644860] ffff880feb913da0 0000000000000086 ffff881006eac500 ffff880feb913fd8
[ 2522.644865] ffff880feb913fd8 ffff880feb913fd8 ffff881006eac500 ffff880fffc8a180
[ 2522.644869] ffff880fffc8a000 ffff880fffc8a188 ffff880fffc8a028 0000000000000000
[ 2522.644873] Call Trace:
[ 2522.644882] [] schedule+0x29/0x70
[ 2522.644906] [] cv_wait_common+0x125/0x150 [spl]
[ 2522.644911] [] ? wake_up_atomic_t+0x30/0x30
[ 2522.644922] [] __cv_wait+0x15/0x20 [spl]
[ 2522.644995] [] zil_commit.part.12+0x8b/0x830 [zfs]
[ 2522.645006] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2522.645017] [] ? tsd_set+0x324/0x500 [spl]
[ 2522.645022] [] ? mutex_lock+0x12/0x2f
[ 2522.645075] [] zil_commit+0x17/0x20 [zfs]
[ 2522.645128] [] zfs_fsync+0x77/0xf0 [zfs]
[ 2522.645179] [] zpl_fsync+0x65/0x90 [zfs]
[ 2522.645186] [] do_fsync+0x65/0xa0
[ 2522.645191] [] SyS_fsync+0x10/0x20
[ 2522.645196] [] system_call_fastpath+0x16/0x1b
[ 2522.645202] INFO: task mysqld:7514 blocked for more than 120 seconds.
[ 2522.645245] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2522.645296] mysqld D ffff880fb554bda0 0 7514 5602 0x00000000
[ 2522.645299] ffff880fb554bd30 0000000000000086 ffff880ff4c8f300 ffff880fb554bfd8
[ 2522.645303] ffff880fb554bfd8 ffff880fb554bfd8 ffff880ff4c8f300 ffff880fb554bdb0
[ 2522.645307] ffff88107ff8dec0 0000000000000002 ffffffff811f9420 ffff880fb554bda0
[ 2522.645312] Call Trace:
[ 2522.645316] [] ? unlock_two_nondirectories+0x60/0x60
[ 2522.645321] [] schedule+0x29/0x70
[ 2522.645324] [] inode_wait+0xe/0x20
[ 2522.645328] [] __wait_on_bit+0x60/0x90
[ 2522.645335] [] __inode_wait_for_writeback+0xaf/0xf0
[ 2522.645339] [] ? wake_atomic_t_function+0x40/0x40
[ 2522.645345] [] inode_wait_for_writeback+0x26/0x40
[ 2522.645348] [] evict+0x95/0x170
[ 2522.645351] [] iput+0xf5/0x180
[ 2522.645357] [] do_unlinkat+0x1ae/0x2b0
[ 2522.645362] [] ? SyS_readlinkat+0xd1/0x140
[ 2522.645367] [] SyS_unlink+0x16/0x20
[ 2522.645371] [] system_call_fastpath+0x16/0x1b
[ 2522.645402] INFO: task sync:2653 blocked for more than 120 seconds.
[ 2522.645443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2522.645494] sync D ffffffff8120f8c0 0 2653 1455 0x00000000
[ 2522.645498] ffff880ffedd7d50 0000000000000082 ffff880f01449700 ffff880ffedd7fd8
[ 2522.645502] ffff880ffedd7fd8 ffff880ffedd7fd8 ffff880f01449700 ffff880ffedd7e80
[ 2522.645506] ffff880ffedd7e88 7fffffffffffffff ffff880f01449700 ffffffff8120f8c0
[ 2522.645509] Call Trace:
[ 2522.645515] [] ? generic_write_sync+0x60/0x60
[ 2522.645519] [] schedule+0x29/0x70
[ 2522.645523] [] schedule_timeout+0x209/0x2d0
[ 2522.645527] [] ? __queue_work+0x136/0x320
[ 2522.645530] [] ? __queue_delayed_work+0xaa/0x1a0
[ 2522.645534] [] ? try_to_grab_pending+0xb1/0x160
[ 2522.645538] [] ? generic_write_sync+0x60/0x60
[ 2522.645543] [] wait_for_completion+0x116/0x170
[ 2522.645548] [] ? wake_up_state+0x20/0x20
[ 2522.645552] [] sync_inodes_sb+0xb7/0x1e0
[ 2522.645557] [] ? generic_write_sync+0x60/0x60
[ 2522.645561] [] sync_inodes_one_sb+0x19/0x20
[ 2522.645565] [] iterate_supers+0xb2/0x110
[ 2522.645570] [] sys_sync+0x44/0xb0
[ 2522.645575] [] system_call_fastpath+0x16/0x1b
[ 2642.739175] INFO: task mysqld:6608 blocked for more than 120 seconds.
[ 2642.739228] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2642.739284] mysqld D 0000000000000000 0 6608 5602 0x00000000
[ 2642.739290] ffff880feb913da0 0000000000000086 ffff881006eac500 ffff880feb913fd8
[ 2642.739296] ffff880feb913fd8 ffff880feb913fd8 ffff881006eac500 ffff880fffc8a180
[ 2642.739300] ffff880fffc8a000 ffff880fffc8a188 ffff880fffc8a028 0000000000000000
[ 2642.739305] Call Trace:
[ 2642.739315] [] schedule+0x29/0x70
[ 2642.739340] [] cv_wait_common+0x125/0x150 [spl]
[ 2642.739345] [] ? wake_up_atomic_t+0x30/0x30
[ 2642.739357] [] __cv_wait+0x15/0x20 [spl]
[ 2642.739434] [] zil_commit.part.12+0x8b/0x830 [zfs]
[ 2642.739447] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2642.739460] [] ? tsd_set+0x324/0x500 [spl]
[ 2642.739466] [] ? mutex_lock+0x12/0x2f
[ 2642.739526] [] zil_commit+0x17/0x20 [zfs]
[ 2642.739587] [] zfs_fsync+0x77/0xf0 [zfs]
[ 2642.739647] [] zpl_fsync+0x65/0x90 [zfs]
[ 2642.739654] [] do_fsync+0x65/0xa0
[ 2642.739659] [] SyS_fsync+0x10/0x20
[ 2642.739665] [] system_call_fastpath+0x16/0x1b
[ 2642.739670] INFO: task mysqld:7514 blocked for more than 120 seconds.
[ 2642.739727] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2642.739793] mysqld D ffff880fb554bda0 0 7514 5602 0x00000000
[ 2642.739798] ffff880fb554bd30 0000000000000086 ffff880ff4c8f300 ffff880fb554bfd8
[ 2642.739803] ffff880fb554bfd8 ffff880fb554bfd8 ffff880ff4c8f300 ffff880fb554bdb0
[ 2642.739808] ffff88107ff8dec0 0000000000000002 ffffffff811f9420 ffff880fb554bda0
[ 2642.739812] Call Trace:
[ 2642.739818] [] ? unlock_two_nondirectories+0x60/0x60
[ 2642.739823] [] schedule+0x29/0x70
[ 2642.739826] [] inode_wait+0xe/0x20
[ 2642.739831] [] __wait_on_bit+0x60/0x90
[ 2642.739839] [] __inode_wait_for_writeback+0xaf/0xf0
[ 2642.739843] [] ? wake_atomic_t_function+0x40/0x40
[ 2642.739850] [] inode_wait_for_writeback+0x26/0x40
[ 2642.739854] [] evict+0x95/0x170
[ 2642.739857] [] iput+0xf5/0x180
[ 2642.739862] [] do_unlinkat+0x1ae/0x2b0
[ 2642.739868] [] ? SyS_readlinkat+0xd1/0x140
[ 2642.739873] [] SyS_unlink+0x16/0x20
[ 2642.739878] [] system_call_fastpath+0x16/0x1b
[ 2642.739897] INFO: task sync:2653 blocked for more than 120 seconds.
[ 2642.739951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2642.740017] sync D ffffffff8120f8c0 0 2653 1455 0x00000000
[ 2642.740021] ffff880ffedd7d50 0000000000000082 ffff880f01449700 ffff880ffedd7fd8
[ 2642.740026] ffff880ffedd7fd8 ffff880ffedd7fd8 ffff880f01449700 ffff880ffedd7e80
[ 2642.740031] ffff880ffedd7e88 7fffffffffffffff ffff880f01449700 ffffffff8120f8c0
[ 2642.740036] Call Trace:
[ 2642.740042] [] ? generic_write_sync+0x60/0x60
[ 2642.740047] [] schedule+0x29/0x70
[ 2642.740051] [] schedule_timeout+0x209/0x2d0
[ 2642.740056] [] ? __queue_work+0x136/0x320
[ 2642.740060] [] ? __queue_delayed_work+0xaa/0x1a0
[ 2642.740064] [] ? try_to_grab_pending+0xb1/0x160
[ 2642.740069] [] ? generic_write_sync+0x60/0x60
[ 2642.740093] [] wait_for_completion+0x116/0x170
[ 2642.740100] [] ? wake_up_state+0x20/0x20
[ 2642.740105] [] sync_inodes_sb+0xb7/0x1e0
[ 2642.740110] [] ? generic_write_sync+0x60/0x60
[ 2642.740116] [] sync_inodes_one_sb+0x19/0x20
[ 2642.740121] [] iterate_supers+0xb2/0x110
[ 2642.740127] [] sys_sync+0x44/0xb0
[ 2642.740134] [] system_call_fastpath+0x16/0x1b
[ 2762.834495] INFO: task mysqld:6608 blocked for more than 120 seconds.
[ 2762.834547] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2762.834604] mysqld D 0000000000000000 0 6608 5602 0x00000000
[ 2762.834609] ffff880feb913da0 0000000000000086 ffff881006eac500 ffff880feb913fd8
[ 2762.834615] ffff880feb913fd8 ffff880feb913fd8 ffff881006eac500 ffff880fffc8a180
[ 2762.834619] ffff880fffc8a000 ffff880fffc8a188 ffff880fffc8a028 0000000000000000
[ 2762.834624] Call Trace:
[ 2762.834634] [] schedule+0x29/0x70
[ 2762.834659] [] cv_wait_common+0x125/0x150 [spl]
[ 2762.834665] [] ? wake_up_atomic_t+0x30/0x30
[ 2762.834677] [] __cv_wait+0x15/0x20 [spl]
[ 2762.834748] [] zil_commit.part.12+0x8b/0x830 [zfs]
[ 2762.834761] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2762.834774] [] ? tsd_set+0x324/0x500 [spl]
[ 2762.834779] [] ? mutex_lock+0x12/0x2f
[ 2762.834843] [] zil_commit+0x17/0x20 [zfs]
[ 2762.834908] [] zfs_fsync+0x77/0xf0 [zfs]
[ 2762.834971] [] zpl_fsync+0x65/0x90 [zfs]
[ 2762.834978] [] do_fsync+0x65/0xa0
[ 2762.834983] [] SyS_fsync+0x10/0x20
[ 2762.834989] [] system_call_fastpath+0x16/0x1b
[ 2762.834994] INFO: task mysqld:7514 blocked for more than 120 seconds.
[ 2762.835051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2762.835118] mysqld D ffff880fb554bda0 0 7514 5602 0x00000000
[ 2762.835122] ffff880fb554bd30 0000000000000086 ffff880ff4c8f300 ffff880fb554bfd8
[ 2762.835127] ffff880fb554bfd8 ffff880fb554bfd8 ffff880ff4c8f300 ffff880fb554bdb0
[ 2762.835132] ffff88107ff8dec0 0000000000000002 ffffffff811f9420 ffff880fb554bda0
[ 2762.835137] Call Trace:
[ 2762.835143] [] ? unlock_two_nondirectories+0x60/0x60
[ 2762.835148] [] schedule+0x29/0x70
[ 2762.835151] [] inode_wait+0xe/0x20
[ 2762.835156] [] __wait_on_bit+0x60/0x90
[ 2762.835164] [] __inode_wait_for_writeback+0xaf/0xf0
[ 2762.835168] [] ? wake_atomic_t_function+0x40/0x40
[ 2762.835175] [] inode_wait_for_writeback+0x26/0x40
[ 2762.835179] [] evict+0x95/0x170
[ 2762.835183] [] iput+0xf5/0x180
[ 2762.835188] [] do_unlinkat+0x1ae/0x2b0
[ 2762.835195] [] ? SyS_readlinkat+0xd1/0x140
[ 2762.835199] [] SyS_unlink+0x16/0x20
[ 2762.835205] [] system_call_fastpath+0x16/0x1b
[ 2762.835223] INFO: task sync:2653 blocked for more than 120 seconds.
[ 2762.835277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2762.835343] sync D ffffffff8120f8c0 0 2653 1455 0x00000000
[ 2762.835348] ffff880ffedd7d50 0000000000000082 ffff880f01449700 ffff880ffedd7fd8
[ 2762.835352] ffff880ffedd7fd8 ffff880ffedd7fd8 ffff880f01449700 ffff880ffedd7e80
[ 2762.835357] ffff880ffedd7e88 7fffffffffffffff ffff880f01449700 ffffffff8120f8c0
[ 2762.835362] Call Trace:
[ 2762.835369] [] ? generic_write_sync+0x60/0x60
[ 2762.835374] [] schedule+0x29/0x70
[ 2762.835378] [] schedule_timeout+0x209/0x2d0
[ 2762.835382] [] ? __queue_work+0x136/0x320
[ 2762.835386] [] ? __queue_delayed_work+0xaa/0x1a0
[ 2762.835390] [] ? try_to_grab_pending+0xb1/0x160
[ 2762.835396] [] ? generic_write_sync+0x60/0x60
[ 2762.835409] [] wait_for_completion+0x116/0x170
[ 2762.835417] [] ? wake_up_state+0x20/0x20
[ 2762.835422] [] sync_inodes_sb+0xb7/0x1e0
[ 2762.835427] [] ? generic_write_sync+0x60/0x60
[ 2762.835433] [] sync_inodes_one_sb+0x19/0x20
[ 2762.835438] [] iterate_supers+0xb2/0x110
[ 2762.835443] [] sys_sync+0x44/0xb0
[ 2762.835451] [] system_call_fastpath+0x16/0x1b
[ 2882.929813] INFO: task mysqld:6608 blocked for more than 120 seconds.
[ 2882.929861] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2882.929913] mysqld D 0000000000000000 0 6608 5602 0x00000000
[ 2882.929919] ffff880feb913da0 0000000000000086 ffff881006eac500 ffff880feb913fd8
[ 2882.929924] ffff880feb913fd8 ffff880feb913fd8 ffff881006eac500 ffff880fffc8a180
[ 2882.929928] ffff880fffc8a000 ffff880fffc8a188 ffff880fffc8a028 0000000000000000
[ 2882.929932] Call Trace:
[ 2882.929942] [] schedule+0x29/0x70
[ 2882.929967] [] cv_wait_common+0x125/0x150 [spl]
[ 2882.929972] [] ? wake_up_atomic_t+0x30/0x30
[ 2882.929983] [] __cv_wait+0x15/0x20 [spl]
[ 2882.930049] [] zil_commit.part.12+0x8b/0x830 [zfs]
[ 2882.930059] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[ 2882.930070] [] ? tsd_set+0x324/0x500 [spl]
[ 2882.930075] [] ? mutex_lock+0x12/0x2f
[ 2882.930129] [] zil_commit+0x17/0x20 [zfs]
[ 2882.930181] [] zfs_fsync+0x77/0xf0 [zfs]
[ 2882.930232] [] zpl_fsync+0x65/0x90 [zfs]
[ 2882.930238] [] do_fsync+0x65/0xa0
[ 2882.930243] [] SyS_fsync+0x10/0x20
[ 2882.930248] [] system_call_fastpath+0x16/0x1b
[ 4802.367346] perf interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
sync process stack:
[] sync_inodes_sb+0xb7/0x1e0
[] sync_inodes_one_sb+0x19/0x20
[] iterate_supers+0xb2/0x110
[] sys_sync+0x44/0xb0
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff
kworker process stack:
cat /proc/3445/stack
[] dmu_zfetch+0x455/0x4f0 [zfs]
[] dbuf_read+0x8ea/0x9f0 [zfs]
[] dnode_hold_impl+0xc6/0xc30 [zfs]
[] 0xffffffffffffffff
cat /proc/3445/stack
[] 0xffffffffffffffff
[root@ifcos ~]# cat /proc/3445/stack
[] zfs_zget+0xfc/0x250 [zfs]
[] zfs_get_data+0x57/0x2d0 [zfs]
[] zil_commit.part.12+0x41c/0x830 [zfs]
[] zil_commit+0x17/0x20 [zfs]
[] zpl_writepages+0xd6/0x170 [zfs]
[] do_writepages+0x1e/0x40
[] __writeback_single_inode+0x40/0x220
[] writeback_sb_inodes+0x25e/0x420
[] wb_writeback+0xff/0x2f0
[] bdi_writeback_workfn+0x115/0x460
[] process_one_work+0x17b/0x470
[] worker_thread+0x11b/0x400
[] kthread+0xcf/0xe0
[] ret_from_fork+0x58/0x90
[] 0xffffffffffffffff
cat /proc/3445/stack
[] dmu_zfetch+0x455/0x4f0 [zfs]
[] __dbuf_hold_impl+0x135/0x5a0 [zfs]
[] 0xffffffffffffffff
cat /proc/3445/stack
[] dbuf_find+0x1c9/0x1d0 [zfs]
[] __dbuf_hold_impl+0x42/0x5a0 [zfs]
[] 0xffffffffffffffff
cat /proc/3445/stack
[] dmu_zfetch+0x455/0x4f0 [zfs]
[] dbuf_read+0x756/0x9f0 [zfs]
[] dnode_hold_impl+0xc6/0xc30 [zfs]