-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NFSd unresposive #1790
Comments
The lockup happens when the arc size reaches it's target size. txg_sync is busy with 100% IO. After a while the situation might recover. Do you have a hint how this can be debugged? |
Could you dump all the stack from the system with sysrq-t? In particular, we need to see what the txg_sync process is doing. The above stacks suggest just a very highly contented region of a file, they're blocked waiting on a range lock. |
I've had similiar performance issues as well, I rolled back to 0.6.1 as it was more stable for me. |
In the meantime we switched back to 0.6.1. Everything is running fine so far. |
@jsunderm, @prometheanfire If it's convenient it would great if you could verify that this doesn't happen in master and that this is another duplicate of #1720. |
Fix confirmed for me |
Confirmed fixed in master. Closing. |
Hi
we experience frequent lockups of our nfs server (running on kernel RHEL6 2.6.32-358.18.1 with zfsonlinux 0.6.2). Client operations are timing out. The server dumps messages like the one posted below. During the lockups txg_sync seems busy. All file systems are still accessible locally on the server.
Oct 15 23:54:21 fs2 kernel: INFO: task nfsd:6108 blocked for more than 120 seconds.
Oct 15 23:54:21 fs2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 15 23:54:21 fs2 kernel: nfsd D 0000000000000000 0 6108 2 0x00000080
Oct 15 23:54:21 fs2 kernel: ffff880357cb9740 0000000000000046 ffff88035d224ae0 0000000000000000
Oct 15 23:54:21 fs2 kernel: ffff880357cb9740 ffffffffa01a203a ffff8803611d0c88 ffff880300000000
Oct 15 23:54:21 fs2 kernel: ffff88035d225098 ffff880357cb9fd8 000000000000fb88 ffff88035d225098
Oct 15 23:54:21 fs2 kernel: Call Trace:
Oct 15 23:54:21 fs2 kernel: [] ? kmem_alloc_debug+0x13a/0x4c0 [spl]
Oct 15 23:54:21 fs2 kernel: [] ? prepare_to_wait_exclusive+0x4e/0x80
Oct 15 23:54:21 fs2 kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 15 23:54:21 fs2 kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 15 23:54:21 fs2 kernel: [] ? avl_find+0x60/0xb0 [zavl]
Oct 15 23:54:21 fs2 kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 15 23:54:21 fs2 kernel: [] zfs_range_lock+0x2ac/0x5c0 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? mutex_lock+0x1e/0x50
Oct 15 23:54:21 fs2 kernel: [] zfs_write+0x5e7/0xca0 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? dnode_hold_impl+0x347/0x560 [zfs]
Oct 15 23:54:21 fs2 kernel: [] zpl_write_common+0x52/0x70 [zfs]
Oct 15 23:54:21 fs2 kernel: [] zpl_write+0x68/0xa0 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? zpl_write+0x0/0xa0 [zfs]
Oct 15 23:54:21 fs2 kernel: [] do_loop_readv_writev+0x59/0x90
Oct 15 23:54:21 fs2 kernel: [] do_readv_writev+0x1e6/0x1f0
Oct 15 23:54:21 fs2 kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Oct 15 23:54:21 fs2 kernel: [] ? nfsd_setuser+0x126/0x2c0 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] ? mutex_lock+0x1e/0x50
Oct 15 23:54:21 fs2 kernel: [] ? zfs_open+0xa5/0x140 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? zpl_open+0x0/0x90 [zfs]
Oct 15 23:54:21 fs2 kernel: [] vfs_writev+0x46/0x60
Oct 15 23:54:21 fs2 kernel: [] nfsd_vfs_write+0x105/0x430 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] ? dentry_open+0x52/0xc0
Oct 15 23:54:21 fs2 kernel: [] ? nfsd_open+0x1db/0x230 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] nfsd_write+0xe7/0x100 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] nfsd3_proc_write+0xaf/0x140 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] nfsd_dispatch+0xfe/0x240 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] svc_process_common+0x344/0x640 [sunrpc]
Oct 15 23:54:21 fs2 kernel: [] ? default_wake_function+0x0/0x20
Oct 15 23:54:21 fs2 kernel: [] svc_process+0x110/0x160 [sunrpc]
Oct 15 23:54:21 fs2 kernel: [] nfsd+0xc2/0x160 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] ? nfsd+0x0/0x160 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] kthread+0x96/0xa0
Oct 15 23:54:21 fs2 kernel: [] child_rip+0xa/0x20
Oct 15 23:54:21 fs2 kernel: [] ? kthread+0x0/0xa0
Oct 15 23:54:21 fs2 kernel: [] ? child_rip+0x0/0x20
Oct 15 23:54:21 fs2 kernel: INFO: task nfsd:6113 blocked for more than 120 seconds.
Oct 15 23:54:21 fs2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 15 23:54:21 fs2 kernel: nfsd D 0000000000000007 0 6113 2 0x00000080
Oct 15 23:54:21 fs2 kernel: ffff8803579e3740 0000000000000046 ffff8803579e3708 ffff8803579e3704
Oct 15 23:54:21 fs2 kernel: ffff8803579e3740 ffff88037fc25000 ffff8800282b6700 000000000000b15b
Oct 15 23:54:21 fs2 kernel: ffff8803609ddab8 ffff8803579e3fd8 000000000000fb88 ffff8803609ddab8
Oct 15 23:54:21 fs2 kernel: Call Trace:
Oct 15 23:54:21 fs2 kernel: [] cv_wait_common+0x105/0x1c0 [spl]
Oct 15 23:54:21 fs2 kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 15 23:54:21 fs2 kernel: [] ? avl_find+0x60/0xb0 [zavl]
Oct 15 23:54:21 fs2 kernel: [] __cv_wait+0x15/0x20 [spl]
Oct 15 23:54:21 fs2 kernel: [] zfs_range_lock+0x2ac/0x5c0 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? zfs_inode_update+0x110/0x180 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? mutex_lock+0x1e/0x50
Oct 15 23:54:21 fs2 kernel: [] zfs_write+0x5e7/0xca0 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? dnode_hold_impl+0x347/0x560 [zfs]
Oct 15 23:54:21 fs2 kernel: [] zpl_write_common+0x52/0x70 [zfs]
Oct 15 23:54:21 fs2 kernel: [] zpl_write+0x68/0xa0 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? zpl_write+0x0/0xa0 [zfs]
Oct 15 23:54:21 fs2 kernel: [] do_loop_readv_writev+0x59/0x90
Oct 15 23:54:21 fs2 kernel: [] do_readv_writev+0x1e6/0x1f0
Oct 15 23:54:21 fs2 kernel: [] ? tsd_exit+0x5f/0x2b0 [spl]
Oct 15 23:54:21 fs2 kernel: [] ? nfsd_setuser+0x126/0x2c0 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] ? mutex_lock+0x1e/0x50
Oct 15 23:54:21 fs2 kernel: [] ? zfs_open+0xa5/0x140 [zfs]
Oct 15 23:54:21 fs2 kernel: [] ? zpl_open+0x0/0x90 [zfs]
Oct 15 23:54:21 fs2 kernel: [] vfs_writev+0x46/0x60
Oct 15 23:54:21 fs2 kernel: [] nfsd_vfs_write+0x105/0x430 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] ? dentry_open+0x52/0xc0
Oct 15 23:54:21 fs2 kernel: [] ? nfsd_open+0x1db/0x230 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] nfsd_write+0xe7/0x100 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] nfsd3_proc_write+0xaf/0x140 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] nfsd_dispatch+0xfe/0x240 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] svc_process_common+0x344/0x640 [sunrpc]
Oct 15 23:54:21 fs2 kernel: [] ? default_wake_function+0x0/0x20
Oct 15 23:54:21 fs2 kernel: [] svc_process+0x110/0x160 [sunrpc]
Oct 15 23:54:21 fs2 kernel: [] nfsd+0xc2/0x160 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] ? nfsd+0x0/0x160 [nfsd]
Oct 15 23:54:21 fs2 kernel: [] kthread+0x96/0xa0
Oct 15 23:54:21 fs2 kernel: [] child_rip+0xa/0x20
Oct 15 23:54:21 fs2 kernel: [] ? kthread+0x0/0xa0
Oct 15 23:54:21 fs2 kernel: [] ? child_rip+0x0/0x20
The text was updated successfully, but these errors were encountered: