High CPU Load caused machine to stop. #2208
Comments
@pozar Thanks for filing this issue. We've seen this sporadically and believe it should be improved in the latest code from master. Once we get it tagged and you pull the update, it would be great to know if you're able to reproduce this.
Sure. Shoot me a note when ready. Thanks. I should say it took some months of banging on this system to get it into this state. It's hard to reproduce at will.
@pozar What kernel are you running? If it's 3.13 or newer, this may have been caused by openzfs/spl#340.
@behlendorf I don't think it was newer. I re-installed 13.10 after fixing the pool with FreeBSD. The kernel I see on it now is ... pozar@fs:~$ uname -a
@pozar OK thanks, I just wanted to check. Then this is definitely not caused by the issue referenced above.
I believe this is part of a known class of deadlock which can potentially occur. It manifests itself as a hung task in zio_wait() and has been difficult to reproduce. However, since this is a long-standing, rare issue which isn't going to get a quick fix, I'm removing it as a 0.6.3 blocker.
I can reproduce it on occasion. I found that deleting a bunch of files / directories can get it into this state. I just saw this last week and had to bring up FreeBSD 10 to do an import / export to get things into a state where I could bring the pool up in ZoL.
This is believed to be resolved in master.
When did this get "fixed"? A couple of months ago, I ran into the problem again. I am also wondering if this is related to, say, slow I/O. I have since purchased an LSI SAS/SATA controller with 8 lanes of SATA on it. I will install it and try to replicate.
I am running Ubuntu 13.10 server with:
Meta: 1
Name: zfs
Branch: 1.0
Version: 0.6.2
Release: 1~saucy
Release-Tags: relext
I suspect that under some high I/O load the array got into a strange state, so that when the box is brought up again and tries to mount the pool, it creates a high load (>35) and blocks ZFS processes...
[ 722.193597] INFO: task txg_sync:2388 blocked for more than 120 seconds.
[ 722.193610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 722.193623] txg_sync D ffff88043ed94580 0 2388 2 0x00000000
[ 722.193628] ffff8803d05f5bc8 0000000000000046 ffff8803d05f5fd8 0000000000014580
[ 722.193635] ffff8803d05f5fd8 0000000000014580 ffff880415adddc0 ffff88043ed94e28
[ 722.193641] ffff88040ceac9d8 ffff88040ceaca08 0000000000000001 0000000000000002
[ 722.193647] Call Trace:
[ 722.193653] [] io_schedule+0x9d/0x130
[ 722.193667] [] cv_wait_common+0x9d/0x1a0 [spl]
[ 722.193673] [] ? wake_up_atomic_t+0x30/0x30
[ 722.193686] [] __cv_wait_io+0x18/0x20 [spl]
[ 722.193733] [] zio_wait+0x103/0x1c0 [zfs]
[ 722.193778] [] dsl_scan_sync+0x466/0xa60 [zfs]
[ 722.193827] [] spa_sync+0x474/0xae0 [zfs]
[ 722.193835] [] ? ktime_get_ts+0x48/0xe0
[ 722.193886] [] txg_sync_thread+0x302/0x580 [zfs]
[ 722.193937] [] ? txg_quiesce_thread+0x380/0x380 [zfs]
[ 722.193950] [] thread_generic_wrapper+0x7a/0x90 [spl]
[ 722.193962] [] ? __thread_exit+0xa0/0xa0 [spl]
[ 722.193969] [] kthread+0xc0/0xd0
[ 722.193976] [] ? kthread_create_on_node+0x120/0x120
[ 722.193981] [] ret_from_fork+0x7c/0xb0
[ 722.193988] [] ? kthread_create_on_node+0x120/0x120
[ 722.193992] INFO: task spa_async:2464 blocked for more than 120 seconds.
[ 722.194005] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 722.194018] spa_async D ffff88043edd4580 0 2464 2 0x00000000
[ 722.194023] ffff8803ae85ddd0 0000000000000046 ffff8803ae85dfd8 0000000000014580
[ 722.194029] ffff8803ae85dfd8 0000000000014580 ffff880415fd2ee0 ffffffffa0610f20
[ 722.194035] ffffffffa0610f24 ffff880415fd2ee0 00000000ffffffff ffffffffa0610f28
[ 722.194041] Call Trace:
[ 722.194048] [] schedule_preempt_disabled+0x29/0x70
[ 722.194054] [] __mutex_lock_slowpath+0x13f/0x1c0
[ 722.194107] [] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 722.194114] [] mutex_lock+0x1f/0x2f
[ 722.194162] [] spa_async_thread+0x1fc/0x280 [zfs]
[ 722.194169] [] ? set_user_nice+0xd5/0x180
[ 722.194217] [] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 722.194230] [] thread_generic_wrapper+0x7a/0x90 [spl]
[ 722.194242] [] ? __thread_exit+0xa0/0xa0 [spl]
[ 722.194248] [] kthread+0xc0/0xd0
[ 722.194255] [] ? kthread_create_on_node+0x120/0x120
[ 722.194261] [] ret_from_fork+0x7c/0xb0
[ 722.194267] [] ? kthread_create_on_node+0x120/0x120
pozar@FS:~$
The workaround was to bring up FreeBSD 10, do a force import on the array, and then export the array. Ubuntu/ZoL was able to import the array at that point without the blocking.
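For anyone hitting the same hang, the import / export steps described above could be sketched roughly like this. It is a minimal sketch, not an official recovery procedure: the pool name `tank` and the helper names `run` and `reimport_pool` are hypothetical (the real pool name is not given in this report), and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the workaround: force-import the pool on the rescue system
# (FreeBSD 10 in this report), check it, then export it cleanly so that
# ZoL can import it afterwards.
# Assumption: helper names and the pool name are illustrative only.

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Force-import, inspect, and export the pool named by the first argument.
reimport_pool() {
    pool="$1"
    run zpool import -f "$pool" || return 1   # force import despite unclean state
    run zpool status "$pool" || return 1      # look for errors before exporting
    run zpool export "$pool" || return 1      # clean export for the next import
}
```

Calling `reimport_pool tank` prints the three `zpool` commands; running it with `DRY_RUN=0` as root on the rescue system would actually execute them.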
Anything else you need to help on this?
Tim