
High CPU Load caused machine to stop. #2208

Closed
pozar opened this issue Mar 24, 2014 · 9 comments

Comments

@pozar

pozar commented Mar 24, 2014

I am running Ubuntu 13.10 server with ...

Meta: 1
Name: zfs
Branch: 1.0
Version: 0.6.2
Release: 1~saucy
Release-Tags: relext

I suspect that under some high I/O load the array may have gotten into a strange state, so that when the box is brought up again and tries to mount the pool, it creates a high load (>35) and blocks ZFS processes...

[ 722.193597] INFO: task txg_sync:2388 blocked for more than 120 seconds.
[ 722.193610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 722.193623] txg_sync D ffff88043ed94580 0 2388 2 0x00000000
[ 722.193628] ffff8803d05f5bc8 0000000000000046 ffff8803d05f5fd8 0000000000014580
[ 722.193635] ffff8803d05f5fd8 0000000000014580 ffff880415adddc0 ffff88043ed94e28
[ 722.193641] ffff88040ceac9d8 ffff88040ceaca08 0000000000000001 0000000000000002
[ 722.193647] Call Trace:
[ 722.193653] [] io_schedule+0x9d/0x130
[ 722.193667] [] cv_wait_common+0x9d/0x1a0 [spl]
[ 722.193673] [] ? wake_up_atomic_t+0x30/0x30
[ 722.193686] [] __cv_wait_io+0x18/0x20 [spl]
[ 722.193733] [] zio_wait+0x103/0x1c0 [zfs]
[ 722.193778] [] dsl_scan_sync+0x466/0xa60 [zfs]
[ 722.193827] [] spa_sync+0x474/0xae0 [zfs]
[ 722.193835] [] ? ktime_get_ts+0x48/0xe0
[ 722.193886] [] txg_sync_thread+0x302/0x580 [zfs]
[ 722.193937] [] ? txg_quiesce_thread+0x380/0x380 [zfs]
[ 722.193950] [] thread_generic_wrapper+0x7a/0x90 [spl]
[ 722.193962] [] ? __thread_exit+0xa0/0xa0 [spl]
[ 722.193969] [] kthread+0xc0/0xd0
[ 722.193976] [] ? kthread_create_on_node+0x120/0x120
[ 722.193981] [] ret_from_fork+0x7c/0xb0
[ 722.193988] [] ? kthread_create_on_node+0x120/0x120
[ 722.193992] INFO: task spa_async:2464 blocked for more than 120 seconds.
[ 722.194005] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 722.194018] spa_async D ffff88043edd4580 0 2464 2 0x00000000
[ 722.194023] ffff8803ae85ddd0 0000000000000046 ffff8803ae85dfd8 0000000000014580
[ 722.194029] ffff8803ae85dfd8 0000000000014580 ffff880415fd2ee0 ffffffffa0610f20
[ 722.194035] ffffffffa0610f24 ffff880415fd2ee0 00000000ffffffff ffffffffa0610f28
[ 722.194041] Call Trace:
[ 722.194048] [] schedule_preempt_disabled+0x29/0x70
[ 722.194054] [] __mutex_lock_slowpath+0x13f/0x1c0
[ 722.194107] [] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 722.194114] [] mutex_lock+0x1f/0x2f
[ 722.194162] [] spa_async_thread+0x1fc/0x280 [zfs]
[ 722.194169] [] ? set_user_nice+0xd5/0x180
[ 722.194217] [] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 722.194230] [] thread_generic_wrapper+0x7a/0x90 [spl]
[ 722.194242] [] ? __thread_exit+0xa0/0xa0 [spl]
[ 722.194248] [] kthread+0xc0/0xd0
[ 722.194255] [] ? kthread_create_on_node+0x120/0x120
[ 722.194261] [] ret_from_fork+0x7c/0xb0
[ 722.194267] [] ? kthread_create_on_node+0x120/0x120
pozar@FS:~$

The workaround was to bring up FreeBSD 10, do a force import on the array, and then export it. Ubuntu/ZoL was able to import the array at that point without the blocking.
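For reference, a rough sketch of that workaround from the FreeBSD 10 side, assuming the pool is named tank (the real pool name isn't given in this issue):

# From FreeBSD 10: force-import the pool without mounting its datasets, then export it
zpool import -f -N tank
zpool export tank

# Back on Ubuntu/ZoL, the import then proceeds without blocking
zpool import tank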

Anything else you need to help on this?

Tim

@behlendorf behlendorf added this to the 0.6.4 milestone Mar 24, 2014
@behlendorf behlendorf added the Bug label Mar 24, 2014
@behlendorf
Contributor

@pozar Thanks for filing this issue. We've seen this sporadically and believe it should be improved in the latest code from master. Once we get it tagged and you pull the update, it would be great to know if you're able to reproduce this.

@pozar
Author

pozar commented Mar 30, 2014

Sure. Shoot me a note when ready. Thanks. I should say it took some months of banging on this system to get it into this state. Hard to reproduce at will.

@behlendorf
Contributor

@pozar What kernel are you running? If it's 3.13 or newer this may have been caused by openzfs/spl#340.

@behlendorf behlendorf modified the milestones: 0.6.3, 0.6.4 Apr 7, 2014
@pozar
Author

pozar commented Apr 7, 2014

@behlendorf I don't think it was newer. I re-installed 13.10 after fixing the pool with FreeBSD. The kernel I see on it now is ...

pozar@fs:~$ uname -a
Linux fs 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

@behlendorf
Contributor

@pozar OK, thanks, I just wanted to check. Then this is definitely not caused by the issue referenced above.

@behlendorf
Contributor

I believe this is part of a known class of deadlock which can potentially occur. It manifests itself as a hung task in zio_wait() and has been difficult to reproduce. However, since this is a long-standing, rare issue which isn't going to get a quick fix, I'm removing it as a 0.6.3 blocker.
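If anyone hits this again, the kernel stacks of the blocked threads are the most useful thing to attach. A minimal sketch using the standard procfs/sysrq interfaces (nothing ZFS-specific; the PID below is the txg_sync thread from the trace above, so substitute your own):

# Dump the kernel stack of one blocked thread, e.g. txg_sync (PID 2388 in the report above)
cat /proc/2388/stack

# Or ask the kernel to log all uninterruptible (D-state) tasks, then read them back
echo w > /proc/sysrq-trigger
dmesg | tail -n 100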

@behlendorf behlendorf modified the milestones: 0.6.4, 0.6.3 May 2, 2014
@pozar
Author

pozar commented May 3, 2014

I can reproduce it on occasion. I found that if I delete a bunch of files / directories it can get into this state. Just saw this last week and had to bring up FreeBSD 10 to do an import / export to get things into a state where I could bring it up in ZoL.
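A rough sketch of the kind of bulk create-then-delete load that seems to trigger it, assuming a dataset mounted at /tank/scratch (hypothetical path; not a confirmed reproducer, just the pattern described above):

# Generate heavy metadata churn: many small files across many directories, then a mass delete
mkdir -p /tank/scratch/churn
for d in $(seq 1 100); do
  mkdir -p /tank/scratch/churn/$d
  for f in $(seq 1 1000); do
    dd if=/dev/urandom of=/tank/scratch/churn/$d/$f bs=4k count=1 2>/dev/null
  done
done
rm -rf /tank/scratch/churn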

@behlendorf behlendorf removed this from the 0.6.4 milestone Oct 30, 2014
@behlendorf
Contributor

This is believed to be resolved in master.

@pozar
Author

pozar commented Jul 22, 2016

When did this get "fixed"? A couple of months ago I ran into the problem again. I am also wondering if this is related to, say, slow I/O. I have since purchased an LSI SAS/SATA controller with 8 lanes of SATA on it. Will install and try to replicate.
