High CPU Load caused machine to stop. #2208

pozar · 2014-03-24T16:26:07Z

I am running Ubuntu 13.10 server with:

Meta: 1
Name: zfs
Branch: 1.0
Version: 0.6.2
Release: 1~saucy
Release-Tags: relext

I suspect that under high I/O load the array got into a strange state, so that when the box is brought up again and tries to mount the pool, it creates a high load (>35) and blocks the ZFS processes:

[ 722.193597] INFO: task txg_sync:2388 blocked for more than 120 seconds.
[ 722.193610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 722.193623] txg_sync D ffff88043ed94580 0 2388 2 0x00000000
[ 722.193628] ffff8803d05f5bc8 0000000000000046 ffff8803d05f5fd8 0000000000014580
[ 722.193635] ffff8803d05f5fd8 0000000000014580 ffff880415adddc0 ffff88043ed94e28
[ 722.193641] ffff88040ceac9d8 ffff88040ceaca08 0000000000000001 0000000000000002
[ 722.193647] Call Trace:
[ 722.193653] [] io_schedule+0x9d/0x130
[ 722.193667] [] cv_wait_common+0x9d/0x1a0 [spl]
[ 722.193673] [] ? wake_up_atomic_t+0x30/0x30
[ 722.193686] [] __cv_wait_io+0x18/0x20 [spl]
[ 722.193733] [] zio_wait+0x103/0x1c0 [zfs]
[ 722.193778] [] dsl_scan_sync+0x466/0xa60 [zfs]
[ 722.193827] [] spa_sync+0x474/0xae0 [zfs]
[ 722.193835] [] ? ktime_get_ts+0x48/0xe0
[ 722.193886] [] txg_sync_thread+0x302/0x580 [zfs]
[ 722.193937] [] ? txg_quiesce_thread+0x380/0x380 [zfs]
[ 722.193950] [] thread_generic_wrapper+0x7a/0x90 [spl]
[ 722.193962] [] ? __thread_exit+0xa0/0xa0 [spl]
[ 722.193969] [] kthread+0xc0/0xd0
[ 722.193976] [] ? kthread_create_on_node+0x120/0x120
[ 722.193981] [] ret_from_fork+0x7c/0xb0
[ 722.193988] [] ? kthread_create_on_node+0x120/0x120
[ 722.193992] INFO: task spa_async:2464 blocked for more than 120 seconds.
[ 722.194005] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 722.194018] spa_async D ffff88043edd4580 0 2464 2 0x00000000
[ 722.194023] ffff8803ae85ddd0 0000000000000046 ffff8803ae85dfd8 0000000000014580
[ 722.194029] ffff8803ae85dfd8 0000000000014580 ffff880415fd2ee0 ffffffffa0610f20
[ 722.194035] ffffffffa0610f24 ffff880415fd2ee0 00000000ffffffff ffffffffa0610f28
[ 722.194041] Call Trace:
[ 722.194048] [] schedule_preempt_disabled+0x29/0x70
[ 722.194054] [] __mutex_lock_slowpath+0x13f/0x1c0
[ 722.194107] [] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 722.194114] [] mutex_lock+0x1f/0x2f
[ 722.194162] [] spa_async_thread+0x1fc/0x280 [zfs]
[ 722.194169] [] ? set_user_nice+0xd5/0x180
[ 722.194217] [] ? spa_vdev_resilver_done+0x140/0x140 [zfs]
[ 722.194230] [] thread_generic_wrapper+0x7a/0x90 [spl]
[ 722.194242] [] ? __thread_exit+0xa0/0xa0 [spl]
[ 722.194248] [] kthread+0xc0/0xd0
[ 722.194255] [] ? kthread_create_on_node+0x120/0x120
[ 722.194261] [] ret_from_fork+0x7c/0xb0
[ 722.194267] [] ? kthread_create_on_node+0x120/0x120
pozar@FS:~$

The workaround was to bring up FreeBSD 10, do a force import of the array, and then export it. Ubuntu/ZoL was able to import the array at that point without the blocking.
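For anyone wanting to script the import/export cycle described above, here is a minimal sketch. It assumes the rescue system has the standard `zpool` CLI on its path; the pool name `tank` and the helper names `recovery_commands` and `run_recovery` are hypothetical and not from this report. It defaults to a dry run that only prints the commands.

```python
import subprocess


def recovery_commands(pool):
    """Build the force-import / export sequence for the given pool name."""
    return [
        # -f forces the import even if the pool appears to be in use elsewhere
        ["zpool", "import", "-f", pool],
        # a clean export lets the original system import the pool normally
        ["zpool", "export", pool],
    ]


def run_recovery(pool, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    cmds = recovery_commands(pool)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Calling `run_recovery("tank")` only prints the two commands; pass `dry_run=False` on the rescue system to actually run them.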

Anything else you need to help on this?

Tim

behlendorf · 2014-03-24T19:37:28Z

@pozar Thanks for filing this issue. We've seen this sporadically and believe it should be improved in the latest code from master. Once we get it tagged and you pull the update, it would be great to know if you're able to reproduce this.

pozar · 2014-03-30T19:32:06Z

Sure. Shoot me a note when ready. Thanks. I should say it took some months of banging on this system to get it into this state. Hard to reproduce at will.

behlendorf · 2014-04-07T16:21:59Z

@pozar What kernel are you running? If it's 3.13 or newer this may have been caused by openzfs/spl#340.

pozar · 2014-04-07T16:29:01Z

@behlendorf I don't think it was newer. I re-installed 13.10 after fixing the pool with FreeBSD. The kernel I see on it now is ...

pozar@fs:~$ uname -a
Linux fs 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

behlendorf · 2014-04-14T20:50:31Z

@pozar OK thanks, I just wanted to check. Then this is definitely not caused by the issue referenced above.

behlendorf · 2014-05-02T17:10:13Z

I believe this is part of a known class of deadlock which can potentially occur. It manifests itself as a hung task in zio_wait() and has been difficult to reproduce. However, since this is a long-standing, rare issue which isn't going to get a quick fix, I'm removing it as a 0.6.3 blocker.
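To check whether a kernel log shows this signature, the sketch below scans hung-task reports of the form posted in this issue and lists the tasks whose stack includes a given function such as `zio_wait`. The helper names `hung_tasks` and `blocked_in` are hypothetical, and the regular expressions assume the message layout seen in the log above; other kernel versions may format these lines differently.

```python
import re

# Header of a hung-task report, e.g.
# "INFO: task txg_sync:2388 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One stack frame, e.g. "zio_wait+0x103/0x1c0 [zfs]"; a leading "? " marks
# a frame the kernel's unwinder considers unreliable.
FRAME_RE = re.compile(
    r"(?P<uncertain>\? )?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def hung_tasks(log_text):
    """Return one dict per hung-task report: name, pid, secs, frames."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "name": header["name"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Keep only reliable frames that belong to an open report.
        if frame and current is not None and not frame["uncertain"]:
            current["frames"].append(frame["func"])
    return tasks


def blocked_in(tasks, func):
    """Names of hung tasks whose reliable stack frames include func."""
    return [t["name"] for t in tasks if func in t["frames"]]
```

Feeding it the text of `dmesg` and calling `blocked_in(tasks, "zio_wait")` would, for the trace in this issue, single out `txg_sync` rather than `spa_async`, which is waiting on a mutex instead.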

pozar · 2014-05-03T00:38:23Z

I can reproduce it on occasion. I found that deleting a bunch of files and directories can get the pool into this state. I just saw this last week and had to bring up FreeBSD 10 to do an import/export to get things into a state where I could bring the pool up in ZoL.

behlendorf · 2016-07-16T00:57:53Z

This is believed to be resolved in master.

pozar · 2016-07-22T04:30:46Z

When did this get "fixed"? A couple of months ago, I ran into the problem again. I am also wondering if this is related to, say, slow I/O. I have since purchased an LSI SAS/SATA controller with 8 lanes of SATA on it. I will install it and try to replicate.

behlendorf added this to the 0.6.4 milestone Mar 24, 2014

behlendorf added the Bug label Mar 24, 2014

behlendorf modified the milestones: 0.6.3, 0.6.4 Apr 7, 2014

albertyann mentioned this issue Apr 22, 2014

CPU lock During the ZFS+Zvol #2272

Closed

behlendorf modified the milestones: 0.6.4, 0.6.3 May 2, 2014

behlendorf removed this from the 0.6.4 milestone Oct 30, 2014

behlendorf added Bug - Major and removed Bug labels Oct 30, 2014

behlendorf closed this as completed Jul 16, 2016
