Stuck txg_sync process, and IO speed issues when ARC is full #3235

Closed
angstymeat opened this issue Mar 28, 2015 · 47 comments

@angstymeat

I seem to be able to pretty reliably cause txg_sync to become stuck during an rsync backup.

I had been having ARC issues with high memory consumption and bad IO speeds after the ARC fills, so I have been trying the latest git nightlies as they are released, along with combinations of various patches like 3223, 3225, 3216, and a few others.

While the latest nightlies have helped memory usage, nothing has particularly helped with the IO speed. However, even when I don't have memory/speed issues, I will eventually end up with txg_sync taking up 80%+ CPU and a stalled rsync process.

This last time, I'm just using the straight-up nightly without any additional patches. The system was stuck in txg_sync for 14 hours before I rebooted.

The backups represent a wide range of file types and sizes. The problems really start to happen when backing up our maildir directories with tons of small files. The ARC fills up, and once it is full, performance drops like a stone. I've had to run the echo/drop_caches trick at 20-minute intervals to keep the speed up. Without it, my backups take up to 8 hours to run. With it, it's less than an hour.
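
For what it's worth, the "trick" is nothing fancy; roughly this loop is what I mean (just a sketch of what I run; echo 3 drops the page cache plus dentries and inodes):

# drop caches every 20 minutes while the backup runs (sketch)
while true; do
    echo 3 > /proc/sys/vm/drop_caches
    sleep 20m
done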

I wish I could be more specific, but the speed issues started around the kmem rework. I wasn't seeing this issue under the 0.6.3 release.

I'm running Fedora 20 with the 3.17.8-200.fc20.x86_64 kernel. The machine is a Dell R515 with 16GB of RAM. The machine is used for backups, so I've been using it to test git ZFS releases under heavy IO loads.

@snajpa
Contributor

snajpa commented Mar 28, 2015

Can you please post /proc/spl/kstat/zfs/arcstats, slabtop -o, and /proc/meminfo next time it happens?
I've noticed this behavior too, even after PR #3202 was merged; my kernel in that case was Ubuntu 14.04's 3.16. All of the memory in my case was in arc_meta_used, and the system was so out of memory that IOPS dropped to floppy-disk levels.
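
Something along these lines (just a sketch; the output filenames are arbitrary) would capture all three in one go next time it wedges:

# grab the requested diagnostics into timestamped files (sketch)
ts=$(date +%Y%m%d-%H%M%S)
cat /proc/spl/kstat/zfs/arcstats > arcstats.$ts
slabtop -o > slabtop.$ts
cat /proc/meminfo > meminfo.$ts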

@angstymeat
Author

Will do. I'm running it in a loop right now to see if I can get it to fail sooner.

I'm graphing system memory and the ARC memory usage. I see arc_meta_used max out at arc_meta_limit, and I haven't seen it exceed that since the recent commits. The system doesn't appear to be out of memory either.

I'll post more when it happens again. It seems to be every 2 or 3 days if I just let it run on its own.

@kernelOfTruth
Contributor

CC: @kpande (#3160 (comment))

@kernelOfTruth
Contributor

https://icesquare.com/wordpress/how-to-improve-zfs-performance/

# Decrease ZFS txg timeout value from 30 (default) to 5 seconds.  This
# should increase throughput and decrease the "bursty" stalls that
# happen during immense I/O with ZFS.
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
# default in FreeBSD since ZFS v28
vfs.zfs.txg.timeout="5"

Yeah, well - it doesn't seem to help; it's at 5 by default.
@angstymeat did you try tinkering with that setting?
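
For reference, on ZoL the equivalent knob is the zfs_txg_timeout module parameter rather than the FreeBSD sysctl quoted above; a sketch of checking and changing it:

# current value (should already be 5 on recent builds)
cat /sys/module/zfs/parameters/zfs_txg_timeout
# change it at runtime for testing
echo 5 > /sys/module/zfs/parameters/zfs_txg_timeout
# or persistently via /etc/modprobe.d/zfs.conf:
#   options zfs zfs_txg_timeout=5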

Anything unusual besides that in top, swapping, or arcstats? Basically what @snajpa asked.

What hardware is that (hard drives, processor)? Any encryption used?

More things to consider:
#938 (comment), #938 (comment)
<-- many of these settings don't seem to apply anymore ...

3.17.8 doesn't seem to have

mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed

but that's probably more related to lockups than sync/writeback stalling

you could give that a try, if you're not bound to a stock kernel or software

otherwise - like I wrote in #2272

Is your configuration and system software (kernel, rsync, etc.) all stock?

If not, is using a patched rsync (to lower memory impact) a (temporary) option?
http://insights.oetiker.ch/linux/fadvise/
https://tobi.oetiker.ch/patches/

That would add the --drop-cache option to rsync.
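
As a purely hypothetical example (the paths and other flags here are made up), an invocation with the patched rsync would just gain the extra option:

rsync -aH --delete --drop-cache /data/mail/ backup:/pool/backups/mail/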

can you run a kernel with

mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed

http://patchwork.ozlabs.org/patch/433475/

Are CONFIG_PARAVIRT_SPINLOCKS and paravirt support an option?
It could help in some corner cases: #3091

@dweeezil
Contributor

@angstymeat If any of the blocked processes are stuck in zfs_zinactive() or zfs_zget(), it's likely the current version of the WIP patch in #3225 (dweeezil/zfs@7045329) will help (assuming you're running the post-kmem-rework code, which it sounds like you are).

I'm trying to nail down one more rather hard-to-reproduce deadlock and am also laying some groundwork for better integration with the kernel's lockdep facility. This patch is currently very fluid and is likely to be re-pushed at least one more time today.

@angstymeat
Author

@snajpa Here's arcstats, slabtop, and meminfo...

5 1 0x01 86 4128 1398669428 80571061797084
name                            type data
hits                            4    139687084
misses                          4    5721149
demand_data_hits                4    2650631
demand_data_misses              4    19674
demand_metadata_hits            4    89007320
demand_metadata_misses          4    2639187
prefetch_data_hits              4    114617
prefetch_data_misses            4    1343495
prefetch_metadata_hits          4    47914516
prefetch_metadata_misses        4    1718793
mru_hits                        4    27939267
mru_ghost_hits                  4    426776
mfu_hits                        4    63718685
mfu_ghost_hits                  4    682354
deleted                         4    3893803
recycle_miss                    4    227757
mutex_miss                      4    1062
evict_skip                      4    73883228
evict_l2_cached                 4    0
evict_l2_eligible               4    208164622848
evict_l2_ineligible             4    14595452928
hash_elements                   4    795
hash_elements_max               4    582617
hash_collisions                 4    312709
hash_chains                     4    0
hash_chain_max                  4    6
p                               4    19590144
c                               4    19606528
c_min                           4    4194304
c_max                           4    8338591744
size                            4    19317824
hdr_size                        4    342032
data_size                       4    0
meta_size                       4    16603648
other_size                      4    2372144
anon_size                       4    9453568
anon_evict_data                 4    0
anon_evict_metadata             4    0
mru_size                        4    6230016
mru_evict_data                  4    0
mru_evict_metadata              4    1544192
mru_ghost_size                  4    81920
mru_ghost_evict_data            4    0
mru_ghost_evict_metadata        4    81920
mfu_size                        4    920064
mfu_evict_data                  4    0
mfu_evict_metadata              4    96768
mfu_ghost_size                  4    4096000
mfu_ghost_evict_data            4    0
mfu_ghost_evict_metadata        4    4096000
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    95771
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    7308
arc_meta_used                   4    19317824
arc_meta_limit                  4    6253943808
arc_meta_max                    4    6301875264
Active / Total Objects (% used)    : 451548 / 552635 (81.7%)
 Active / Total Slabs (% used)      : 11864 / 11864 (100.0%)
 Active / Total Caches (% used)     : 144 / 230 (62.6%)
 Active / Total Size (% used)       : 103096.18K / 130857.09K (78.8%)
 Minimum / Average / Maximum Object : 0.01K / 0.24K / 16.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
129792 126308  97%    0.06K   2028       64      8112K range_seg_cache
 50154  47951  95%    0.10K   1286       39      5144K buffer_head
 35520  13911  39%    0.06K    555       64      2220K kmalloc-64
 32096  32096 100%    0.12K    944       34      3776K kernfs_node_cache
 23552  23552 100%    0.01K     46      512       184K kmalloc-8
 18942  16778  88%    0.19K    451       42      3608K dentry
 17598   6311  35%    0.09K    419       42      1676K kmalloc-96
 15708   1041   6%    0.31K    308       51      4928K arc_buf_hdr_t
 15360  13717  89%    0.03K    120      128       480K kmalloc-32
 14841  14360  96%    0.08K    291       51      1164K selinux_inode_security
 13986   6771  48%    0.19K    333       42      2664K kmalloc-192
 13572   1907  14%    0.30K    261       52      4176K dmu_buf_impl_t
 11264   6065  53%    0.12K    352       32      1408K kmalloc-128
 10584   9982  94%    0.57K    189       56      6048K inode_cache
  8816   8704  98%    2.00K    551       16     17632K kmalloc-2048
  8496   1714  20%    0.88K    236       36      7552K dnode_t
  7488   1288  17%    0.50K    234       32      3744K zio_buf_512
  6996   6873  98%    0.18K    159       44      1272K vm_area_struct
  6912   6720  97%    0.02K     27      256       108K kmalloc-16
  6290   6290 100%    0.05K     74       85       296K zio_link_cache
  6200   6200 100%    0.63K    124       50      3968K proc_inode_cache
  5936   5936 100%    0.07K    106       56       424K anon_vma
  5888   5690  96%    0.06K     92       64       368K anon_vma_chain
  5632   5499  97%    0.25K    176       32      1408K skbuff_head_cache
  5100   4291  84%    0.04K     50      102       200K ext4_extent_status
  4746   4746 100%    0.19K    113       42       904K cred_jar
  4384   4094  93%    0.50K    137       32      2192K kmalloc-512
  3936   3777  95%    1.00K    123       32      3936K kmalloc-1024
  3920   3857  98%    0.57K     70       56      2240K radix_tree_node
  3400   3400 100%    0.05K     40       85       160K ftrace_event_field
  3159   1876  59%    0.10K     81       39       324K arc_buf_t
  2752   2199  79%    0.25K     86       32       688K filp
  2336   2196  94%    0.25K     73       32       584K kmalloc-256
  1920   1892  98%    0.98K     60       32      1920K ext4_inode_cache
  1824   1824 100%    0.66K     38       48      1216K shmem_inode_cache
  1736   1736 100%    0.14K     31       56       248K ext4_groupinfo_4k
  1512   1512 100%    0.07K     27       56       108K Acpi-Operand
  1376   1216  88%    0.25K     43       32       344K bio-0
  1364   1320  96%    0.36K     31       44       496K blkdev_requests
  1334   1334 100%    0.09K     29       46       116K ftrace_event_file
  1224   1224 100%    0.04K     12      102        48K Acpi-Namespace
  1152   1152 100%    0.06K     18       64        72K task_delay_info
  1116   1116 100%    0.11K     31       36       124K jbd2_journal_head
  1088    960  88%    0.25K     34       32       272K sgpool-8
  1008    924  91%    0.38K     24       42       384K scsi_cmd_cache
   986    841  85%    1.08K     34       29      1088K zio_cache
MemTotal:       16286312 kB
MemFree:        14879536 kB
MemAvailable:   14961812 kB
Buffers:          177136 kB
Cached:           119008 kB
SwapCached:            0 kB
Active:           298828 kB
Inactive:          34964 kB
Active(anon):      38032 kB
Inactive(anon):      456 kB
Active(file):     260796 kB
Inactive(file):    34508 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8216572 kB
SwapFree:        8216572 kB
Dirty:                 4 kB
Writeback:             0 kB
AnonPages:         37664 kB
Mapped:            57012 kB
Shmem:               836 kB
Slab:             131824 kB
SReclaimable:      24436 kB
SUnreclaim:       107388 kB
KernelStack:        8624 kB
PageTables:         5048 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16359728 kB
Committed_AS:     214944 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      893232 kB
VmallocChunk:   34341556260 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       84580 kB
DirectMap2M:     3051520 kB
DirectMap1G:    13631488 kB

@kernelOfTruth, I haven't done anything with vfs.zfs.txg.timeout, although I tried /sys/module/zfs/parameters/zfs_arc_meta_prune without it seeming to have an effect. My kernel and everything is straight-up stock, except I did compile my own version of rsync from source since version 3.1.1 wasn't available in the repository.

@dweeezil, I had been trying the 3225 patch for a couple of days. The last time I tried was on the 27th.

Now that I'm looking more closely, I'm seeing this:

  • txg_sync is ~ 80% CPU
  • z_null_iss is ~36%
  • txg_quiesce is ~25%

The system has 8 cores, so the CPU numbers won't add up to 100.

I'm running the 3.17.8 kernel currently because of the .zfs directory issue that popped up in 3.18.

@angstymeat
Author

I never let it run this long, but it looks like after 15 hours the txg_sync finished up and the rsync is continuing from where it appeared to be frozen.

@angstymeat
Author

Letting the server continue to run, it appears there's a pattern of "run, txg_sync for 15 hours, finish running".

@angstymeat
Author

Ok, I'm running with the latest git copy after 3225 and 466 had been committed, and I'm not seeing the long txg_sync.

However, I'm now seeing that arc_adapt has been running continuously at 3.5% CPU since my backups ran.

@angstymeat
Author

Echoing to drop_caches stopped arc_adapt, but it's strange that it wasn't able to clear whatever memory it was trying to clean up in over 17 hours with nothing else running.

@dweeezil
Contributor

dweeezil commented Apr 5, 2015

@angstymeat Could you please grab some stacks from the arc_adapt thread with cat /proc/<pid>/stack while it's spinning? As of the 3225 patch, the adapt thread runs under PF_FSTRANS, so I'm wondering if we need to add some spin protection in arc_do_user_evicts(). It would be nice to see stacks from all tasks on your system when this is happening (SysRq-t).
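
Something like this (just a sketch; the output filename is arbitrary) would collect a handful of samples while it spins:

# sample the arc_adapt thread's kernel stack once a second for ~10 seconds
pid=$(pgrep -x arc_adapt)
for i in $(seq 1 10); do
    date
    cat /proc/$pid/stack
    sleep 1
done > arc_adapt_stacks.txt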

@angstymeat
Author

So far, these have been easily repeatable on my system. If that holds up, it should do it again when it runs Sunday morning.

Is there another way to get the stack trace from all processes? This machine is remotely located, and while I have remote access to it, I don't think the iDRAC software will let me send a SysRq key combo to it. Would that just be dumping /proc/*/stack?

@dweeezil
Contributor

dweeezil commented Apr 5, 2015

You could just do echo t > /proc/sysrq-trigger but in order to capture the output, it'll have to be successfully written to a log file by your syslogging daemon. If you're using iDRAC, I recommend you configure the system to use a serial console (console=ttySX kernel parameter) and also run a getty on it. Then, after ssh-ing into the iDRAC and using the console command to connect to the tty port (forgot what "X" is for Dell iDRAC), you should be able to use ~B to send a BREAK to the console and then type the appropriate SysRq key. Since this is the equivalent of using a VGA console, you need to enable the SysRq functions. I recommend adding the sysrq_always_enabled kernel parameter to enable all SysRq functions.

You can also dump each of /proc/*/stack, but a bunch of other useful information comes along with SysRq-t.
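
To sketch the simpler path (assuming syslog is capturing the kernel log):

# enable all SysRq functions for this boot
echo 1 > /proc/sys/kernel/sysrq
# dump stacks of all tasks into the kernel log
echo t > /proc/sysrq-trigger
# pull the result out afterwards; the ring buffer can wrap on a big dump,
# so the syslog file is the safer place to look
dmesg > sysrq-t.txt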

@angstymeat
Author

And it did not repeat this morning. I'll see how it does over the next couple of nights.

@angstymeat
Author

Ok, I got it to act up again. It looks like I can do this consistently when running a scrub while my rsyncs are running.

I've uploaded two dumps:

It is currently still running in this state with arc_adapt spinning at 3.3% CPU.

@theonewolf

I think I am hitting this exact same issue.

I am getting this kernel stack trace repeatedly while doing a large copy into ZFS with 8 concurrent writers (I have 16 GiB of memory, 200-300 GiB referenced in the ZFS dataset, and ~50 GiB deduplicated):

[61440.448052] INFO: task txg_quiesce:21477 blocked for more than 120 seconds.
[61440.448884]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61440.449617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61440.450562] txg_quiesce     D ffff88041fdd34c0     0 21477      2 0x00000000
[61440.450565]  ffff880184845d68 0000000000000046 ffff880138bbb000 ffff880184845fd8
[61440.450567]  00000000000134c0 00000000000134c0 ffff880138bbb000 ffffc9002f9b26d8
[61440.450569]  ffffc9002f9b26a8 ffffc9002f9b26e0 ffffc9002f9b26d0 0000000000000000
[61440.450571] Call Trace:
[61440.450577]  [<ffffffff81725919>] schedule+0x29/0x70
[61440.450586]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61440.450589]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61440.450593]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61440.450616]  [<ffffffffa026eb13>] txg_quiesce_thread+0x263/0x3d0 [zfs]
[61440.450632]  [<ffffffffa026e8b0>] ? txg_sync_thread+0x600/0x600 [zfs]
[61440.450636]  [<ffffffffa00b4aba>] thread_generic_wrapper+0x7a/0x90 [spl]
[61440.450639]  [<ffffffffa00b4a40>] ? __thread_exit+0xa0/0xa0 [spl]
[61440.450642]  [<ffffffff8108b5b2>] kthread+0xd2/0xf0
[61440.450643]  [<ffffffff8108b4e0>] ? kthread_create_on_node+0x1c0/0x1c0
[61440.450645]  [<ffffffff81731f0c>] ret_from_fork+0x7c/0xb0
[61440.450647]  [<ffffffff8108b4e0>] ? kthread_create_on_node+0x1c0/0x1c0
[61440.450654] INFO: task python:15455 blocked for more than 120 seconds.
[61440.451392]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61440.452154] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61440.453092] python          D ffff88041fc134c0     0 15455  19265 0x00000000
[61440.453094]  ffff8801ef643b28 0000000000000082 ffff88011c206000 ffff8801ef643fd8
[61440.453096]  00000000000134c0 00000000000134c0 ffff88011c206000 ffff88037c5120a0
[61440.453098]  ffff88037c511f38 ffff88037c5120a8 ffff88037c511f60 0000000000000000
[61440.453099] Call Trace:
[61440.453102]  [<ffffffff81725919>] schedule+0x29/0x70
[61440.453106]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61440.453108]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61440.453112]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61440.453125]  [<ffffffffa022fd9b>] dmu_tx_wait+0x19b/0x2f0 [zfs]
[61440.453136]  [<ffffffffa022ff81>] dmu_tx_assign+0x91/0x4b0 [zfs]
[61440.453154]  [<ffffffffa02a3da1>] zfs_setattr+0x601/0x19e0 [zfs]
[61440.453157]  [<ffffffff811a2c59>] ? __kmalloc+0x1e9/0x230
[61440.453161]  [<ffffffffa00b32d6>] ? kmem_alloc_debug+0x96/0x3c0 [spl]
[61440.453164]  [<ffffffffa00b32d6>] ? kmem_alloc_debug+0x96/0x3c0 [spl]
[61440.453166]  [<ffffffff811cde4f>] ? getname_flags+0x4f/0x190
[61440.453184]  [<ffffffffa02bc32b>] zpl_setattr+0xdb/0x120 [zfs]
[61440.453186]  [<ffffffff811da761>] notify_change+0x231/0x390
[61440.453188]  [<ffffffff811bb95c>] chmod_common+0x10c/0x130
[61440.453190]  [<ffffffff811c7bfe>] ? path_put+0x1e/0x30
[61440.453191]  [<ffffffff811bcba5>] SyS_chmod+0x45/0xa0
[61440.453205]  [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f
[61440.453207] INFO: task python:15458 blocked for more than 120 seconds.
[61440.453926]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61440.454642] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61440.455567] python          D ffff88041fd134c0     0 15458   3147 0x00000000
[61440.455569]  ffff880111c2baa0 0000000000000086 ffff88036ce1c800 ffff880111c2bfd8
[61440.455571]  00000000000134c0 00000000000134c0 ffff88036ce1c800 ffff88037c5120a0
[61440.455572]  ffff88037c511f38 ffff88037c5120a8 ffff88037c511f60 0000000000000000
[61440.455574] Call Trace:
[61440.455576]  [<ffffffff81725919>] schedule+0x29/0x70
[61440.455580]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61440.455582]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61440.455586]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61440.455597]  [<ffffffffa022fd9b>] dmu_tx_wait+0x19b/0x2f0 [zfs]
[61440.455614]  [<ffffffffa02a1f23>] zfs_create+0x2a3/0x700 [zfs]
[61440.455631]  [<ffffffffa02bccfd>] zpl_create+0x9d/0x190 [zfs]
[61440.455633]  [<ffffffff811c9eed>] vfs_create+0xcd/0x130
[61440.455634]  [<ffffffff811cd56e>] do_last+0x103e/0x1230
[61440.455636]  [<ffffffff811cad16>] ? link_path_walk+0x256/0x870
[61440.455638]  [<ffffffff813152bb>] ? apparmor_file_alloc_security+0x5b/0x180
[61440.455641]  [<ffffffff812d7906>] ? security_file_alloc+0x16/0x20
[61440.455642]  [<ffffffff811cd81b>] path_openat+0xbb/0x650
[61440.455644]  [<ffffffff811cec1a>] do_filp_open+0x3a/0x90
[61440.455646]  [<ffffffff811dbab7>] ? __alloc_fd+0xa7/0x130
[61440.455648]  [<ffffffff811bd299>] do_sys_open+0x129/0x280
[61440.455649]  [<ffffffff811bd40e>] SyS_open+0x1e/0x20
[61440.455651]  [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f
[61440.455652] INFO: task python:15459 blocked for more than 120 seconds.
[61440.456393]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61440.457111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61440.458045] python          D ffff88041fd534c0     0 15459  12157 0x00000000
[61440.458047]  ffff8801121b5aa0 0000000000000086 ffff88003906c800 ffff8801121b5fd8
[61440.458048]  00000000000134c0 00000000000134c0 ffff88003906c800 ffff88037c5120a0
[61440.458050]  ffff88037c511f38 ffff88037c5120a8 ffff88037c511f60 0000000000000000
[61440.458051] Call Trace:
[61440.458054]  [<ffffffff81725919>] schedule+0x29/0x70
[61440.458058]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61440.458060]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61440.458063]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61440.458074]  [<ffffffffa022fd9b>] dmu_tx_wait+0x19b/0x2f0 [zfs]
[61440.458091]  [<ffffffffa02a1f23>] zfs_create+0x2a3/0x700 [zfs]
[61440.458107]  [<ffffffffa02bccfd>] zpl_create+0x9d/0x190 [zfs]
[61440.458109]  [<ffffffff811c9eed>] vfs_create+0xcd/0x130
[61440.458111]  [<ffffffff811cd56e>] do_last+0x103e/0x1230
[61440.458113]  [<ffffffff811cad16>] ? link_path_walk+0x256/0x870
[61440.458114]  [<ffffffff813152bb>] ? apparmor_file_alloc_security+0x5b/0x180
[61440.458116]  [<ffffffff812d7906>] ? security_file_alloc+0x16/0x20
[61440.458118]  [<ffffffff811cd81b>] path_openat+0xbb/0x650
[61440.458120]  [<ffffffff811cec1a>] do_filp_open+0x3a/0x90
[61440.458122]  [<ffffffff811dbab7>] ? __alloc_fd+0xa7/0x130
[61440.458123]  [<ffffffff811bd299>] do_sys_open+0x129/0x280
[61440.458125]  [<ffffffff811bd40e>] SyS_open+0x1e/0x20
[61440.458126]  [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f
[61440.458127] INFO: task python:15460 blocked for more than 120 seconds.
[61440.458849]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61440.459564] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61440.460507] python          D ffff88041fc134c0     0 15460  24504 0x00000000
[61440.460509]  ffff8803ad065aa0 0000000000000082 ffff880115084800 ffff8803ad065fd8
[61440.460511]  00000000000134c0 00000000000134c0 ffff880115084800 ffff88037c5120a0
[61440.460519]  ffff88037c511f38 ffff88037c5120a8 ffff88037c511f60 0000000000000000
[61440.460521] Call Trace:
[61440.460523]  [<ffffffff81725919>] schedule+0x29/0x70
[61440.460528]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61440.460530]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61440.460533]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61440.460544]  [<ffffffffa022fd9b>] dmu_tx_wait+0x19b/0x2f0 [zfs]
[61440.460560]  [<ffffffffa02a1f23>] zfs_create+0x2a3/0x700 [zfs]
[61440.460577]  [<ffffffffa02bccfd>] zpl_create+0x9d/0x190 [zfs]
[61440.460578]  [<ffffffff811c9eed>] vfs_create+0xcd/0x130
[61440.460580]  [<ffffffff811cd56e>] do_last+0x103e/0x1230
[61440.460582]  [<ffffffff811cad16>] ? link_path_walk+0x256/0x870
[61440.460583]  [<ffffffff813152bb>] ? apparmor_file_alloc_security+0x5b/0x180
[61440.460585]  [<ffffffff812d7906>] ? security_file_alloc+0x16/0x20
[61440.460587]  [<ffffffff811cd81b>] path_openat+0xbb/0x650
[61440.460588]  [<ffffffff811cec1a>] do_filp_open+0x3a/0x90
[61440.460590]  [<ffffffff811dbab7>] ? __alloc_fd+0xa7/0x130
[61440.460592]  [<ffffffff811bd299>] do_sys_open+0x129/0x280
[61440.460593]  [<ffffffff811bd40e>] SyS_open+0x1e/0x20
[61440.460595]  [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f
[61440.460596] INFO: task python:15461 blocked for more than 120 seconds.
[61440.461316]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61440.462032] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61440.462955] python          D ffff88041fc534c0     0 15461  17592 0x00000000
[61440.462957]  ffff8802ff2c5aa0 0000000000000082 ffff880112d79800 ffff8802ff2c5fd8
[61440.462959]  00000000000134c0 00000000000134c0 ffff880112d79800 ffff88037c5120a0
[61440.462960]  ffff88037c511f38 ffff88037c5120a8 ffff88037c511f60 0000000000000000
[61440.462962] Call Trace:
[61440.462964]  [<ffffffff81725919>] schedule+0x29/0x70
[61440.462968]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61440.462970]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61440.462973]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61440.462984]  [<ffffffffa022fd9b>] dmu_tx_wait+0x19b/0x2f0 [zfs]
[61440.463000]  [<ffffffffa02a1f23>] zfs_create+0x2a3/0x700 [zfs]
[61440.463016]  [<ffffffffa02bccfd>] zpl_create+0x9d/0x190 [zfs]
[61440.463018]  [<ffffffff811c9eed>] vfs_create+0xcd/0x130
[61440.463020]  [<ffffffff811cd56e>] do_last+0x103e/0x1230
[61440.463021]  [<ffffffff811cad16>] ? link_path_walk+0x256/0x870
[61440.463023]  [<ffffffff813152bb>] ? apparmor_file_alloc_security+0x5b/0x180
[61440.463024]  [<ffffffff812d7906>] ? security_file_alloc+0x16/0x20
[61440.463026]  [<ffffffff811cd81b>] path_openat+0xbb/0x650
[61440.463028]  [<ffffffff811cec1a>] do_filp_open+0x3a/0x90
[61440.463030]  [<ffffffff811dbab7>] ? __alloc_fd+0xa7/0x130
[61440.463031]  [<ffffffff811bd299>] do_sys_open+0x129/0x280
[61440.463033]  [<ffffffff811bd40e>] SyS_open+0x1e/0x20
[61440.463034]  [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f
[61440.463035] INFO: task python:15464 blocked for more than 120 seconds.
[61440.463752]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61440.464481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61440.465409] python          D ffff88041fc134c0     0 15464  22571 0x00000000
[61440.465411]  ffff8801f4029aa0 0000000000000082 ffff880191e80000 ffff8801f4029fd8
[61440.465413]  00000000000134c0 00000000000134c0 ffff880191e80000 ffff88037c5120a0
[61440.465414]  ffff88037c511f38 ffff88037c5120a8 ffff88037c511f60 0000000000000000
[61440.465416] Call Trace:
[61440.465418]  [<ffffffff81725919>] schedule+0x29/0x70
[61440.465422]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61440.465424]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61440.465427]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61440.465437]  [<ffffffffa022fd9b>] dmu_tx_wait+0x19b/0x2f0 [zfs]
[61440.465454]  [<ffffffffa02a1f23>] zfs_create+0x2a3/0x700 [zfs]
[61440.465477]  [<ffffffffa02bccfd>] zpl_create+0x9d/0x190 [zfs]
[61440.465480]  [<ffffffff811c9eed>] vfs_create+0xcd/0x130
[61440.465481]  [<ffffffff811cd56e>] do_last+0x103e/0x1230
[61440.465483]  [<ffffffff811cad16>] ? link_path_walk+0x256/0x870
[61440.465484]  [<ffffffff813152bb>] ? apparmor_file_alloc_security+0x5b/0x180
[61440.465486]  [<ffffffff812d7906>] ? security_file_alloc+0x16/0x20
[61440.465488]  [<ffffffff811cd81b>] path_openat+0xbb/0x650
[61440.465490]  [<ffffffff811cec1a>] do_filp_open+0x3a/0x90
[61440.465492]  [<ffffffff811dbab7>] ? __alloc_fd+0xa7/0x130
[61440.465493]  [<ffffffff811bd299>] do_sys_open+0x129/0x280
[61440.465495]  [<ffffffff811bd40e>] SyS_open+0x1e/0x20
[61440.465496]  [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f
[61440.465498] INFO: task python:15465 blocked for more than 120 seconds.
[61440.466213]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61440.466925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61440.467850] python          D ffff88041fd934c0     0 15465  18424 0x00000000
[61440.467851]  ffff8801132f3aa0 0000000000000086 ffff8801684d4800 ffff8801132f3fd8
[61440.467853]  00000000000134c0 00000000000134c0 ffff8801684d4800 ffff88037c5120a0
[61440.467854]  ffff88037c511f38 ffff88037c5120a8 ffff88037c511f60 0000000000000000
[61440.467863] Call Trace:
[61440.467866]  [<ffffffff81725919>] schedule+0x29/0x70
[61440.467870]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61440.467872]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61440.467875]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61440.467886]  [<ffffffffa022fd9b>] dmu_tx_wait+0x19b/0x2f0 [zfs]
[61440.467902]  [<ffffffffa02a1f23>] zfs_create+0x2a3/0x700 [zfs]
[61440.467918]  [<ffffffffa02bccfd>] zpl_create+0x9d/0x190 [zfs]
[61440.467920]  [<ffffffff811c9eed>] vfs_create+0xcd/0x130
[61440.467922]  [<ffffffff811cd56e>] do_last+0x103e/0x1230
[61440.467923]  [<ffffffff811cad16>] ? link_path_walk+0x256/0x870
[61440.467925]  [<ffffffff813152bb>] ? apparmor_file_alloc_security+0x5b/0x180
[61440.467926]  [<ffffffff812d7906>] ? security_file_alloc+0x16/0x20
[61440.467928]  [<ffffffff811cd81b>] path_openat+0xbb/0x650
[61440.467930]  [<ffffffff811cec1a>] do_filp_open+0x3a/0x90
[61440.467932]  [<ffffffff811dbab7>] ? __alloc_fd+0xa7/0x130
[61440.467933]  [<ffffffff811bd299>] do_sys_open+0x129/0x280
[61440.467935]  [<ffffffff811bd40e>] SyS_open+0x1e/0x20
[61440.467936]  [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f
[61560.464039] INFO: task txg_quiesce:21477 blocked for more than 120 seconds.
[61560.464904]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61560.465652] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61560.466623] txg_quiesce     D ffff88041fdd34c0     0 21477      2 0x00000000
[61560.466625]  ffff880184845d68 0000000000000046 ffff880138bbb000 ffff880184845fd8
[61560.466628]  00000000000134c0 00000000000134c0 ffff880138bbb000 ffffc9002f9b26d8
[61560.466630]  ffffc9002f9b26a8 ffffc9002f9b26e0 ffffc9002f9b26d0 0000000000000000
[61560.466632] Call Trace:
[61560.466638]  [<ffffffff81725919>] schedule+0x29/0x70
[61560.466647]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61560.466650]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61560.466654]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61560.466676]  [<ffffffffa026eb13>] txg_quiesce_thread+0x263/0x3d0 [zfs]
[61560.466692]  [<ffffffffa026e8b0>] ? txg_sync_thread+0x600/0x600 [zfs]
[61560.466696]  [<ffffffffa00b4aba>] thread_generic_wrapper+0x7a/0x90 [spl]
[61560.466699]  [<ffffffffa00b4a40>] ? __thread_exit+0xa0/0xa0 [spl]
[61560.466701]  [<ffffffff8108b5b2>] kthread+0xd2/0xf0
[61560.466703]  [<ffffffff8108b4e0>] ? kthread_create_on_node+0x1c0/0x1c0
[61560.466705]  [<ffffffff81731f0c>] ret_from_fork+0x7c/0xb0
[61560.466707]  [<ffffffff8108b4e0>] ? kthread_create_on_node+0x1c0/0x1c0
[61560.466713] INFO: task python:15455 blocked for more than 120 seconds.
[61560.467479]       Tainted: P           OX 3.13.0-48-generic #80-Ubuntu
[61560.468268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[61560.469246] python          D ffff88041fc134c0     0 15455  19265 0x00000000
[61560.469248]  ffff8801ef643b28 0000000000000082 ffff88011c206000 ffff8801ef643fd8
[61560.469250]  00000000000134c0 00000000000134c0 ffff88011c206000 ffff88037c5120a0
[61560.469251]  ffff88037c511f38 ffff88037c5120a8 ffff88037c511f60 0000000000000000
[61560.469253] Call Trace:
[61560.469256]  [<ffffffff81725919>] schedule+0x29/0x70
[61560.469261]  [<ffffffffa00bc7b5>] cv_wait_common+0x125/0x1c0 [spl]
[61560.469264]  [<ffffffff810ab120>] ? prepare_to_wait_event+0x100/0x100
[61560.469268]  [<ffffffffa00bc865>] __cv_wait+0x15/0x20 [spl]
[61560.469283]  [<ffffffffa022fd9b>] dmu_tx_wait+0x19b/0x2f0 [zfs]
[61560.469294]  [<ffffffffa022ff81>] dmu_tx_assign+0x91/0x4b0 [zfs]
[61560.469313]  [<ffffffffa02a3da1>] zfs_setattr+0x601/0x19e0 [zfs]
[61560.469316]  [<ffffffff811a2c59>] ? __kmalloc+0x1e9/0x230
[61560.469319]  [<ffffffffa00b32d6>] ? kmem_alloc_debug+0x96/0x3c0 [spl]
[61560.469323]  [<ffffffffa00b32d6>] ? kmem_alloc_debug+0x96/0x3c0 [spl]
[61560.469325]  [<ffffffff811cde4f>] ? getname_flags+0x4f/0x190
[61560.469343]  [<ffffffffa02bc32b>] zpl_setattr+0xdb/0x120 [zfs]
[61560.469346]  [<ffffffff811da761>] notify_change+0x231/0x390
[61560.469347]  [<ffffffff811bb95c>] chmod_common+0x10c/0x130
[61560.469349]  [<ffffffff811c7bfe>] ? path_put+0x1e/0x30
[61560.469351]  [<ffffffff811bcba5>] SyS_chmod+0x45/0xa0
[61560.469353]  [<ffffffff81731fbd>] system_call_fastpath+0x1a/0x1f

@theonewolf

The command perf top is showing this on my stuck system live:

 23.37%  [kernel]            [k] _raw_spin_unlock_irqrestore
 19.51%  [kernel]            [k] _raw_spin_lock
 12.03%  [kernel]            [k] __spl_kmem_cache_generic_shrinker.isra.11
  6.54%  [kernel]            [k] _raw_spin_unlock
  5.85%  [kernel]            [k] shrink_slab
  5.79%  [kernel]            [k] down_read_trylock
  3.24%  [kernel]            [k] finish_task_switch
  2.18%  [kernel]            [k] up_read
  1.52%  [kernel]            [k] super_cache_count
  1.47%  [kernel]            [k] grab_super_passive
  1.22%  [kernel]            [k] find_next_bit
  1.17%  [kernel]            [k] get_page_from_freelist
  0.84%  [kernel]            [k] drain_all_pages
  0.82%  [kernel]            [k] list_lru_count_node
  0.81%  [kernel]            [k] fill_contig_page_info
  0.78%  [kernel]            [k] __zone_watermark_ok
  0.70%  [kernel]            [k] balance_pgdat
  0.66%  [kernel]            [k] tick_nohz_idle_exit

@theonewolf

@kpande I think that dedup in my case is causing ARC memory pressure faster, but I'm not sure.

I had no idea dedup was so unreliable that it basically shouldn't be used. 10-15 GiB of a working set should be nothing... I am surprised that you experience an unresponsive lockup (for how long?).

My lockups tend to go for ~20-30 minutes, not tens of hours.

@theonewolf

@kpande did you try using the drop_caches hack that @dweeezil mentions?

I actually don't need this dedup for production, which is good. I was just going to use ZFS for some experiments on a dataset (file-level vs block-level dedup).

Trying this to see if it alleviates the problem:

while true; do
    sleep 30m
    echo 3 > /proc/sys/vm/drop_caches
done

@theonewolf

Ah, reboot :-)

I have contemplated that, but I can't easily quit my current "workload" and resume it unfortunately.

As I said, I have 8 writers that are a little over 50% done writing things into ZFS and a reboot means restarting them too 😢.

@theonewolf

For me, drop_caches every 30 minutes solved my issue entirely.

@behlendorf
Contributor

@kpande please update to zfs-0.6.3-262. I believe bc88866 will solve your issue.

@behlendorf
Contributor

@kpande if you can put together a test case, I'm happy to add it. For the moment you could add it to zconfig.sh, which is an existing mishmash of tests.

@angstymeat
Author

I'm running 0.6.4-1 and I'm back to txg_sync being stuck along with z_null_iss & txg_quiesce. There was no scrub running during this, just straight rsyncs.

@behlendorf added this to the 0.6.5 milestone Apr 10, 2015
@behlendorf
Contributor

@angstymeat can you post backtraces for the stuck txg_sync, z_null_iss, and txg_quiesce threads? I took a look at the previous backtraces you added and it looked like txg_sync was waiting on scrub IOs. By chance, have you changed any of the zfs_top_maxinflight, zfs_no_scrub_io, or zfs_scan_idle module options?

@angstymeat
Author

It looks like it unstuck itself after 10 hours, early this morning. I'm running another backup to see if I can get it to stick again.

The only variable I'm modifying is zfs_arc_meta_prune, which is currently set to 100000. I must have forgotten to comment it out when I was done testing some of the patches.
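
For reference, it's just the module parameter, so checking it and putting it back is a one-liner each (a sketch; I believe the stock default is 10000, but double-check for your build):

cat /sys/module/zfs/parameters/zfs_arc_meta_prune
echo 10000 > /sys/module/zfs/parameters/zfs_arc_meta_prune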

@angstymeat
Author

Well, I didn't get txg_sync stuck, but I have arc_adapt spinning again.

Here's arcstats:

5 1 0x01 86 4128 1402982628 185180206737167
name                            type data
hits                            4    229245672
misses                          4    25255138
demand_data_hits                4    8778124
demand_data_misses              4    65236
demand_metadata_hits            4    139736057
demand_metadata_misses          4    17486628
prefetch_data_hits              4    1566917
prefetch_data_misses            4    4470829
prefetch_metadata_hits          4    79164574
prefetch_metadata_misses        4    3232445
mru_hits                        4    45473334
mru_ghost_hits                  4    708764
mfu_hits                        4    103040847
mfu_ghost_hits                  4    601161
deleted                         4    22772013
recycle_miss                    4    15129839
mutex_miss                      4    57844
evict_skip                      4    27972457569
evict_l2_cached                 4    0
evict_l2_eligible               4    862824977920
evict_l2_ineligible             4    30634188800
hash_elements                   4    105383
hash_elements_max               4    659036
hash_collisions                 4    1975064
hash_chains                     4    2630
hash_chain_max                  4    6
p                               4    6569592320
c                               4    6569592320
c_min                           4    4194304
c_max                           4    8338591744
size                            4    6378029368
hdr_size                        4    43041624
data_size                       4    0
meta_size                       4    1740378112
other_size                      4    4594609632
anon_size                       4    14303232
anon_evict_data                 4    0
anon_evict_metadata             4    0
mru_size                        4    1374103552
mru_evict_data                  4    0
mru_evict_metadata              4    0
mru_ghost_size                  4    0
mru_ghost_evict_data            4    0
mru_ghost_evict_metadata        4    0
mfu_size                        4    351971328
mfu_evict_data                  4    0
mfu_evict_metadata              4    0
mfu_ghost_size                  4    0
mfu_ghost_evict_data            4    0
mfu_ghost_evict_metadata        4    0
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    72599
memory_indirect_count           4    12631
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    231927281
arc_meta_used                   4    6378029368
arc_meta_limit                  4    6253943808
arc_meta_max                    4    6387983976

slabtop:

 Active / Total Objects (% used)    : 14719668 / 15637599 (94.1%)
 Active / Total Slabs (% used)      : 458952 / 458952 (100.0%)
 Active / Total Caches (% used)     : 165 / 230 (71.7%)
 Active / Total Size (% used)       : 8328875.66K / 8450733.91K (98.6%)
 Minimum / Average / Maximum Object : 0.01K / 0.54K / 16.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
3139667 3139395  99%    0.30K  59239       53    947824K dmu_buf_impl_t
3034851 3034851 100%    0.86K  82023       37   2624736K dnode_t
3033888 3033868  99%    0.50K  94809       32   1516944K zio_buf_512
2619708 2060504  78%    0.10K  67172       39    268688K sa_cache
2044014 2001277  97%    0.19K  48667       42    389336K dentry
528896 365833  69%    0.06K   8264       64     33056K kmalloc-64
214708 106826  49%    0.30K   4129       52     66064K arc_buf_hdr_t
152704 150762  98%    0.06K   2386       64      9544K range_seg_cache
110838  93394  84%    0.09K   2639       42     10556K kmalloc-96
110586 106297  96%    0.09K   2633       42     10532K arc_buf_t
105426 105371  99%   16.00K  52713        2   1686816K zio_buf_16384
103152 103131  99%    8.00K  25788        4    825216K kmalloc-8192
 92416  92416 100%    0.03K    722      128      2888K kmalloc-32
 61503  61335  99%    0.10K   1577       39      6308K buffer_head
 54784  52680  96%    0.01K    107      512       428K kmalloc-8
 32300  32300 100%    0.12K    950       34      3800K kernfs_node_cache
 14178  13590  95%    0.08K    278       51      1112K selinux_inode_security
 11032  10270  93%    0.57K    197       56      6304K inode_cache
 10962   6013  54%    0.19K    261       42      2088K kmalloc-192
  9984   9984 100%    0.06K    156       64       624K anon_vma_chain
  8848   5245  59%    0.57K    158       56      5056K radix_tree_node
  7872   5853  74%    0.12K    246       32       984K kmalloc-128
  7548   7548 100%    0.04K     74      102       296K ext4_extent_status
  7168   7151  99%    0.02K     28      256       112K kmalloc-16
  6952   6083  87%    0.18K    158       44      1264K vm_area_struct
  6350   6101  96%    0.63K    127       50      4064K proc_inode_cache
  6048   6048 100%    0.07K    108       56       432K anon_vma
  6035   6035 100%    0.05K     71       85       284K zio_link_cache
  5936   5789  97%    2.00K    371       16     11872K kmalloc-2048
  4832   3934  81%    0.50K    151       32      2416K kmalloc-512
  4830   4788  99%    0.19K    115       42       920K cred_jar
  3968   3832  96%    1.00K    124       32      3968K kmalloc-1024
  3485   3485 100%    0.05K     41       85       164K ftrace_event_field
  2720   2217  81%    0.12K     85       32       340K eventpoll_epi
  2592   2520  97%    0.11K     72       36       288K jbd2_journal_head
  2560   2560 100%    0.98K     80       32      2560K ext4_inode_cache
  2420   1960  80%    0.36K     55       44       880K blkdev_requests
  2368   2005  84%    0.25K     74       32       592K filp
  2176   2056  94%    0.25K     68       32       544K skbuff_head_cache
  2176   1954  89%    0.25K     68       32       544K bio-0
  2144   2144 100%    0.25K     67       32       536K kmalloc-256
  1920   1760  91%    0.12K     60       32       240K scsi_sense_cache
  1776   1776 100%    0.66K     37       48      1184K shmem_inode_cache
  1736   1736 100%    0.14K     31       56       248K ext4_groupinfo_4k

and meminfo:

MemTotal:       16286312 kB
MemFree:         4101140 kB
MemAvailable:    4512324 kB
Buffers:          211224 kB
Cached:           122868 kB
SwapCached:           64 kB
Active:           303824 kB
Inactive:          73460 kB
Active(anon):       9320 kB
Inactive(anon):    34352 kB
Active(file):     294504 kB
Inactive(file):    39108 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8216572 kB
SwapFree:        8215788 kB
Dirty:                 4 kB
Writeback:             0 kB
AnonPages:         43160 kB
Mapped:            67360 kB
Shmem:               456 kB
Slab:            8494368 kB
SReclaimable:     415444 kB
SUnreclaim:      8078924 kB
KernelStack:        8688 kB
PageTables:         5608 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    16359728 kB
Committed_AS:     216556 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      646256 kB
VmallocChunk:   34345126052 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       78436 kB
DirectMap2M:     3057664 kB
DirectMap1G:    13631488 kB

I've done another process dump which is available at https://cloud.passcal.nmt.edu/index.php/s/z7ayrqMeJLblvYS.

@behlendorf
Copy link
Contributor

@angstymeat it appears you're now seeing the ARC spinning because it's completely consumed by dnodes, which can't be dropped for some reason. This is different from the original issue, but clearly still a problem. Two questions: 1) is this an rsync workload for small files? 2) what kernel version are you using?

@angstymeat
Copy link
Author

Yes, I'm thinking this is two separate issues. I had been having problems with arc_adapt, which went away when (I think) #3225 was committed, but it looks like they're back.

Yes, this is an rsync workload with small files, syncing several email servers.

I'm running Fedora 20 with the 3.17.8-200.fc20.x86_64 kernel. I'm having to keep it and some other systems on 3.17.8 because of the .zfs/snapshot automount issue #3030.

@angstymeat
Author

I tried again today, just after the patch in #3283 was committed, and I was left with arc_adapt spinning at 3.3% CPU again for a few hours after the backup finished, before I stopped it.

Should I break this out into a separate issue?

@behlendorf
Contributor

I think that's a good idea. Can you open a new issue clearly describing the remaining problem and posting the relevant stats?

@dweeezil
Contributor

@angstymeat I didn't expect the patch for #3283 to be sufficient for all of these types of workloads. This issue seems to have suffered from a bit of scope creep, so I think @behlendorf's suggestion of a new issue report is a good one.

It would be nice to see the following when the various processes are spinning on your system:

  • /proc/slabinfo
  • /proc/spl/kmem/slab
  • /proc/spl/kstat/zfs/arcstats
  • /proc/meminfo
  • SysRq-t (stacks of all tasks)
  • SysRq-w (stacks of blocked tasks)
  • Output of perf report --stdio after running perf record -ag for about 15 seconds while spinning, or maybe access to the perf.data file (see the sketch after this list)
  • A description of the workload on the system at the time

It sounds like there may still be lingering issues with the typical "rsync server" application. To that end, I will set up a test rig with current master code and see what happens when I feed it with a bunch of rsyncs to stress its memory (which I'll artificially constrain).
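
As a sketch of the perf capture step from the list above (system-wide, with call graphs, for about 15 seconds while things are spinning; the output filename is arbitrary):

# record system-wide samples with call graphs for ~15 seconds
perf record -ag -- sleep 15
# then produce the text report (or just attach the resulting perf.data)
perf report --stdio > perf-report.txt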

@angstymeat
Author

I'll start a new one and post this information when it happens again.

@angstymeat
Author

Ok, I've moved the arc_adapt issue to #3303.

@ghost

ghost commented May 13, 2015

I have this issue now as well. txg_sync, z_null_iss/0 and txg_quiesce are stuck and consuming CPU.

slabinfo http://pastebin.com/JLyqJKJA
kmem/slab http://pastebin.com/sRY3BWQn
arcstats http://pastebin.com/T6tm8702
meminfo http://pastebin.com/qDrDipkV
SysRq-t http://paste.ubuntu.com/11109410/
SysRq-w http://pastebin.com/qn8qqKiB

Also worth mentioning: I use dm-crypt on the block devices, in case it makes any difference.
The freeze occurred shortly after firing up rtorrent to start seeding a torrent.

Up until then it had worked well; I have no problems with "normal" IO (moving and opening files).
Any ideas?

@kernelOfTruth
Contributor

@hallabro opening up a new issue might be better; this one's closed.

@ghost

ghost commented May 13, 2015

@kernelOfTruth it sounds very much like the bug the original post is describing, but I will open a new issue instead.
