Stuck txg_sync process, and IO speed issues when ARC is full #3235
Comments
Can you please post /proc/spl/kstat/zfs/arcstats, slabtop -o, and /proc/meminfo the next time it happens?
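(For reference, a minimal sketch of collecting those three diagnostics in one pass; the file names are arbitrary, and slabtop generally needs root to read /proc/slabinfo.)

```sh
# Capture the requested diagnostics with a timestamp so several samples can be compared later.
ts=$(date +%Y%m%d-%H%M%S)
cat /proc/spl/kstat/zfs/arcstats > arcstats-$ts.txt
slabtop -o                       > slabtop-$ts.txt
cat /proc/meminfo                > meminfo-$ts.txt
```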
Will do. I'm running it in a loop right now to see if I can get it to fail sooner. I'm graphing system memory and ARC memory usage. I see arc_meta_used max out at arc_meta_limit, and I haven't seen it exceed that since the recent commits. The system doesn't appear to be out of memory either. I'll post more when it happens again. It seems to happen every 2 or 3 days if I just let it run on its own.
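(Not from the thread, but one way to sample the two counters mentioned above for graphing; arcstats rows are name/type/value triples, so the value is field 3, in bytes.)

```sh
# Print arc_meta_used and arc_meta_limit once a minute, e.g. to feed a graphing tool.
while true; do
    awk '$1 == "arc_meta_used" || $1 == "arc_meta_limit" { printf "%s=%s ", $1, $3 } END { print "" }' \
        /proc/spl/kstat/zfs/arcstats
    sleep 60
done
```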
CC: @kpande (#3160 (comment))
https://icesquare.com/wordpress/how-to-improve-zfs-performance/
Yeah, well, it doesn't seem to help; it's at 5 by default. Anything unusual besides that in top, swapping, or arcstats? Basically what @snajpa asked: what hardware is that (hard drives, processor)? Any encryption in use? More things to consider: 3.17.8 doesn't seem to have it, but that's probably more related to lockups than sync/writeback stalling. You could give that a try if you're not bound to a stock kernel or software otherwise, like I wrote in #2272.
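(For anyone following along: vfs.zfs.txg.timeout is the FreeBSD sysctl name; on Linux the equivalent is the zfs_txg_timeout module parameter, 5 seconds by default. A quick sketch of inspecting and changing it at runtime, with 10 as an arbitrary example value.)

```sh
cat /sys/module/zfs/parameters/zfs_txg_timeout        # current txg sync interval in seconds
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout  # example: sync less often
```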
@angstymeat If any of the blocked processes are stuck in … I'm trying to nail down one more rather hard-to-reproduce deadlock and also laying some groundwork for better integration with the kernel's lockdep facility. This patch is currently very fluid and is likely to be re-pushed at least one more time today.
@snajpa Here's arcstats, slabtop, and meminfo...
@kernelOfTruth, I haven't done anything with vfs.zfs.txg.timeout, although I tried /sys/module/zfs/parameters/zfs_arc_meta_prune without it seeming to have any effect. My kernel and everything else is straight-up stock, except that I compiled my own version of rsync from source since version 3.1.1 wasn't available in the repository. @dweezil, I had been trying the #3225 patch for a couple of days; the last time I tried was on the 27th. Now that I'm looking more closely, I'm seeing this:
The system has 8 cores, so the CPU numbers won't add up to 100. I'm running the 3.17.8 kernel currently because of the .zfs directory issue that popped up in 3.18.
I never let it run this long before, but it looks like after 15 hours the txg_sync finished up and the rsync is continuing from where it appeared to be frozen.
Letting the server continue to run, it appears there's a pattern of "run, txg_sync for 15 hours, finish running".
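(One way to see how long individual txgs stay open, quiescing, and syncing is the per-pool txg history kstat; this is only a sketch, with "tank" as a placeholder pool name, and zfs_txg_history may need to be raised first since it can default to 0.)

```sh
# Keep the last 100 txgs in the history kstat, then dump their states and timings.
echo 100 > /sys/module/zfs/parameters/zfs_txg_history
cat /proc/spl/kstat/zfs/tank/txgs
```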
Ok, I'm running with the latest git copy after 3225 and 466 had been committed, and I'm not seeing the long txg_sync. However, I'm now seeing that arc_adapt has been running continuously at 3.5% CPU since my backups ran.
Echoing to drop_caches stopped the arc_adapt, but it's strange that it wasn't able to clear up whatever memory it was trying to clean up in over 17 hours with nothing else running.
@angstymeat Could you please grab some stacks from the arc_adapt thread with …
So far, these have been easily repeatable on my system. If that holds up, it should do it again when it runs Sunday morning. Is there another way to get the stack traces from all processes? This machine is remotely located, and while I have remote access to it, I don't think the iDRAC software will let me send a SysRq key combo to it. Would that just be dumping /proc/*/stack?
You could just trigger it from the command line without the keyboard (one way is sketched below). You can also dump each of /proc/*/stack, but a bunch of other useful information is also dumped when using SysRq-t.
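(A sketch of the non-keyboard route; writing to /proc/sysrq-trigger needs root but does not require the physical SysRq key combination.)

```sh
echo t > /proc/sysrq-trigger   # same effect as SysRq-t: dump all task states/stacks to the kernel log
dmesg > sysrq-t-dump.txt       # collect the result for posting
```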
And it did not repeat this morning. I'll see how it does over the next couple of nights.
Ok, I got it to act up again. It looks like I can do this consistently by running a scrub while my rsyncs are running. I've uploaded two dumps:
It is currently still running in this state, with arc_adapt spinning at 3.3% CPU.
I think I am hitting this exact same issue. I am getting this kernel stack trace repeatedly while doing a large copy into ZFS with 8 concurrent writers (I have 16 GiB of memory, 200-300 GiB referenced in the ZFS dataset, and ~50 GiB deduplicated):
The command
@kpande I think dedup in my case is causing ARC memory pressure to build faster, but I'm not sure. I had no idea dedup was so unreliable that it basically shouldn't be used. A 10-15 GiB working set should be nothing... I am surprised that you experience an unresponsive lockup (for how long?). My lockups tend to last ~20-30 minutes, not tens of hours.
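(Not something posted in the thread, but for sizing the dedup pressure being discussed, the DDT summary shows how many entries exist and their average in-core size; "tank" is a placeholder pool name.)

```sh
# Dedup table summary: entry count plus average on-disk and in-core bytes per entry.
zpool status -D tank
```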
@kpande did you try using the drop_caches hack that @dweeezil mentions? I actually don't need this dedup for production, which is good. I was just going to use ZFS for some experiments on a dataset (file-level vs block-level dedup). Trying this to see if it alleviates the problem:

```sh
while true
do
    sleep 30m
    echo 3 > /proc/sys/vm/drop_caches
done
```
Ah, reboot :-) I have contemplated that, but I can't easily quit my current "workload" and resume it unfortunately. As I said, I have 8 writers that are a little over 50% done writing things into ZFS and a reboot means restarting them too 😢. |
For me, |
@kpande if you can put together a test case I'm happy to add it. For the moment you could add it to the existing …
I'm running 0.6.4-1 and I'm back to txg_sync being stuck along with z_null_iss & txg_quiesce. There was no scrub running during this, just straight rsyncs. |
@angstymeat can you post backtraces for the stuck txg_sync, z_null_iss, and txg_quiesce threads? I took a look at the previous backtraces you added and it looked like txg_sync was waiting on scrub IOs. By chance, have you changed either the …
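(In case it helps, a sketch of grabbing just those three threads' kernel stacks without a full SysRq-t dump; it needs root and matches the thread names given above.)

```sh
for name in txg_sync z_null_iss txg_quiesce; do
    for pid in $(pgrep "$name"); do
        echo "=== $name (pid $pid) ==="
        cat /proc/"$pid"/stack
    done
done
```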
It looks like it unstuck itself after 10 hours, early this morning. I'm running another backup to see if I can get it to stick again. The only variable that I'm modifying is zfs_arc_meta_prune, which is currently set to 100000. I must have forgotten to comment it out when I was done testing some of the patches.
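(Side note, not from the thread: if the parameter was set in /etc/modprobe.d it persists across reboots, which would explain a forgotten setting. The live value can be checked and changed without rebooting; the default shown below is what current documentation lists, so verify it for your release.)

```sh
grep -r zfs_arc_meta_prune /etc/modprobe.d/                  # e.g. "options zfs zfs_arc_meta_prune=100000"
cat /sys/module/zfs/parameters/zfs_arc_meta_prune            # current live value
echo 10000 > /sys/module/zfs/parameters/zfs_arc_meta_prune   # documented default; verify for your version
```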
Well, I didn't get txg_sync stuck, but I have arc_adapt spinning again. Here's arcstats:
slabtop:
and meminfo:
I've done another process dump which is available at https://cloud.passcal.nmt.edu/index.php/s/z7ayrqMeJLblvYS. |
@angstymeat it appears you're now seeing the ARC spinning because it's completely consumed by dnodes which can't be dropped for some reason. This is different from the original issue, but clearly still a problem. Two questions, 1) is this an rsync workload for small files? 2) what kernel version are you using? |
Yes, I'm thinking this is two separate issues. I had been having problems with arc_adapt, which I think went away when #3225 was committed, but it looks like it's back. Yes, this is an rsync workload with small files, syncing several email servers. I'm running Fedora 20 with the 3.17.8-200.fc20.x86_64 kernel. I'm having to keep it and some other systems on 3.17.8 because of the .zfs/snapshot automount issue #3030.
I tried again today just after the patch in #3283 was committed, and I was left with arc_adapt spinning at 3.3% CPU again for a few hours after the backup finished, before I stopped it. Should I break this out into a separate issue?
I think that's a good idea. Can you open a new issue clearly describing the remaining problem and posting the relevant stats?
@angstymeat I didn't expect the patch for #3283 to be sufficient for all of these types of workloads. This issue seems like it may have suffered from a bit of scope creep, so I think @behlendorf's suggestion of a new issue report is a good one. It would be nice to see the following when the various processes are spinning on your system:
It sounds like there may still be lingering issues with the typical "rsync server" application. To that end, I will set up a test rig with current master code and see what happens when I feed it with a bunch of rsyncs to stress its memory (which I'll artificially constrain). |
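(One possible way to artificially constrain memory for such a rig is to cap the ARC itself; this is only a sketch with example values, not necessarily how the test rig described above was set up.)

```sh
# Cap the ARC at 2 GiB at runtime; the same value can be made persistent with
# "options zfs zfs_arc_max=2147483648" in /etc/modprobe.d/zfs.conf.
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max
```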
I'll start a new one and post this information when it happens again. |
Ok, I've moved the arc_adapt issue to #3303. |
I have this issue now as well.
Worth mentioning is also that I use dm-crypt on the block devices, in case it makes any difference. Up until then it had worked well; I have no problems with "normal" IO (moving and opening files).
@hallabro opening up a new issue might be better; this one's closed.
@kernelOfTruth it sounds very much like the bug the original post is describing but I will open a new issue instead. |
I seem to be able to pretty reliably cause txg_sync to become stuck during an rsync backup.
I had been having ARC issues with high memory consumption and bad IO speeds after the ARC fills, so I had been trying the latest git nightlies as they were released, plus combinations of various patches like 3223, 3225, 3216, and a few others.
While the latest nightlies have helped memory usage, nothing has particularly helped with the IO speed. However, when I don't have memory/speed issues I will eventually end up with txg_sync taking up 80%+ CPU and a stalled rsync process.
This last time, I'm just using the straight-up nightly without any additional patches. The system was stuck in txg_sync for 14 hours before I rebooted.
The backups represent a wide range of file types and sizes. The problems really start to happen when backing up our maildir directories, which contain tons of small files. The ARC fills up, and once it is full, performance drops like a stone. I've had to run the echo/drop_caches trick at 20-minute intervals to keep the speed up. Without it, my backups take up to 8 hours to run; with it, less than an hour.
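(For reference, the 20-minute interval trick could be automated with a cron entry along these lines; this is a sketch, not what is actually deployed on the machine.)

```sh
# /etc/cron.d/drop-caches (hypothetical): drop pagecache, dentries, and inodes every 20 minutes.
*/20 * * * * root /bin/sh -c 'echo 3 > /proc/sys/vm/drop_caches'
```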
I wish I could be more specific, but the speed issues started around the kmem rework. I wasn't seeing this issue under the release 0.6.3.
I'm running Fedora 20 with the 3.17.8-200.fc20.x86_64 kernel. The machine is a Dell R515 with 16GB of RAM. The machine is used for backups, so I've been using it to test git ZFS releases under heavy IO loads.