Stuck in IOWAIT state on multiple CPUs #3668

Closed · angstymeat opened this issue Aug 7, 2015 · 15 comments

@angstymeat

This replaces issue #3654, which I messed up pretty badly with some confused ramblings because my zswap stats weren't being calculated correctly.

Summary:
I've seen this happen on two systems, but I am going to report on the system that is easier for me to work on...

After running for a while, the system gets low on memory, my ARC drains, and I eventually reach a state where the ZFS filesystems become inaccessible. This eventually slows the whole system down to the point where it is essentially frozen. While I can still log on, I see CPUs stuck in the IOWAIT state according to htop.

This system is a Dell R515 server with 32GB of memory, Fedora 22 running the 4.0.8-300.fc22.x86_64 kernel, SPL 0.6.4-18_g8ac6ffe, and ZFS 0.6.4-184_g6bec435.

At first I thought this was #3637, but that was supposed to be fixed in the commit I am using.

The operation I'm running is a ZFS send of a 6.2TB filesystem to a compressed stream on an NFS mount. The command I'm running is zfs send -R storage@backup | mbuffer | pigz > /ext/storage.zfs. The mbuffer is there for me to watch the transfer rate and the totals.
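For reference, here is the same pipeline with mbuffer's buffering options spelled out (the -s and -m values below are illustrative, not necessarily what was actually used; mbuffer reports its in/out transfer rates and totals on stderr either way):

    zfs send -R storage@backup \
        | mbuffer -s 128k -m 1G \
        | pigz > /ext/storage.zfs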

I started this from a fresh reboot. For the first 20 minutes, ARC usage increased until it was full. Memory usage spiked to 99% of my 32GB (16GB being used by the ARC). Shortly thereafter, the ARC drained to almost nothing, staying at around 34MB.
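One way to watch the ARC fill and drain like this (not necessarily how it was monitored here) is the kstat file the 0.6.x modules expose:

    # current ARC size and its target/min/max, in bytes
    # (fields per line: name, type, value)
    grep -wE 'size|c|c_min|c_max' /proc/spl/kstat/zfs/arcstats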

It was at this point that my swap partition started seeing regular usage, growing to 14MB, where it currently sits. It's not dramatic, but the system has 32GB of RAM and is doing nothing other than running the zfs send I mentioned earlier. You would think it would have enough memory to do this without needing to swap.

After running for about seven hours, I started running into my problem state again. While the system is not totally unresponsive right now, I have two CPUs stuck in IOWAIT accompanied by sporadic, slow access to the ZFS mounts. The system did appear to be frozen for 5 to 10 minutes, but then recovered to where it is now.

I'm not sure what is using my memory right now. According to htop I have 2.5GB in use, with the rest being listed as cache.

perf top -ag is showing:

+   18.01%     0.09%  [kernel]                  [k] arch_cpu_idle
+   16.23%     0.20%  [kernel]                  [k] apic_timer_interrupt
+   15.42%     0.12%  [kernel]                  [k] smp_apic_timer_interrupt
+    9.02%     0.02%  perf                      [.] hist_entry_iter__add
+    7.88%     0.01%  [kernel]                  [k] irq_exit
+    6.70%     0.12%  [kernel]                  [k] __do_softirq
+    6.40%     0.68%  perf                      [.] hists__output_resort
+    5.80%     0.00%  [kernel]                  [k] proc_reg_read
+    5.73%     0.10%  [kernel]                  [k] seq_read
+    5.29%     0.88%  perf                      [.] hists__collapse_resort
+    5.13%     0.04%  [kernel]                  [k] worker_thread
+    4.93%     0.27%  [kernel]                  [k] hrtimer_interrupt
+    4.91%     0.00%  libc-2.21.so              [.] __GI___libc_read
+    4.82%     2.19%  [kernel]                  [k] arc_kmem_reap_now

I've uploaded some debugging information to https://cloud.passcal.nmt.edu/index.php/s/0f4Axgf72Of4oOa.

Just for some additional data: last week I was able to crash another machine that was at high memory utilization by running a scrub at the same time as an rsync backup.

@angstymeat (Author)

My send has frozen again after 9 hours. The system hasn't locked up, but there is a good bit of latency when accessing the ZFS mounts. One CPU is again at 100% iowait.

This is with nothing else except the send running.

This is also with zfs_arc_min set to 1G.
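(For anyone reproducing this: zfs_arc_min is a zfs module parameter that takes a byte count, so 1G would be set along these lines — the exact method used here wasn't stated:)

    # at runtime
    echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_min

    # persistently, across module reloads
    echo "options zfs zfs_arc_min=1073741824" >> /etc/modprobe.d/zfs.conf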

@angstymeat (Author)

I know this doesn't particularly help narrow down the issue but...

I was only able to get the system to run up to 9 hours before, but now I'm trying out DeHackEd's ABD + master branch (https://github.com/DeHackEd/zfs/commits/dehacked-bleedingedge2), and I've been copying for over 24 hours without an issue.

My issue also sounds a bit like #3680 and #3676. In #3676 it was mentioned that you would see that happen if you couldn't write to the destination, but I have no issue writing or accessing it.

@eolson78

@angstymeat any idea which commits in that repo might be helping you here vs. what you were running? I am hitting this exact same issue and would love a fix.

@behlendorf (Contributor)

@angstymeat @eolson78 it looks like you're having an ARC collapse issue similar to the one described in #3680. The fix proposed for this by @dweeezil was merged to master this morning. If you could pull the latest master source and verify it resolves the issue, that would be helpful.
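(For anyone following along, rebuilding from master in the 0.6.x era followed the usual autotools flow; the steps below are a sketch — adjust prefixes and kernel source paths to your setup:)

    # SPL first, since that's where the fix landed, then ZFS against it
    git clone https://github.com/zfsonlinux/spl && cd spl
    ./autogen.sh && ./configure && make -j$(nproc) && sudo make install
    cd .. && git clone https://github.com/zfsonlinux/zfs && cd zfs
    ./autogen.sh && ./configure && make -j$(nproc) && sudo make install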

behlendorf added this to the 0.6.5 milestone on Aug 19, 2015
@angstymeat (Author)

Thanks, I will give it a try when I can next reboot (might be a day or so). I'm in the process of verifying a 5.2TB send stream in preparation for recreating my pool with new hard drives.

In the meantime, DeHackEd's bleeding-edge2 ABD branch has been working very well.

@angstymeat (Author)

My system blew up with txg_sync and txg_quiesce taking up a ton of CPU, and a bunch of frozen rsync processes, so I rebooted and am trying out the new commits now.

@angstymeat (Author)

With the latest master:

I had been running for about an hour when I noticed that kswapd0 and kswapd1 were each taking up a lot of CPU (between 80% and 120%). I saw that my pigz, which usually runs around 700%, was down around 35%. Around then I saw that my zfs send was stuck.

vmstat shows nothing being swapped in or out.

I rebooted and I'm trying again.

@angstymeat (Author)

Got the system to slow down to a crawl in about 20 minutes. I was running a zfs send piped into pigz for compression, and I started up my nightly rsync backups at the same time. As soon as memory filled, my send speed dropped from around 50MB/s down to 4MB/s with arc_reclaim running at 99%. I let it run like this for a while and it didn't appear that arc_reclaim was making any headway.

I tried dropping caches and my send speed went back up to around 25MB/s for a while, but when the drop finished, arc_reclaim ran continuously for several more minutes while my send fell back down to a few MB/s.
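("Dropping caches" here presumably refers to the standard procfs knob; shown with 3 to drop both the page cache and the dentry/inode slabs:)

    # 1 = page cache, 2 = dentries/inodes, 3 = both
    echo 3 > /proc/sys/vm/drop_caches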

arc_reclaim finally finished and my send speed went back to normal.

For the record, running my send under the ABD branch ran continuously at around 50MB/s despite the rest of the processes I had running.

...

Around 10 minutes after the above, my memory filled up again. Speed is slow again. kswapd0 and kswapd1 are running a lot.

Running vmstat I see that "swpd" is increasing, but I'm reading 0 for si and so. I'm assuming the space reported in use under swpd is being used by zswap, so it hasn't had to hit the disk yet.
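(For reference, the columns being read here, sampling every 5 seconds:)

    vmstat 5
    # swpd = swap space in use (KB); si/so = swap-in/out rates (KB/s).
    # Rising swpd with si/so pinned at 0 is consistent with pages sitting
    # in zswap's compressed pool instead of being written out to disk.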

...

Long story short, so far it appears that performance under memory pressure suffers a lot more than it does with ABD. The zfs send I'm testing with takes only 30 minutes under the same conditions on the ABD branch, but it has now been running for almost an hour and is only 3/4 finished.

I'm going to let everything continue running untouched for the next couple of hours and take a look at it again.

@behlendorf (Contributor)

@angstymeat did you make sure to update the SPL code as well? That's where the fix was. Also, your previous data showed that the ARC size had collapsed. Is that still the case?

@angstymeat (Author)

Yes, I always recompile SPL along with ZFS even when it hasn't changed. It's currently version 0.6.4-19_g851a549.

I forgot about the arc collapse; I was looking more for iowait state freezes.

Looking at it, I'm not seeing a sudden collapse, although my arc is currently just 388MB in size. I'm now running the same job I was running when I originally experienced the issue. The arc is currently filling, and I will let you know what happens.

@angstymeat (Author)

Ok, when I posted this issue the arc collapsed in about 20 minutes. I've been running this for about 45 minutes and my arc is still full.

I'm going to let it run overnight since I was able to grind the system to a halt within 9 hours of starting the send in my previous attempts.

@angstymeat (Author)

Ok, it's been running for over 20 hours without an issue.

@behlendorf (Contributor)

@angstymeat what's the status of this issue? Has the initial issue been resolved by the latest code? Are there other issues here?

@angstymeat (Author)

It's been running with this commit just fine. No issues with the arc collapsing.

I don't want to muddy the issue, but there seems to be a lot more kswapd0 activity; that's probably because this patch looks at more sources of free memory than before. I don't have any hard numbers, but performance seems to drop off more when kswapd0 and arc_reclaim are running here than it did when I was running the ABD patches, although I think that's an apples-to-oranges comparison.

I would say that this issue can be closed now.

@behlendorf (Contributor)

@angstymeat yes I think that's to be expected. ABD should enable significant additional improvements when it gets merged after the tag. I'm glad we got this sorted out, closing this issue.
