RC_2_0: Write cache doesn't flush to disk #6522
Maybe it is the same problem I see on Windows 10 64-bit. There seems to be a problem with memory-mapped files and Microsoft's own antivirus software.
Nope, OS caching does not increase the RAM usage of programs. Please post the missing steps to reproduce.
I use Avast.
In any case, this option doesn't exist in RC_2_0.
Stopping the torrent will cause libtorrent to close the files (and file maps), which it sounds like will also trigger Windows to flush the dirty pages. This seems to be a recurring problem on Windows. In the past, forcefully closing files periodically has, I think, proven the most reliable solution. It would be really nice if there was something like
It sounds like a lame limitation of the OS cache manager to not flush an open file, no matter the size or age of its dirty pages.
According to this you can also call FlushFileBuffers() to force a flush.
And if libtorrent is doing memory maps, then maybe FlushViewOfFile() will help. The remarks indicate that this is an async call, too.
Did you try to emulate it (e.g. by force-terminating the qBittorrent process)?
The data resides in the OS cache (not under the qbt process), so you'd need to terminate the OS instead.
qbt was killed immediately. But doing a right-click → Properties on the file didn't show the dialog for a few seconds (at least 20 secs). I also opened Resource Monitor during that time; the top disk activity for writing was that file. So I suppose the OS was flushing it after the kill. I'll try to simulate a power loss with a forced VM poweroff.
It seems like the most appropriate fix is to schedule a call to FlushViewOfFile() periodically.
(Line 232 in commit 3623664)
Don't know if it's related, but is the opened file supposed to be shared with other processes? Or does libtorrent open multiple handles to the same file? I would have expected it to be just
I think it would be nice to perform this action also when some parts of the file finish downloading.
Actually, every time a file completes, its file handle is closed. I believe this will flush it to disk. This is primarily to make sure that the next time it's accessed, the file is opened in read-only mode.
Therefore, the problem mostly affects really large files which require more time to complete and more memory to map. |
Nope.
I tried it. Total data loss. |
Another related problem may be that the resume data does not correspond to the actual data on disk. I.e., some parts may be marked as completed in the resume data but be lost due to an incorrect system shutdown. This looks more serious than the opposite problem, where some part of the data is written to the file but is not marked as completed in the resume data.
But in reality, the problem of incorrect system shutdown cannot be reliably solved, and besides, we should not consider it a regular scenario. I think we should focus on the performance issue. Data should be flushed to disk periodically to prevent extreme I/O when the system needs the occupied memory for other purposes. But time-based periodicity looks inefficient to me, because different download speeds will give different results. Wouldn't it be better to base the periodicity on the amount of data downloaded since the last flush, so that the size of unflushed data never exceeds some reasonable amount?
From the
My reading is that this is similar to an … i.e. there will be no back-pressure from this function if the disk is at capacity. The back-pressure will happen in the page faults allocating new dirty pages on the I/O threads.
From my experience, even when the handle is closed the data might still be floating around in the OS cache waiting to be written. To really ensure data is on disk, better to flush it explicitly (before closing the handle): https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-flushviewoffile
It would be even better if the watermark for flushing were user-tunable, and maybe have a default
Here's a start of a fix: #6529
Is this bug not just an instance of the known cache performance problems on older Windows versions? In any case, I strongly object to libtorrent having to work around bad filesystem cache behavior. This is what OSes are for. Cache management is a terribly complex area, and it evolves as storage evolves. libtorrent's codebase is already huge and shouldn't be burdened with workarounds like this.
(IMO, exposing …)
I don't think this is just an issue in old versions of Windows. As far as I know, all versions of Windows have issues balancing the disk cache against, for example, executable TEXT segments that may currently be executing, or a cache of zeroed pages to hand to processes that need more RAM. Basically, Windows seems very eager to prioritize disk cache (at least dirty pages) over other uses. To some extent this used to be an issue on Linux too, where it could underestimate the time it would take to flush a dirty page to its backing store, and not start flushing early enough.

Another issue with Windows and memory-mapped files specifically is that they seem to be a bit of an afterthought. IIUC, the disk cache on Windows is not the same as the page cache, but actually caches at the file level (not the block-device level), which means memory-mapped files are also different from pages in the page cache. I imagine this could make it interact poorly when prioritizing page-cache pages against disk-cache pages.

Anyway, libtorrent has to take a pragmatic approach. It doesn't matter whose responsibility a problem is; it needs to be solved either way.
@arvidn What about exposing …? I think those functions would also be useful to help the consistency of resume data in the power-loss case. I'm writing a separate issue about that right now. FWIW, in researching that topic I found you need …
There is this call already; does that do what you need? http://libtorrent.org/reference-Torrent_Handle.html#flush-cache
It looks like … I'm considering whether I would want a … I can see a lot of upsides to …
Yes, I agree that libtorrent probably doesn't implement the documented behavior currently.
The first one just flushes the dirtiest file, which might leave other active files in memory for longer. Also, another issue with the Windows cache: if you start rechecking a big torrent, it will fill up all the RAM and the system becomes laggy. Windows starts auto-flushing the files when RAM usage is at its limit, but you still experience system lag.
This solution is not good! The error still exists! See qBittorrent 4.4.0.
Does the "error" refer to the cache not being flushed, or to the pauses caused by flushing whole files at a time? My impression is that the first problem has been traded for the second.
@arvidn
@diizzyy The title of this ticket is "Write cache doesn't flush to disk"; if that's not the main issue, what is?
If you have a fast enough connection you'll still see memory exhaustion, no?
I don't believe so, but please share if you do.
@djtj85 Do you experience the cache not flushing early enough as well? Do you have any more symptoms to contribute? Which operating system are you on?
I'm planning a few patches to attempt to address this issue. I just landed an improvement to the benchmarks, which now also measure memory usage (but only on Linux so far): #6679. This addresses the memory priority when checking files on Unix systems: #6681. This ticket is specifically about Windows, and I have two more patches planned.
@djtj85 I take that as a "yes". Can you describe the symptoms?
I'm seeing this behavior on my Server 2019 box (Ryzen 1700X, 32 GB RAM). Historically, I've downloaded torrents with qBittorrent directly to my NAS via SMB over a 10 Gbps network. I'm on a 1 Gbps fiber WAN link, but I limit my download speed to 60000 KiB/s to limit the issues below; sometimes they still happen.

Pre-qBittorrent 4.4.0, I've observed qBittorrent downloading data from the WAN while seeing little or no network traffic over the NIC dedicated to SMB traffic (while the number of queued I/O jobs climbs higher and higher); eventually qBit flushes pieces to the NAS in bursts. Pre-4.4.0, RAM usage by qBittorrent wouldn't balloon to use all available RAM, and I've been able to manage this by limiting the download speed to about 60 MB/s. Here's what I see with 4.3.9 when downloading a large torrent.

With qBit 4.4.0, the above still occurs, and in addition I see SIGNIFICANTLY more received traffic (reads from the NAS disk) over the 10 Gbit SMB NIC while downloading, which doesn't make sense to me. Why is qBit reading more data from disk than it's writing? Further, qBit 4.4.0 uses all available RAM and, as seen before, writes very little to disk on the NAS while being heavy on reads. And, now that I go to create a screenshot, I can't get qBit 4.4.0 to not crash on startup...
We are still waiting for an improvement! Do you understand the problem yet?
Please give this patch a try: #6703
So, you can open a Chrome tab and start other apps, but the system is "uncertain"?
I see RAM usage climbing up to 99%, but with a stable download speed. When 99% is reached, the system starts to lag and even typing becomes difficult. Then, after a few seconds, I think Windows flushes the writes to disk and RAM usage goes down to 30-35%. During the flushing window the download speed plummets to a few MiB/s. After the flush, RAM usage starts climbing again with stable speeds. With the latest commit a45ead2, RAM usage climbs to 99% and stays there, but the system does not become unresponsive like before.
I think my download speed was way higher than libtorrent could validate the pieces and write them to disk. Even when my download was 100% complete, the torrent was showing as stalled in qBittorrent, and I could see there were still 1000+ pieces left to be validated. Once they were validated, the torrent was complete. So I think this method doesn't serve high-speed users like me well!
I tested again, this time on an SSD. It was able to sustain the speed even at 99% RAM usage. So I think this issue only affects high-speed downloaders using spinning disks.
Please provide the following information
libtorrent version (or branch): RC_2_0 f4d4528
platform/architecture: Windows 7 x64 sp1
compiler and compiler version: msvc2017
please describe what symptom you see, what you would expect to see instead and
how to reproduce it.
To better observe this problem you need a torrent with one big file (e.g. 10-16 GB) and a fairly fast connection (e.g. 100 Mbps).
My system has 16 GB RAM. It doesn't matter whether I enable OS cache or not; the downloaded data seems to reside in RAM for far too long.
While the file downloads I observe that both the Working Set of qBittorrent and the system RAM usage constantly go up. I assume this is due to the OS caching. However, it doesn't seem to flush to disk at regular intervals. Minutes have passed, GBs of data have been downloaded, but the flushing hasn't happened.

Let's assume I have a file manager window open (explorer.exe) and I navigate to the file. No matter how many times I open the file properties, its size on disk doesn't change.

There are two ways I have coerced it to flush to disk: Open Containing Folder, or double-clicking the file to launch the associated media player. These actions basically call a shell API to do the work, but somehow they also make Windows finally flush to disk.

From the little documentation online about the Windows file cache, it seems that every second it should commit 1/8 of the cached data to disk. But this doesn't happen with RC_2_0.
This can have serious effects on end users:
Furthermore, I also tested against the latest RC_1_2. This doesn't happen there, and it also doesn't matter whether I enable OS cache or not. I know that the file I/O subsystem has changed fundamentally between RC_1_2 and RC_2_0, but I mention it in case it matters. I have also set cache_expiry to 60 seconds and the cache size to 65 MiB; AFAIK these options don't exist in RC_2_0.
PS: To demonstrate the importance of the problem: I observed this while I had something downloading in the background and was doing "office work" (browsing, opening PDFs, writing in Word, etc.), which is simple in terms of disk and RAM demand. Yet suddenly the system was freezing up randomly. I opened Task Manager and saw that my 16 GB of RAM had almost filled up and disk activity was high. It took at least 20 minutes for things to become usable again.