Libtorrent 2.x memory-mapped files and RAM usage #6667
Seems like for Windows users it can also cause crashes: qbittorrent/qBittorrent#16048. That's because on Windows a process can't allocate more than the virtual memory available (physical RAM + max pagefile size).
there's no way to avoid

Is the confusion among users similar to this?
Pages backed by memory-mapped files are not also backed by the pagefile. The pagefile backs anonymous memory (i.e. normally allocated memory). The issue of Windows prioritizing pages backed by memory-mapped files very high, failing to flush them when it's running low on memory, is known, and there are some work-arounds for that, such as periodically flushing views of files and closing files (forcing a flush to disk).
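A minimal sketch of that work-around (not libtorrent's actual code): periodically flush a mapped view so dirty pages reach disk instead of piling up. `view`, `view_size` and `file` are placeholders for whatever mapping and handle the application holds.

```cpp
#include <windows.h>
#include <cstddef>

// Sketch only: `view` points at a MapViewOfFile() view of `view_size` bytes,
// and `file` is the HANDLE of the underlying file.
bool flush_mapped_view(void* view, std::size_t view_size, HANDLE file)
{
    // Write the view's dirty pages into the file system cache...
    if (!FlushViewOfFile(view, view_size))
        return false;
    // ...and force them out to the physical disk.
    return FlushFileBuffers(file) != 0;
}
```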
I don't believe that's true. Do you have a citation for that?
Yes. It's like
No. As I mentioned, it's not the cache.
Well, just search for it. Windows simply never had an overcommit feature. And I personally, as a programmer, have run into this fact.
Sorry, I accidentally clicked "edit" instead of "quote reply". And now I'm having a hard time finding the undo button.
The output from:
would be more helpful.
None of these are Microsoft sources, just random people on the internet making claims. One of those claims is a program that (supposedly) demonstrates over-committing on Windows.
I think this may be drifting a bit away from your point. Let me ask you this: on a computer that has 16 GB of physical RAM, would you expect it to be possible to memory map a file that's 32 GB? According to one answer on the stack overflow link, it wouldn't be considered over-committing as long as there is enough space in the page file (and presumably in any other file backing the pages, for non-anonymous ones). So, mapping a file on disk (according to that definition) wouldn't be over-committing. With that definition of over-committing, whether Windows does it or not isn't relevant, so long as it allows more virtual address space than there is physical memory.
Of course. (File name is redacted.)
Sorry. By overcommit I mean exceeding the virtual memory amount, as I said earlier: physical RAM + pagefile. Windows doesn't allow that. If this happens, Windows will expand the pagefile with empty space to ensure full commit capacity, and will raise OOM if the max pagefile size is exceeded. So if you have e.g. 16 GB of RAM and a 16 GB max pagefile, the maximum possible virtual memory amount for the whole system will be 32 GB. This can be easily tested.
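That claim is indeed easy to check programmatically; a hedged sketch using the documented GlobalMemoryStatusEx API, whose ullTotalPageFile field is the current commit limit (physical RAM plus page file):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    MEMORYSTATUSEX ms{};
    ms.dwLength = sizeof(ms);
    if (!GlobalMemoryStatusEx(&ms)) return 1;

    // ullTotalPageFile is the commit limit: physical RAM + current page file.
    std::printf("physical RAM    : %llu MiB\n", ms.ullTotalPhys / (1024 * 1024));
    std::printf("commit limit    : %llu MiB\n", ms.ullTotalPageFile / (1024 * 1024));
    std::printf("commit available: %llu MiB\n", ms.ullAvailPageFile / (1024 * 1024));
}
```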
Ok. When memory mapping a file, that file itself is backing the pages, so (presumably) they won't affect the pagefile, or be relevant for purposes of extending the pagefile. I would also expect those pages to be some of the first to be evicted (especially some of the 99.3 % of non-dirty pages, which are cheap to evict). Do you experience this not happening? Does it slow down the system as a whole?
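To illustrate the point, a hedged stand-alone sketch (file name and sizes are hypothetical): on 64-bit Windows, mapping a read-only view of a file much larger than physical RAM should succeed, because the section is backed by the file itself rather than by the pagefile:

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    // Hypothetical 32 GiB file on a machine with 16 GiB of RAM.
    HANDLE file = CreateFileW(L"huge.bin", GENERIC_READ, FILE_SHARE_READ,
        nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    // A file-backed section: the data pages are backed by huge.bin itself,
    // so (presumably) they are not charged against the commit limit.
    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) return 1;

    // Map the entire file into the (64-bit) virtual address space.
    void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    std::printf("mapped view at %p\n", view);

    UnmapViewOfFile(view);
    CloseHandle(mapping);
    CloseHandle(file);
}
```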
I found a clearer analogy. Windows always behaves like Linux with kernel option
This was a download. It was just easier to make a fast showcase at such a large memory scale, because when downloading the memory consumption seems to grow indefinitely. When seeding, memory consumption depends on peer activity. Only certain file parts are loaded, as shown in this output (file names omitted):
When seeding, RSS is freed when files are not in use. But qBt memory consumption is still very large. With intensive seeding tasks it easily grows to gigabytes.
I performed a better test, downloading a very large file that is larger than my RAM.
RSS caps at the amount of RAM available. No system slowdown, OOM events or such. I don't have swap though.
My impression is that Linux is one of the most sophisticated kernels when it comes to managing the page cache. Windows has more problems, where the system won't flush dirty pages early enough, requiring libtorrent to force flushing. But that leads to waves of disk thread lock-ups while flushing. There's some work to do on Windows still to get this to work well (unless persistent memory and the unification of RAM and SSD happens before then :) )
Yeah. Linux at least can be tweaked in all aspects and debugged. I did some more research into per-process stats.
Mapped files are obviously represented as
This is an issue on Linux, as libtorrent's memory-mapped files implementation affects file eviction (swapping out) behavior. Please watch the video where rechecking the torrent in qBittorrent forces the kernel to swap out 3 GB of RAM: qbittorrent-440-recheck2-2022-01-16_20.18.30.mp4
Relevant issue in qBittorrent repo: qbittorrent/qBittorrent#16146
I also found this behavior strange. It shouldn't work this way, I think. But diving into the source, I haven't found anything suspicious. Line 580 in 55111d2
And the flags are normal. Line 326 in 55111d2
I also tried to play around with the flags, but nothing changes. But I'm not an expert in C++ and Linux programming, so I definitely could be missing something.
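For anyone who wants to repeat that experiment outside of libtorrent, here is a hedged stand-alone sketch (not the library's code) that maps a file in the same general way and applies madvise hints to play with:

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main(int argc, char** argv)
{
    if (argc < 2) { std::fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st{};
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    // Map the whole file, shared and read/write, roughly like lt 2.x does.
    void* p = mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    // Hints to experiment with: MADV_RANDOM disables read-ahead for the
    // mapping, MADV_SEQUENTIAL enables aggressive read-ahead, and
    // MADV_DONTNEED tells the kernel the range will not be needed soon.
    madvise(p, st.st_size, MADV_RANDOM);

    munmap(p, st.st_size);
    close(fd);
}
```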
Ugh.
It seems I was not entirely correct in my previous comment. For some reason, my Linux system also swaps out anonymous memory upon reading huge files with the regular means (read/fread), and decreasing vm.swappiness does not help much. This might be a recent change, as I don't remember such behavior in the past. So take my comment with a grain of salt: everything might work fine, I need to do more tests.
@ValdikSS, try to adjust
Well, this is how the OS treats memory-mapped files. This question should be addressed to the Linux kernel devs, I suppose.
For the mmap issue, how about creating many smaller maps instead of one large map of the entire file? And make the size of the mapped-chunk pool configurable, or default to a reasonable value. It's similar to the original
I don't think it's obvious that it would make a difference. It would make it support 32-bit systems for sure, since you could stay within the 3 GB virtual address space. But whether unmapping a range of a file causes those pages to be forcefully flushed or evicted would have to be demonstrated first. If that worked, simply unmapping and remapping the file regularly might help, just like the periodic file close functionality.
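A rough illustration of the chunked-mapping idea, assuming nothing about libtorrent's internals (the chunk size is hypothetical and alignment/EOF handling is simplified):

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <cstddef>

// Hypothetical chunk size; a real implementation would make this configurable.
constexpr std::size_t chunk_size = 64 * 1024 * 1024; // 64 MiB

// Map only the chunk containing `offset` instead of the whole file.
void* map_chunk(int fd, std::size_t offset)
{
    std::size_t const start = offset - (offset % chunk_size);
    return mmap(nullptr, chunk_size, PROT_READ | PROT_WRITE, MAP_SHARED,
        fd, static_cast<off_t>(start));
}

// Unmapping frees the virtual address range, but whether the kernel then
// promptly flushes or evicts those pages is exactly the open question above.
void unmap_chunk(void* p) { munmap(p, chunk_size); }
```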
I believe the default behavior is to flush pages, or a delayed flush on sync, unless special flags were used such as MADV_DONTNEED. But you can always use msync(2) to force the flush.
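A minimal sketch of forcing that flush with msync(2); `addr` and `len` are placeholders for a page-aligned mapped range:

```cpp
#include <sys/mman.h>
#include <cstddef>

// MS_SYNC waits until the dirty pages of the range have been written back;
// MS_ASYNC only schedules the write-back and returns immediately.
bool flush_range(void* addr, std::size_t len)
{
    return msync(addr, len, MS_SYNC) == 0;
}
```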
Just my 2 cents: qBittorrent 4.4.0 started using libtorrent 2.0. Windows 10 x64. Based on this https://libtorrent.org/upgrade_to_2.0-ref.html#cache-size I feel something is not quite right, since it says that libtorrent 2.0 should exclusively use the OS cache.
libtorrent 2 features memory-mapped files. It requests the OS to virtually map the file into memory and lets the OS decide when to do the actual reads, writes and releases through the CPU cache - physical memory - disk stack. The OS will report high memory usage, but most of that usage should actually just be cached file data that doesn't need to be freed at the moment. (Unless under some scenario Windows doesn't flush its cache early enough, which is what #6522 and this issue are talking about.) But I do think I observed disk usage and IO getting higher than before when I first used a libtorrent 2 build. Not sure if that's still the case on Windows now, as I've migrated to Linux.
The idea to set

Whatever the threshold value is, I think this parameter should be easily modifiable to control the behavior depending on the situation.
Btw, I remind you that
Looking closer at this code, I think the

Actually implementing
Here is a build of qBittorrent with the new patch.
Yeah, zfs user here and I had to
to avoid the huge read amplification, and even with this prefetch disabled, the read amp could still be 2x-10x with lt1.x. I feel lt is 1) reading more data than it can send, 2) discarding it, 3) reading it again and hitting the ARC cache.
The thing is, this is not LT reading the data. LT actually reads in very small chunks of 16K, which is ineffective in the completely opposite way. Read-ahead is a kernel feature, made to compensate for the poor performance of small random reads. But yes, there is no guarantee that the data will actually be used by the client. The algorithm is "dumb" and very large values could lead to overprovisioning. After #6667 (comment) I actually tweaked my Btrfs to use 128K instead, as this is the base kernel default.
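For reference, on the pread/pwrite path the kernel's per-file read-ahead behaviour can also be hinted from user space with posix_fadvise(2); whether libtorrent should issue such hints is a separate question, so treat this only as a sketch:

```cpp
#include <fcntl.h>

// POSIX_FADV_SEQUENTIAL asks for a larger read-ahead window for this
// descriptor; POSIX_FADV_RANDOM disables read-ahead for it entirely.
void hint_access_pattern(int fd, bool sequential)
{
    posix_fadvise(fd, 0, 0,
        sequential ? POSIX_FADV_SEQUENTIAL : POSIX_FADV_RANDOM);
}
```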
Ok, my change #6667 (comment) made it into the official qBittorrent 5.0.1 release. Linux users still need not-yet-released libtorrent patches for optimal performance, though. Or here is a qBittorrent 5.0.1 image bundled with the latest
Maybe you have a 4M recordsize? With lt1.2 and zfs recordsize 128k I'm seeding 3 Gbit/s while reading ~450 MB/s (prefetch enabled).
I've been interested in a fix for the throughput performance and RAM usage of libtorrent 2.0 for a long time, and when I saw the news on qBittorrent's site I eagerly jumped at the opportunity to test this potential fix. It's worth noting that I'm using Windows 11 with external 5400 RPM USB 3.0 HDDs and I'm quite satisfied with libtorrent 1.2's performance (using 1 GiB of libtorrent 1.2's own cache and downloading in sequential order), generally maximizing my internet bandwidth with consistent 60-90 MB/s downloads. I've tried libtorrent 2.0 with the "Simple pread/pwrite" option and it certainly offers better performance than before (and it doesn't crash anymore while eating my whole RAM), but it's still not as consistent as libtorrent 1.2, meaning that my speed fluctuates quite a lot with an average of not more than 40 MB/s. I've also noticed some increased read activity during download. Though I might not be able to help the development, I still wanted to share my experience.
Btw, there is another bug I found recently, which potentially harms performance: #7778. The build above does include the fix though. And now when I'm thinking about it, I probably should provide artifacts for all platforms as this bug affects everyone.
Changes are included in official 5.0.2 builds. |
Hi, I've been using the simple pread/pwrite feature with a Hetzner SMB box for a few weeks now and I've seen drastic improvements in my speeds. However, my upload speed still peaks around 900 Kbps with some peaks up to 1.2 Mbps. That speed seems to be divided down to the individual torrents, because when I decrease the number of torrents uploading, the speed can increase for torrents in demand, but it always totals to the ~1 Mbps limit. When a torrent is moved to the local machine it easily breaks above my observed limit. With this, I can say that it's not the demand for the torrent but something with running over SMB. Is this something that could be related to this issue, or would I have to look elsewhere? Thanks!
After previously using qBittorrent 5.0.2 with lt12 on Windows 11 23H2 with no issues, two days ago I decided to try the lt20 version of qBittorrent 5.0.3. I have about 1150 torrents, about 100 of them active at a time. Total size for those torrents is about 16 TB, around 8 TB on an SSD (torrents currently downloading) and another 8 TB on an SMB HDD RAID NAS (completed and seeding torrents). I have 64 GB of RAM and a 13900KF CPU.

After updating to lt20, yesterday morning I found qBittorrent almost completely stuck, its UI barely responding, and download and upload speeds close to zero. I was almost tempted to end the process, as it was taking about one minute to respond to a click in the UI. I was able to exit it gracefully in the end, but it took about 10-15 minutes for it to shut down properly. I restarted it, and it seemed to work properly again after that. Today I caught it again in a similar state. Network activity almost non-existent, and the UI quite sluggish, but not as bad as yesterday. Shutdown speed seemed normal this time.

Now I've switched back to the lt12 version, but if there is anything you want me to test I can install the lt20 version again, as unfortunately I didn't collect any stats at the time I experienced these issues. I only have this screenshot showing the network speeds going down in qBittorrent:
Did you make sure you were using the bypass method?
I didn't adjust any settings, as I wasn't aware of these lt20 issues.
lt20 has an mmap bypass using "Simple pread/pwrite"
OK, I'll give that a try.
I had a lot of inconsistent performance and couldn't find the culprit between my FUSE-based filesystem, qBittorrent lt12/lt20, and mmap vs. simple pread/pwrite a few weeks ago. Sometimes I was able to reach >800 MiB/s (~6.5 Gbps) of seeding over a single peer connection, but most of the time ~40 MiB/s.

After a good break, I tried again and found that https://github.com/ikatson/rqbit was giving consistently high performance, in upload and download (~800 MiB/s), even on my FUSE-based fs, with qBittorrent lt12 and lt20, mmap or simple pread... everything was just working well and as expected. Zero inconsistency. I found that rqbit is TCP only. My current qbt configuration was TCP+uTP. When I tested with multiple qbt instances as clients and one qbt as seeder, they sometimes used TCP, but most of the time it was uTP. When I switched the Peer connection protocol option to TCP only, bingo: consistent, very high performance all the time. A single qbt can seed more than 1500 MiB/s (>12 Gbps), using only about ~2 CPU cores of an AMD 5950X, with TCP only.

@HanabishiRecca @arvidn would you know why uTP provides such poor performance compared to TCP? I know TCP has a lot of optimizations and my system has plenty of RAM with large buffers, but the x20 difference in performance puzzles me...

Regarding the average read size, it's always consistent even at high speed: I don't know why the read size using simple pread is stuck at 16 KiB, probably because the read-ahead isn't triggered for some reason yet to be found. The write sizes are much closer, with lt12 being the best one by ~30%.
@USBhost Your suggestion worked, I've had 0 issues since then. Thanks.
uTP was always slow, afaik. I simply don't use it.
uTP is much more sensitive to small delays (if the I/O is jittery), but probably mostly it's caused by the balancing of TCP and uTP bandwidth. You could also try to disable the mixed_mode_algorithm to prefer TCP.
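For anyone driving libtorrent directly rather than through the qBittorrent UI, a small sketch of that setting via settings_pack (the names are from the documented API; the rest of the session setup is omitted):

```cpp
#include <libtorrent/session.hpp>
#include <libtorrent/session_params.hpp>
#include <libtorrent/settings_pack.hpp>

int main()
{
    lt::settings_pack sp;
    // Give TCP connections the bandwidth instead of balancing it against uTP.
    sp.set_int(lt::settings_pack::mixed_mode_algorithm,
        lt::settings_pack::prefer_tcp);

    lt::session ses(lt::session_params(sp));
    // ... add torrents as usual ...
}
```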
Unfortunately, it has nothing to do with network conditions at all. According to my testing, libtorrent's uTP implementation barely manages to reach like 20 MiB/s even at localhost (2 clients on the same machine connected via 127.0.0.1 loopback). In contrast, TCP performance is orders of magnitude higher. I managed to exceed 2 GiB/s with it (which actually overflows the transfer rate counters, as reported in #7693).
On my setup I can reach about 30-40 MiB/s per connection with uTP and 2 clients on the same machine. The source file is served from the page cache (no disk IO involved then). There is definitely something in the uTP implementation that throttles it hard compared to the TCP version. Achieving >2 GiB/s is really impressive though. Any particular parameters/tuning to reach this level? Since using your watermark recommendation (4096 KB, low 128 KB, factor 100), the performance is night and day, yet 2 GiB/s doesn't seem achievable with a single connection. Am I wrong?
No, just a DDR5 tmpfs ramdisk.
Unfortunately, I spoke too soon. While it was quite stable with under 100 active downloads with the "Simple pread/pwrite" setting selected, as soon as I went over 100 active downloads it started to behave poorly, the speed periodically dropping to 0 for minutes at a time. The only good thing is that it recovered by itself each time after a few minutes, instead of getting stuck as before. But still, the UI was very sluggish; even opening the options took a lot of time, as did the dialog that opens when adding a torrent. I switched back to libtorrent 1.2.19.0 and all the problems went away.

This is how it looked on 2.x with "Simple pread/pwrite" enabled: Notice the periodic speed drops to 0. The chart shows about 16 hours of activity. I tried enabling queuing to see if it helps, but it made things worse.

And this is how it looks now on 1.2. Even after doubling the number of torrents it has been rock stable for the last 24 hours: As you can see, no more speed drops to 0.
libtorrent version (or branch): 2.0.5 (from Arch Linux official repo)
platform/architecture: Arch Linux x86-64, kernel ver 5.16
compiler and compiler version: gcc 11.1.0
Since qBittorrent started to actively migrate to libtorrent 2.x, there have been a lot of concerns from users about extensive RAM usage. I have not tested libtorrent with other frontends, but I think the result will be similar to qBt.

The thing is, the libtorrent 2.x memory-mapped files model may be a great improvement for I/O performance, but users are confused about the RAM usage. And it seems like this is not related to a particular platform: both Windows and Linux users are affected.

libtorrent 2.x causes strange memory monitoring behavior. For me the process RAM usage is reported very high, but in fact the memory is not consumed and the overall system RAM usage reported is low. Not critical in my case, but kinda confusing. This counter does not include the OS filesystem cache, just in case. Also this is not some write sync/flush issue, because it is also present when only seeding.

I'm not an expert in this topic, but maybe there are some flags that can be tweaked for mmap to avoid this?