
Libtorrent 2.x memory-mapped files and RAM usage #6667

Open
HanabishiRecca opened this issue Jan 13, 2022 · 168 comments

@HanabishiRecca
Contributor

libtorrent version (or branch): 2.0.5 (from Arch Linux official repo)

platform/architecture: Arch Linux x86-64, kernel ver 5.16

compiler and compiler version: gcc 11.1.0

Since qBittorrent started actively migrating to libtorrent 2.x, there have been a lot of concerns from users about excessive RAM usage.
I have not tested libtorrent with other frontends, but I think the result will be similar to qBt.

The thing is, the libtorrent 2.x memory-mapped files model may be a great improvement for I/O performance, but users are confused about RAM usage. And it seems this is not tied to a particular platform: both Windows and Linux users are affected.
libtorrent 2.x causes strange memory monitoring behavior. For me the process RAM usage is reported as very high, but in fact the memory is not actually consumed, and the reported overall system RAM usage is low.

(screenshot: qb-lt2)

Not critical in my case, but kind of confusing.
This counter does not include the OS filesystem cache, just in case.
Also, this is not some write sync/flush issue, because it is also present when only seeding.

I'm not an expert in this topic, but maybe there are some flags that can be tweaked for mmap to avoid this?

@HanabishiRecca
Contributor Author

Seems like for Windows users it can also cause crashes: qbittorrent/qBittorrent#16048. On Windows a process can't allocate more than the virtual memory available (physical RAM + max pagefile size).
It can also cause lags, because if the pagefile is dynamic, the system will expand it with empty space. Windows doesn't have an overcommit feature, so it must ensure that allocated virtual memory actually exists somewhere.

@arvidn
Owner

arvidn commented Jan 13, 2022

there's no way to avoid mmap allocating virtual address space. However, the relevant metric is resident memory, which is the actual amount of physical RAM used by a process. In htop these metrics are reported as VIRT and RES respectively. I don't know what PSS is, do you? It sounds like it may measure something similar to virtual address space.

Is the confusion among users similar to this?

@arvidn
Owner

arvidn commented Jan 13, 2022

And also it can cause lags, because if pagefile is dynamic, the system will expand it with empty space.

pages backed by memory-mapped files are not also backed by the pagefile. The page file backs anonymous memory (i.e. normally allocated memory). The issue of Windows prioritizing pages backed by memory-mapped files very highly, and failing to flush them when it's running low on memory, is known, and there are some work-arounds for that, such as periodically flushing views of files and closing files (forcing a flush to disk).

Windows doesn't have overcommit feature, so it must ensure that allocated virtual memory is actually exist somewhere.

I don't believe that's true. Do you have a citation for that?

@HanabishiRecca
Contributor Author

HanabishiRecca commented Jan 13, 2022

I don't know what PSS is, do you? it sounds like it may measure something similar to virtual address space.

Yes. It's like RES but more precise. It doesn't matter anyway; the RES value is effectively identical in this case. I can take a screenshot with all possible values enabled, if you don't believe me.

Is the confusion among users similar to this?

No. As I mentioned, it's not the cache.

I don't believe that's true. Do you have a citation for that?

Well, just search for it.
https://superuser.com/questions/1194263/will-microsoft-windows-10-overcommit-memory
https://www.reddit.com/r/learnpython/comments/fqzb4h/how_to_allow_overcommit_on_windows_10/

Windows simply never had an overcommit feature, and I have personally run into this fact as a programmer.

@arvidn
Owner

arvidn commented Jan 13, 2022

sorry, I accidentally clicked "edit" instead of "quote reply". And now I'm having a hard time finding the undo button.

@HanabishiRecca
Contributor Author

More columns. VIRT is way larger.

(screenshot: qb-st)

@arvidn
Owner

arvidn commented Jan 13, 2022

I don't know what PSS is, do you? it sounds like it may measure something similar to virtual address space.

Yes. It's like RES but more precise. It doesn't matter anyway; the RES value is effectively identical in this case. I can take a screenshot with all possible values enabled, if you don't believe me.

The output from:

pmap -x <pid>

would be more helpful.

I don't believe that's true. Do you have a citation for that?

Well, just search for it.
https://superuser.com/questions/1194263/will-microsoft-windows-10-overcommit-memory
https://www.reddit.com/r/learnpython/comments/fqzb4h/how_to_allow_overcommit_on_windows_10/

None of these are Microsoft sources, just random people on the internet making claims. One of those claims is a program that (supposedly) demonstrates over-committing on Windows.

Windows simply never had an overcommit feature, and I have personally run into this fact as a programmer.

I think this may be drifting away a bit from your point. Let me ask you this. On a computer that has 16 GB of physical RAM, would you expect it to be possible to memory map a file that's 32 GB?

According to one answer on the stack overflow link, it wouldn't be considered over-committing as long as there is enough space in the page file (and presumably in any other file backing the pages, for non anonymous ones). So, mapping a file on disk (according to that definition) wouldn't be over-committing. With that definition of over-committing, whether windows does it or not isn't relevant, so long as it allows more virtual address space than there is physical memory.

@HanabishiRecca
Contributor Author

HanabishiRecca commented Jan 13, 2022

The output from:

pmap -x <pid>

would be more helpful.

Of course. (File name is redacted.)

10597:   /usr/bin/qbittorrent
Address           Kbytes     RSS   Dirty Mode  Mapping
000055763ecbd000     512       0       0 r---- qbittorrent
000055763ed3d000    3508    2316       0 r-x-- qbittorrent
000055763f0aa000    5984     368       0 r---- qbittorrent
000055763f682000     108     108     108 r---- qbittorrent
000055763f69d000      28      28      28 rw--- qbittorrent
000055763f6a4000       8       8       8 rw---   [ anon ]
000055764131e000   54236   54048   54048 rw---   [ anon ]
00007ef4cc899000 9402888 8062068   56592 rw-s- file-name-here
...

None of these are Microsoft sources, just random people on the internet making claims. One of those claims is a program that (supposedly) demonstrates over-committing on Windows.
I think this may be drifting away a bit from your point. Let me ask you this. On a computer that has 16 GB of physical RAM, would you expect it to be possible to memory map a file that's 32 GB?

According to one answer on the stack overflow link, it wouldn't be considered over-committing as long as there is enough space in the page file (and presumably in any other file backing the pages, for non anonymous ones). So, mapping a file on disk (according to that definition) wouldn't be over-committing. With that definition of over-committing, whether windows does it or not isn't relevant, so long as it allows more virtual address space than there is physical memory.

Sorry. By overcommit I mean exceeding the virtual memory amount, as I said earlier: physical RAM + pagefile. Windows doesn't allow that. If this happens, Windows will expand the pagefile with empty space to ensure full commit capacity, and will raise OOM if the max pagefile size is exceeded. So if you have e.g. 16G RAM and a 16G max pagefile size, the maximum possible virtual memory amount for the whole system will be 32G. This can be easily tested.
Linux allows overcommit; by that I mean allocating more VIRT than the RAM+swap size. Btw, in the screenshot I have 16G RAM and no swap at all. A VIRT size of 41G would not be possible on Windows in that case.

@arvidn
Owner

arvidn commented Jan 13, 2022

Ok. When memory mapping a file, that file itself is backing the pages, so (presumably) they won't affect the pagefile, or be relevant for purposes of extending the pagefile.

That pmap output seems reasonable to me, and what I would expect. Except if you're seeding, then I would expect the "file-name-here" region to be mapped read-only.

I would also expect those pages to be some of the first to be evicted (especially some of the 99.3 % of the non-dirty pages, that are cheap to evict). Do you experience this not happening? Does it slow down the system as a whole?

@HanabishiRecca
Contributor Author

HanabishiRecca commented Jan 13, 2022

I found a more clear analogy. Windows always behaves like Linux with kernel option vm.overcommit_memory = 2. That's it.
Sorry again for previous word spam.
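For reference, the Linux knob from that analogy can be inspected via procfs; by the analogy above, Windows always behaves like the strict mode 2:

```shell
# Linux overcommit policy (vm.overcommit_memory):
#   0 - heuristic overcommit (default)
#   1 - always overcommit, never refuse an allocation
#   2 - strict: commit limit = swap + RAM * overcommit_ratio%
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio
```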

That pmap output seems reasonable to me, and what I would expect. Except if you're seeding, then I would expect the "file-name-here" region to be mapped read-only.

I would also expect those pages to be some of the first to be evicted (especially some of the 99.3 % of the non-dirty pages, that are cheap to evict). Do you experience this not happening? Does it slow down the system as a whole?

This was a download. It was just easier to make a fast showcase at such a large memory scale, because when downloading the memory consumption seems to grow indefinitely.

When seeding, memory consumption depends on peer activity. Only certain file parts are loaded, as shown in this output (file names omitted):

Address           Kbytes     RSS   Dirty Mode  Mapping
...
00007f0f3ca19000  949204   96008       0 r--s- 
00007f0f7690e000  220484    5440       0 r--s- 
00007f0f8405f000 1046808   15980       0 r--s- 
00007f1003853000 1048576    1472       0 r--s- 
00007f1043ea5000  995120   67236       0 r--s- 
00007f1080a71000 1048576    2604       0 r--s- 
00007f10c0a71000 1048576   53036       0 r--s- 
00007f1100a71000 1048576   27404       0 r--s- 
00007f1140a71000  888884   53708       0 r--s- 
00007f1176e7e000 1048576   59892       0 r--s- 
00007f11b6e7e000 1048576   41908       0 r--s- 
00007f11f6e7e000 1178632  153136       0 r--s- 
00007f123ed80000  797004    4668       0 r--s- 
00007f126f7d3000 1173584  142572       0 r--s- 
00007f12f6b50000    6748    4800       0 r--s- 
00007f12f71e7000 1045132  101892       0 r--s- 
00007f1336e8a000 1176632  140892       0 r--s- 
00007f137eb98000 1048576   55532       0 r--s- 
00007f13beb98000 1146904    5504       0 r--s- 
00007f1404b9e000 1178168  191456       0 r--s- 
00007f144ca2c000  886596   47408       0 r--s- 
00007f1482bfd000  836900   34880       0 r--s- 
00007f14b5d46000 1163336  149824       0 r--s- 
00007f14fcd58000 1165984  224952       0 r--s- 
...

When seeding, RSS is freed when the files are not in use. But qBt memory consumption is still very large; with intensive seeding tasks it easily grows to gigabytes.
With libtorrent 1.x, qBt consumes around 100M overall (with the in-app disk cache disabled), regardless of anything.

@HanabishiRecca
Contributor Author

HanabishiRecca commented Jan 13, 2022

Do you experience this not happening? Does it slow down the system as a whole?

I performed a better test, downloading a very large file that is larger than my RAM.

free output:

               total        used        free      shared     buffers       cache   available
Mem:            15Gi       1.7Gi       183Mi       104Mi       0.0Ki        13Gi        13Gi
Swap:             0B          0B          0B

pmap output:

Address           Kbytes     RSS   Dirty Mode  Mapping
00007f085b17e000 34933680 13305724   37664 rw-s- filename

RSS caps at the amount of RAM available. No system slowdown, OOM events or such. I don't have swap though.
So, good news: it works just like the regular disk cache (it belongs to the cache column in free output), at least on Linux. The Dirty amount is small, as expected. The only scary thing is that it shows up as resident memory in per-process stats.

@arvidn
Owner

arvidn commented Jan 13, 2022

My impression is that Linux is one of the most sophisticated kernels when it comes to managing the page cache. Windows is having more problems where the system won't flush dirty pages early enough, requiring libtorrent to force flushing. But that leads to waves of disk thread lock-ups while flushing. There's some work to do on windows still, to get this to work well (unless persistent memory and the unification of RAM and SSD happens before then :) )

@HanabishiRecca
Contributor Author

Yeah. Linux at least can be tweaked in all aspects and debugged.

I did some more research into the per-process stats.

RssAnon:	  123820 kB
RssFile:	 5087496 kB
RssShmem:	    9276 kB

Mapped files are obviously represented as RssFile, and monitoring tools like htop seem to simply sum up all 3 values.
I can't say whether this is just a monitoring-software issue (should RssFile even be included?) or whether the situation is more complicated.
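A quick way to see that split on a live process (substitute the qBittorrent PID; RssAnon + RssFile + RssShmem sum to roughly VmRSS):

```shell
# Break a process's resident memory into its components. RssFile is
# page-cache-backed memory (such as mmapped torrent data) that the
# kernel can reclaim at any time; RssAnon is "real" allocated memory.
PID=$$   # substitute the qBittorrent PID here
grep -E '^(VmRSS|RssAnon|RssFile|RssShmem)' "/proc/$PID/status"
```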

@ValdikSS

This is an issue on Linux, as libtorrent's memory-mapped files implementation affects file eviction (swapping-out) functionality.

Please watch the video where rechecking the torrent in qBittorrent forces the kernel to swap out 3 GB of RAM.

qbittorrent-440-recheck2-2022-01-16_20.18.30.mp4

Regular mmap()'ed files never trigger this behavior: they are never counted as RSS and do not force the memory of other processes to be swapped out.

Relevant issue in qBittorrent repo: qbittorrent/qBittorrent#16146

@HanabishiRecca
Contributor Author

HanabishiRecca commented Jan 16, 2022

I also found this behavior strange. It shouldn't work this way, I think.

But diving into the source, I did not find anything suspicious. mmap seems to be used in the usual way:

, m_mapping(m_size > 0 ? mmap(nullptr, static_cast<std::size_t>(m_size)

And the flags are normal:
MAP_FILE | MAP_SHARED

I also tried to play around with the flags, but nothing changed.

But I'm not an expert in C++ and Linux programming, so I could definitely be missing something.

@ValdikSS

Ugh.

  1. mmap()'ed files are indeed shown in RSS on Linux as you read them.
  2. When physical memory becomes low, the kernel will unmap sections of the file from physical memory based on its LRU (least recently used) algorithm. But the LRU is also global: it may also force other processes to swap pages to disk, and reduce the disk cache. This can have a severely negative effect on the performance of other processes and the system as a whole.[1] — this is what I see on my system with vm.swappiness = 80
  3. It is possible to hint to the kernel that you don't need some parts of the memory mapping with madvise MADV_DONTNEED without unmapping the file, but you'll need to implement your own LRU algorithm for this to be efficient in libtorrent's case. You can't 'set' the mapped memory to reclaim itself automatically; you'll need to manually call madvise on the regions you use less than others.
  4. Mmapped files make memory consumption monitoring problematic, at least on Linux, which was also spotted by the golang developers and the developers of the jemalloc library used in Firefox:

"I looked at the task manager and Firefox sux!"

  5. Unless torrenting is the machine's highest-priority task, using mmap (at least on Linux) for torrent data only decreases overall performance, as it affects other workloads by swapping more than it could/should and "spoiling" the LRU with lower-priority data.

@arvidn
Owner

arvidn commented Jan 16, 2022

MADV_DONTNEED will destroy the contents of dirty pages, so I don't think that's an option, but there's MADV_COLD in newer versions of Linux.

@ValdikSS

It seems I was not entirely correct in my previous comment. For some reason, my Linux system also swaps out anonymous memory upon reading huge files with the regular means (read/fread), and decreasing vm.swappiness does not help much. This might be a recent change, as I don't remember such behavior in the past. So take my comment with a grain of salt: everything might work fine; I need to do more tests.

@HanabishiRecca
Contributor Author

@ValdikSS, try to adjust vm.vfs_cache_pressure.

@karabaja4

karabaja4 commented Jan 26, 2022

I am very confused about what htop reports as memory usage for qbittorrent with libtorrent 2.x.

(screenshot: 2022-01-26_18-53)

Can someone explain these values?

@HanabishiRecca
Contributor Author

Can someone explain these values?

Well, this is how the OS treats memory-mapped files. This question should be addressed to Linux kernel devs, I suppose.

@mayli
Contributor

mayli commented Feb 3, 2022

For the mmap issue, how about creating many smaller maps instead of one large map of the entire file, and making the size of the mapped-chunk pool configurable, or defaulting it to a reasonable value? It's similar to the original cache_size, but it would reduce some of the confusion from users who rely on tools that mix up mmap-ed pages and actual memory usage.

@arvidn
Owner

arvidn commented Feb 9, 2022

For the mmap issue, how about creating many smaller maps instead of the large map of the entire file

I don't think it's obvious that it would make a difference. It would make it support 32 bit systems for sure, since you could stay within the 3 GB virtual address space. But whether unmapping a range of a file causes those pages to be forcefully flushed or evicted, would have to be demonstrated first.

If that would work, simply unmapping and remapping the file regularly might help, just like the periodic file close functionality.

@mayli
Contributor

mayli commented Feb 9, 2022

But whether unmapping a range of a file causes those pages to be forcefully flushed or evicted, would have to be demonstrated first.

I believe the default behavior is to flush the pages, or delay the flush until a sync, unless special flags such as MADV_DONTNEED were used.

But you can always use msync(2) to force the flush.

@SL-Gundam

SL-Gundam commented Feb 15, 2022

Just my 2 cents.
I'm using qBittorrent, which uses libtorrent, on Windows.
In earlier versions of qBittorrent I set the cache size to 0 (disabled) and turned the OS cache on.
This worked perfectly: disk usage was the lowest it could ever be (5-20 %) with very little memory usage (160-200 MiB) for the actual qBittorrent process. The modified cache stayed relatively low as well.

qBittorrent 4.4.0 started using libtorrent 2.0.
With qBittorrent 4.4.0 the cache size setting has disappeared.
When I added a couple of torrents, disk usage was around 40-50% and memory shot up to 12-14 GiB for the qBittorrent process, for torrents similar to those described above.

Windows 10 x64
Hardware RAID 5 using 6 HDD drives
32 GiB of ram

Based on https://libtorrent.org/upgrade_to_2.0-ref.html#cache-size I feel something is not quite right, since it says that libtorrent 2.0 should exclusively use the OS cache.
If I understand correctly, libtorrent 2.0 wanted to make my previous setup standard and unchangeable. But it does not behave the same. Or am I misunderstanding something?

@escape0707

escape0707 commented Feb 15, 2022

@SL-Gundam

libtorrent 2 features memory-mapped files. It requests that the OS map the file into virtual memory, and lets the OS decide when to do the actual reads, writes and releases through the CPU-cache / physical-memory / disk stack.

The OS will report high memory usage, but most of that usage should actually just be cached file data that doesn't need to be freed at the moment. (Unless, in some scenarios, Windows doesn't flush its cache early enough, which is what #6522 and this issue are talking about.)

But I do think I observed disk usage and I/O getting higher than before when I first used a libtorrent 2 build. I'm not sure whether that's still the case on Windows now, as I've migrated to Linux.

@ghz-max

ghz-max commented Oct 20, 2024

The idea to set POSIX_FADV_RANDOM on small piece sizes makes a lot of sense to avoid wasting precious I/O; however, the threshold should be evaluated... I would expect the sweet spot to be around 64KiB, but it is quite hard to assess properly.

Whatever the threshold value, I think this parameter should be easily modifiable to control the behavior in different situations.

@HanabishiRecca
Contributor Author

HanabishiRecca commented Oct 20, 2024

Btw, I remind you that POSIX_FADV_RANDOM does not affect memory-mapped files at all. They simply ignore the flag and read ahead at full throttle anyway.

@arvidn
Owner

arvidn commented Oct 20, 2024

looking closer at this code, I think the FADV_RANDOM is a bit problematic. The cross-platform flag libtorrent uses is called random_access, but otherwise it's used to determine whether to hint for sequential access.
Since this is an internal flag, I think renaming it (and inverting its meaning) to sequential_access would make a lot more sense. Then the FADV_RANDOM is naturally replaced by FADV_SEQUENTIAL (used when checking files).

Actually implementing FADV_RANDOM for small pieces would require a new flag and more new code.

@arvidn
Owner

arvidn commented Oct 20, 2024

#7758

@HanabishiRecca
Contributor Author

Here is a build of qBittorrent with the new patch.
https://github.com/HanabishiRecca/qBittorrent/actions/runs/11430669007#artifacts

@LazyPajen

I don't know if it helps or not:
Win 11 24H2 (26100)
qBittorrent 5.0.0
Qt: 6.7.3
Libtorrent: 1.2.19.0
Boost: 1.86.0
OpenSSL: 3.3.2
zlib: 1.3.1
Images are from
Process Lasso Pro v15.0.2.18 x64

I've got huge memory use with libtorrent 2.0.11 in MMP mode:
(screenshot)
After some reading I tested with POSIX mode:
(screenshot)
and the memory load went down to almost nothing compared with MMP.
Side note: disk speed and seeding speed went down by a subjectively noticeable difference.
And now I am using 1.2.19.0, with these results:
(screenshot)
None of the other parameters in qB have changed.
For POSIX and LT 1.2* the memory use is normal according to Windows Task Manager.
In MMP mode it's a huge difference.

@mayli
Contributor

mayli commented Oct 23, 2024

I'm not sure how with a read-ahead of 128, you can have 1M request size?

Turns out it is FS-dependent. Block device read-ahead is one thing, but filesystems are free to have their own values.

E.g. on Btrfs I have

$ </sys/class/bdi/btrfs-1/read_ahead_kb
4096

So up to 4M, and it is the default value.

Yeah, zfs user here and I had to

echo 1 >/sys/module/zfs/parameters/zfs_prefetch_disable

to avoid the huge read amplification, and even with prefetch disabled, the read amplification could still be 2x-10x with lt1.x.
qb is constantly reading ~200MB/s from the fs, but transmitting at less than 50MB/s.

(screenshots)

I feel lt is 1) reading more data than it can send, 2) discarding it, 3) reading it again and hitting the ARC cache.

@HanabishiRecca
Contributor Author

HanabishiRecca commented Oct 23, 2024

I feel lt is 1) reading more data than it can send, 2) discard them

The thing is, this is not LT reading the data. LT actually reads in very small chunks of 16K, which is inefficient in the completely opposite way.

Read-ahead is a kernel feature, made to compensate for the performance cost of small random reads. But yes, there is no guarantee that the data will actually be used by the client. The algorithm is "dumb", and very large values can lead to overprovisioning.

After #6667 (comment) I actually tweaked my Btrfs to use 128K instead, as this is the base kernel default.
It looks optimal: still much less strain on the disk (128K means 8 times fewer IOPS than 16K), but no noticeable overprovisioning happening.
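For reference, the knobs discussed in this exchange look roughly like this (the bdi name "btrfs-1" and device "/dev/sda" are examples; adjust to your system, and writing these needs root):

```shell
# Per-filesystem readahead lives under /sys/class/bdi (values in KiB).
cat /sys/class/bdi/btrfs-1/read_ahead_kb         # e.g. 4096 on Btrfs
echo 128 > /sys/class/bdi/btrfs-1/read_ahead_kb  # base kernel default

# Block-device readahead is separate, in 512-byte sectors:
blockdev --getra /dev/sda
```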

@HanabishiRecca
Contributor Author

HanabishiRecca commented Oct 29, 2024

Ok, my change #6667 (comment) made it into official qBittorrent 5.0.1 release.


Linux users still need not-yet-released libtorrent patches for optimal performance though.
But now rebuilding only libtorrent is enough.

Or here is qBittorrent 5.0.1 image bundled with the latest RC_2_0 libtorrent build.
https://github.com/HanabishiRecca/qBittorrent/actions/runs/11576625075#artifacts

@Aleksman4o

Aleksman4o commented Oct 30, 2024

the read amp could still be 2x-10x with lt1.x.
qb is constantly reading ~200MB/s from fs, but transmitting at less than 50MB/s.

Maybe you have a 4M recordsize? With lt1.2 and a ZFS recordsize of 128k I'm seeding at 3Gbit/s while reading ~450MB/s (prefetch enabled).
On ZFS you can't read a smaller block than the recordsize, so you ask the disk for 16k but get the full recordsize (4M).
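If that is the cause, the relevant ZFS knobs look like this ("tank/torrents" is an example dataset name; recordsize only affects newly written files):

```shell
# ZFS always reads whole records, so 16 KiB requests against a 4M
# recordsize dataset are heavily amplified.
zfs get recordsize tank/torrents
zfs set recordsize=128K tank/torrents   # applies to newly written files only

# Module-level prefetch switch mentioned above:
cat /sys/module/zfs/parameters/zfs_prefetch_disable
```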

@ValentinDragomir

I've been interested in a fix to the throughput performance and RAM usage of libtorrent 2.0 for a long time, and when I saw the news on qBittorrent's site I've eagerly jumped at the opportunity to test this potential fix.

It's good to note that I'm using Windows 11 with external 5400 RPM USB 3.0 HDDs and I'm quite satisfied with libtorrent's 1.2 performance (using 1 GiB of libtorrent's 1.2 own cache and downloading in sequential order), generally maximizing my internet bandwidth with consistent 60-90 MB/s downloads.

I've tried libtorrent 2.0 with the "Simple pread/pwrite" option and it certainly offers better performance than before (and it doesn't crash anymore while eating my whole RAM), but it's still not as consistent as libtorrent 1.2, meaning that my speed fluctuates quite a lot with an average of not more than 40 MB/s. I've also noticed some increased read activity during download.

Though I might not be able to help the development, still wanted to share my experience.

@HanabishiRecca
Contributor Author

HanabishiRecca commented Oct 30, 2024

Btw, there is another bug I found recently, which potentially harms performance: #7778.
It is already fixed, but it has been present in the 2.0 branch since the very beginning, and regular users need to wait until it gets into a release.

The build above does include the fix though. And now that I'm thinking about it, I should probably provide artifacts for all platforms, as this bug affects everyone.
It could be especially bad on Windows if you have an antivirus running, because antiviruses tend to scan files every time they are opened and/or closed.

qBittorrent 5.0.1 + libtorrent RC_2_0 (7f69124)
Linux | Windows | macOS

Changes are included in official 5.0.2 builds.

@markgyoni

Hi, I've been using the simple pread/pwrite feature with a Hetzner SMB box for a few weeks now and I've seen drastic improvements in my speeds; however, my upload speed still peaks around 900Kbps, with some peaks up to 1.2Mbps. That speed seems to be divided among the individual torrents, because when I decrease the number of torrents uploading, the speed can increase for torrents in demand, but it always totals to the ~1Mbps limit.

When a torrent is moved to the local machine it easily breaks above the limit I've seen, so I can say that it's not the demand for the torrent but something about running over SMB.

Is this something that could be related to this issue or I would have to look elsewhere?

Thanks!

@vnicolici

After previously using qBittorrent 5.0.2 with lt12 on Windows 11 23H2 with no issues, two days ago I decided to try the lt20 version of qBittorrent 5.0.3. I have about 1150 torrents, about 100 of them active at a time. The total size of those torrents is about 16 TB: around 8TB on an SSD (torrents currently downloading) and another 8TB on an SMB HDD RAID NAS (completed and seeding torrents). I have 64GB of RAM and a 13900KF CPU.

After updating to lt20, yesterday morning I found qBittorrent almost completely stuck, its UI barely responding, and download and upload speeds close to zero. I was almost tempted to end the process, as it was taking about one minute to respond to a click in the UI. I was able to exit it gracefully in the end, but it took about 10-15 minutes for it to shut down properly. I restarted it, and it seemed to work properly again after that.

Today I caught it again in a similar state. Network activity almost non-existent, and the UI quite sluggish, but not as bad as yesterday. Shutdown speed seemed normal this time.

Now I have switched back to the lt12 version, but if there is anything you want me to test I can install the lt20 version again, as unfortunately I didn't collect any stats when I experienced these issues. I only have this screenshot showing the network speeds going down in qBittorrent:

(screenshot)

@USBhost

USBhost commented Dec 19, 2024

Did you make sure you were using the bypass method?

@vnicolici

I didn't adjust any settings, as I wasn't aware of these lt20 issues.

@USBhost

USBhost commented Dec 19, 2024

lt20 has an mmap bypass using "Simple pread/pwrite".
Please retry with that option enabled.

@vnicolici

OK, I'll give that a try.

@ghz-max

ghz-max commented Dec 29, 2024

I had a lot of inconsistent performance and couldn't find the culprit between my FUSE-based filesystem, qBittorrent lt12/lt20, and mmap vs. simple pread/pwrite, a few weeks ago.

Sometimes I was able to reach >800 MiB/s (~6.5Gbps) of seeding over a single peer connection, but most of the time ~40MiB/s. After a good break, I tried again and found that https://github.com/ikatson/rqbit was giving consistently high performance in upload and download (~800 MiB/s), even on my FUSE-based fs, with qBittorrent lt12 and lt20, mmap or simple pread... everything was just working well and as expected. Zero inconsistency.

I found that rqbit is TCP-only. My qbt configuration was TCP+uTP. When I tested with multiple qbt instances as clients and one qbt as the seeder, they sometimes used TCP, but most of the time it was uTP.

When I switched the peer connection protocol option to TCP only, bingo: consistent, very high performance all the time. A single qbt can seed at more than 1500 MiB/s (>12Gbps), using only about ~2 cores of an AMD 5950X, with TCP only.

@HanabishiRecca @arvidn would you know why uTP provides such poor performance compared to TCP? I know TCP has a lot of optimizations and my system has plenty of RAM with large buffers, but the 20x difference in performance puzzles me...

Regarding the average read size, it's always consistent even at high speed:
lt12: 128KiB (limited by my current FUSE environment, via the kernel and libfuse3)
lt20 (mmap): ranging from 40 to 60 KiB
lt20 (simple pread): 16 KiB

I don't know why the read size using simple pread is stuck at 16 KiB, probably because read-ahead isn't being triggered for some reason yet to be found.

The write sizes are much closer, with lt12 being the best by ~30%.

@vnicolici

@USBhost Your suggestion worked, I had 0 issues since then. Thanks.

@HanabishiRecca
Contributor Author

would you know why uTP provides such poor performance compared to TCP?

uTP has always been slow, afaik. I simply don't use it.

@arvidn
Owner

arvidn commented Dec 30, 2024

uTP is much more sensitive to small delays (if the I/O is jittery) but probably mostly it's caused by the balancing of TCP and uTP bandwidth. You could also try to disable the mixed_mode_algorithm to prefer TCP.

@HanabishiRecca
Contributor Author

HanabishiRecca commented Dec 30, 2024

but probably mostly it's caused by the balancing of TCP and uTP bandwidth. You could also try to disable the mixed_mode_algorithm to prefer TCP.

Unfortunately, it has nothing to do with network conditions at all. According to my testing, libtorrent's uTP implementation barely manages to reach like 20 MiB/s even at localhost (2 clients on the same machine connected via 127.0.0.1 loopback).

In contrast, TCP performance is orders of magnitude higher. I managed to exceed 2 GiB/s with it (which actually overflows the transfer rate counters, as reported in #7693).

@ghz-max

ghz-max commented Dec 30, 2024

On my setup I can reach about 30-40MiB/s per connection with uTP and 2 clients on the same machine, with the source file served from the page cache (no disk I/O involved). There is definitely something in the uTP implementation that throttles it hard compared to the TCP version.

Achieving >2GiB/s is really impressive though. Any particular parameters/tuning to reach that level? Since using your watermark recommendation (4096KB, low 128KB, factor 100), the performance is night and day, yet 2GiB/s doesn't seem achievable with a single connection. Am I wrong?

@HanabishiRecca
Contributor Author

Any particular parameters/tuning to reach this level?

No, just a DDR5 tmpfs ramdisk.

@vnicolici

vnicolici commented Jan 2, 2025

@USBhost Your suggestion worked, I had 0 issues since then. Thanks.

Unfortunately, I spoke too soon. While it was quite stable with under 100 active downloads with the "Simple pread/pwrite" setting selected, as soon as I went over 100 active downloads it started to behave poorly, with the speed periodically dropping to 0 for minutes at a time. The only good thing is that it recovered by itself each time after a few minutes, instead of getting stuck as before. But still, the UI was very sluggish; even opening the options took a lot of time, as did the dialog that opens when adding a torrent.

I switched back to libtorrent 1.2.19.0 and all the problems went away. This is how it looked on 2.x with "Simple pread/pwrite" enabled:

(screenshots)

Notice the periodic speed drops to 0. The chart shows about 16 hours of activity. I tried enabling queueing to see if it would help, but it made things worse.

And this is how it looks now on 1.2. Even after doubling the number of torrents, it has been rock solid for the last 24 hours:

(screenshots)

As you see, no more speed drops to 0.
