Gossiping torrent popularity to scale to millions of torrents #4256
First results of a DHT lookup experiment to establish how much traffic it takes to do torrent lookups. In this small experiment, I perform multiple lookups of random torrents from an academic dataset using the torrent health endpoint of Tribler (nothing else is running). After every lookup, I query how many bytes the DHT in libtorrent processed (both incoming and outgoing) using the …

As we see, the average number of bytes per DHT lookup increases. In addition, we require around 8.5KB of non-DHT traffic per lookup (as reported by the …

In total, the session processed around 80MB of incoming DHT traffic and 120MB of outgoing DHT traffic. Full libtorrent DHT statistics after 3638 lookups:
Net stats:
CPU usage during these lookups varies between 5% and 10% and is relatively stable.
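To tell whether the per-lookup cost genuinely grows, it helps to compare the cumulative average (total bytes divided by lookups so far) with the marginal cost between two samples. A minimal sketch, assuming the cumulative DHT byte counters are recorded after each lookup into a hypothetical `Sample` record; how the counters are read from libtorrent is not shown and is not implied here.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical snapshot of cumulative DHT counters taken after a lookup."""
    lookups: int    # lookups completed when the counters were read
    bytes_in: int   # cumulative incoming DHT bytes
    bytes_out: int  # cumulative outgoing DHT bytes


def average_per_lookup(samples):
    """Cumulative average (in, out) bytes per lookup at every sample point."""
    return [
        (s.lookups, s.bytes_in / s.lookups, s.bytes_out / s.lookups)
        for s in samples
        if s.lookups > 0
    ]


def marginal_per_lookup(samples):
    """Bytes spent between consecutive samples, divided by the lookups in between."""
    result = []
    for prev, cur in zip(samples, samples[1:]):
        n = cur.lookups - prev.lookups
        if n > 0:
            result.append((
                cur.lookups,
                (cur.bytes_in - prev.bytes_in) / n,
                (cur.bytes_out - prev.bytes_out) / n,
            ))
    return result
```

Comparing the two series separates a one-off start-up cost (marginal cost flat while the average converges) from a per-lookup cost that really grows (marginal cost rising).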
Tomorrow I will plot all possible statistics regarding DHT traffic and try to figure out why the traffic increases. A possible solution is to reset the DHT session after a specific number of lookups.
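The reset idea can be sketched as a small wrapper that counts lookups and recreates the session once a threshold is reached. This assumes hypothetical caller-supplied `make_session` and `do_lookup` callables (no libtorrent API is assumed), and `reset_every` is an arbitrary illustrative threshold, not a measured value.

```python
class ResettingDHTLookups:
    """Hypothetical wrapper: recreate the DHT session every `reset_every` lookups."""

    def __init__(self, make_session, do_lookup, reset_every=1000):
        self.make_session = make_session  # returns a fresh session object
        self.do_lookup = do_lookup        # do_lookup(session, infohash) -> result
        self.reset_every = reset_every
        self.session = make_session()
        self.lookups_since_reset = 0
        self.resets = 0

    def lookup(self, infohash):
        if self.lookups_since_reset >= self.reset_every:
            # Discard the old session (and whatever state it accumulated).
            self.session = self.make_session()
            self.lookups_since_reset = 0
            self.resets += 1
        self.lookups_since_reset += 1
        return self.do_lookup(self.session, infohash)
```

The trade-off is that a fresh session has to rebuild its routing table, so the bootstrap traffic is paid again after every reset.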
Findings so far:
Update: I managed to get the peers from the DHT. Now we should find a way to classify them as seeders or leechers.
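For reference, the generic way to classify a single peer is to connect to it and inspect its `bitfield` message from the BitTorrent peer wire protocol: a peer that advertises every piece is a seeder. This is only an illustration of the wire format, not how Tribler does it, and it requires joining the swarm, which is exactly the cost a DHT-only approach avoids. The function name `is_seeder` is hypothetical.

```python
def is_seeder(bitfield: bytes, num_pieces: int) -> bool:
    """Return True if a BitTorrent bitfield advertises every piece.

    Piece 0 is the high bit of byte 0; spare bits after the last piece are
    padding and are ignored here.
    """
    expected_len = (num_pieces + 7) // 8
    if len(bitfield) != expected_len:
        raise ValueError("bitfield length does not match piece count")
    for index in range(num_pieces):
        byte, bit = divmod(index, 8)
        if not bitfield[byte] & (0x80 >> bit):
            return False  # missing at least one piece: a leecher
    return True
```

For an 11-piece torrent, `b"\xff\xe0"` marks all pieces as present, while `b"\xff\xc0"` is missing piece 10.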
@qstokkink thanks for the links!
I performed 1300 BEP33 lookups with the modified libtorrent code (see #4321 for details). Some results:

Messages sent and received
Here we plot the statistics on messages sent and received in the DHT, as reported by the libtorrent session. Note that there are many outgoing …

Bandwidth usage
Note how BEP33 does not require any libtorrent session bandwidth outside of the DHT traffic. Since we are not actually joining the swarm, libtorrent does not have to maintain connections to other peers, only to DHT nodes. Bandwidth usage increases roughly linearly, and there seems to be no additional bandwidth overhead after each lookup. After 1300 lookups, each lookup requires 13266 bytes (13.0KB) of incoming traffic and 10731 bytes (10.5KB) of outgoing traffic, for a total of 23997 bytes (23.4KB) per lookup. Most of this traffic is probably due to the bloom filters being sent around: these bloom filters are 256 bytes each, and there are two of them in each incoming …

DHT statistics
We now plot several statistics of the DHT, such as the number of nodes in our routing table and the number of torrents tracked. The number of peers tracked remains around 100. The number of nodes in our routing table quickly increases after startup to around 400. We do not track mutable/immutable data items.

Dead torrents
Of the 1300 lookups, 823 torrents did not have any seeders and are considered dead (61.9%).

Conclusions
I think we should implement BEP33 lookups in Tribler. The tradeoffs are as follows:
Advantages:
Disadvantages:
BEP33 for the win!
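A BEP33 scrape response carries two 256-byte bloom filters, one for seeds (`BFsd`) and one for downloading peers (`BFpe`). Each peer sets two bit positions, so the number of peers can be estimated from the number of zero bits c out of m = 2048 as ln(c/m) / (2 · ln(1 − 1/m)), which is, as we understand it, the estimator given in BEP33. The sketch below applies it and then the dead-torrent rule used in the experiment (no seeders means dead). Clamping c to at least 1 for a saturated filter is our own choice to avoid log(0), and all function names are hypothetical.

```python
import math

FILTER_BYTES = 256        # BEP33 bloom filters are 256 bytes
M = FILTER_BYTES * 8      # 2048 bits
K = 2                     # bit positions set per inserted peer


def zero_bits(bloom: bytes) -> int:
    """Count the bits that are still 0 in the filter."""
    return sum(8 - bin(b).count("1") for b in bloom)


def estimate_size(bloom: bytes) -> float:
    """Estimate how many distinct peers were inserted into the filter."""
    if len(bloom) != FILTER_BYTES:
        raise ValueError("expected a 256-byte BEP33 bloom filter")
    c = max(1, zero_bits(bloom))  # clamp: a fully set filter would give log(0)
    return math.log(c / M) / (K * math.log(1 - 1 / M))


def is_dead(bf_seeds: bytes) -> bool:
    """A torrent is considered dead when the seed filter estimates no seeders."""
    return estimate_size(bf_seeds) < 0.5


def dead_fraction(seed_filters) -> float:
    """Fraction of looked-up torrents whose seed filter estimates zero seeders."""
    filters = list(seed_filters)
    return sum(is_dead(bf) for bf in filters) / len(filters)
```

An empty filter estimates 0 seeders; a filter with exactly two bits set estimates about 1. Because the filter saturates, the estimate becomes unreliable for very large swarms.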
With #4434 and Tribler/gumby#409, I consider the second iteration of the … A new validation experiment has been added to verify the correctness of the popularity community. I plot the total number of torrents that have health information (based on the … I now think we should observe how this mechanism behaves in the wild.
I consider this issue appropriately addressed for the 7.3 release. As said before, we should observe how it works in the wild. Therefore, I am moving this issue to 7.4 so we can revisit and improve our torrent selection algorithm if necessary.
Since this issue leans more towards a full research project, I'm unassigning myself. My recommendations for the next steps:
Highly related to #3868
I would like to make this feature a key issue of an upcoming release, for example 7.6 "Bug Fix" followed by 7.7 "search&popularity".
The functionality of the OP has been implemented for several years now. Closing.
A Tribler Giga channel can scale to billions of torrents (#3971) and is shared in the network. However, the health of a torrent (being a dynamic value) is not included in the channel and is not propagated. As a result, users have access to a large number of torrents but no information on whether they are alive. We therefore need a mechanism to share the health of the torrents in the Giga channel, so that popular content surfaces and users can browse through it.
To find out the popularity of the torrents in the Giga channel and to disseminate the popular torrents in the network, we start with a simple approach based on random selection.
Some key points:
After the first deployment, we will know how well the random approach works and the optimizations can follow.
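The random-selection starting point can be illustrated with two independent random choices: which torrents to health-check locally, and which checked torrents to send to a peer. This is a sketch under our own assumptions, not the actual Tribler popularity community: the `TorrentHealth` record and both function names are hypothetical, and gossiping only torrents with at least one seeder is an illustrative filter rather than something stated in this issue.

```python
import random
from dataclasses import dataclass


@dataclass
class TorrentHealth:
    """Hypothetical health record for one torrent."""
    infohash: bytes
    seeders: int
    leechers: int
    last_check: float  # UNIX timestamp of the last health check


def pick_torrents_to_check(known_infohashes, count, rng=random):
    """Uniformly pick up to `count` torrents from the channel data to health-check."""
    pool = list(known_infohashes)
    return rng.sample(pool, min(count, len(pool)))


def pick_torrents_to_gossip(checked, count, rng=random):
    """Uniformly pick up to `count` checked torrents that still have seeders."""
    alive = [t for t in checked if t.seeders > 0]
    return rng.sample(alive, min(count, len(alive)))
```

Uniform sampling needs no coordination or ranking, which makes it a reasonable baseline to measure against once smarter, popularity-biased selection is tried.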