
1 TByte of seeding #21

Open
synctext opened this issue Mar 13, 2013 · 20 comments
@synctext
Member

Ability to seed 1 TByte of content using Tribler.

This requires announcing, say, 1000 swarms of content in the DHT.
This is a problem, as shown here: http://blog.libtorrent.org/2012/01/seeding-a-million-torrents/
The announce interval needs to be prolonged in order to reduce DHT announce traffic.
Perhaps the DHT cannot handle this and a new peer discovery method is required: #13.

As a quick partial fix we can cap the maximum number of swarms announced to the DHT, then use a simple round-robin method to cycle slowly through all available swarms.
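The capped round-robin idea could be sketched as follows (a minimal sketch; the class name, cap value, and infohash strings are illustrative, not Tribler or libtorrent API):

```python
from collections import deque

class RoundRobinAnnouncer:
    """Cycle through all swarms, announcing at most `cap` per round.

    This spreads DHT announce traffic over time instead of announcing
    every swarm at every interval.
    """

    def __init__(self, infohashes, cap=50):
        self.queue = deque(infohashes)
        self.cap = cap

    def next_batch(self):
        """Return the next `cap` infohashes and rotate them to the back."""
        batch = []
        for _ in range(min(self.cap, len(self.queue))):
            infohash = self.queue.popleft()
            batch.append(infohash)
            self.queue.append(infohash)  # re-queue for a later round
        return batch

# Example: 1000 swarms at 50 announces per round means a full cycle
# every 20 rounds, instead of 1000 announces per interval.
announcer = RoundRobinAnnouncer([f"hash{i}" for i in range(1000)], cap=50)
first = announcer.next_batch()
second = announcer.next_batch()
```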

NielsZeilemaker pushed a commit to NielsZeilemaker/tribler that referenced this issue May 7, 2013
@synctext synctext modified the milestones: V9.0: Shadow Internet, V7.1 anonymous seeding test Nov 3, 2014
@synctext
Member Author

"Performance analysis of a Tor-like onion routing implementation", Quinten Stokkink, Harmjan Treep, http://arxiv.org/abs/1507.00245

@qstokkink student Wouter is aiming to repeat your work and do a CPU performance analysis automagically after each pull request.

Roadmap :-)
[roadmap image]

@synctext
Member Author

@Pathemeous Found interesting code from Quinten: devel...qstokkink:devel

@qstokkink
Contributor

If you want to re-use our code you should extract the start_profiling and stop_profiling functions from our code and call them respectively before and after whatever you want to profile (and make sure it's in the filter). The tunnel_piecharts.R script can then parse profiling files in this format. We integrated this into the Gumby pipeline with a script added to the .conf file.
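For reference, the start/stop pattern looks roughly like this using Python's built-in cProfile (a sketch of the general pattern only, not the actual Gumby/tunnel code; the function names merely mirror the ones mentioned above):

```python
import cProfile
import io
import pstats

_profiler = None

def start_profiling():
    """Begin collecting CPU profiling samples."""
    global _profiler
    _profiler = cProfile.Profile()
    _profiler.enable()

def stop_profiling():
    """Stop profiling and return the top stats, sorted by cumulative time."""
    _profiler.disable()
    buf = io.StringIO()
    pstats.Stats(_profiler, stream=buf).sort_stats("cumulative").print_stats(10)
    return buf.getvalue()

# Usage: wrap whatever you want to profile between the two calls.
start_profiling()
total = sum(i * i for i in range(10000))  # stand-in for tunnel work
report = stop_profiling()
```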

@Pathemeous

Thanks @qstokkink, I will have a look.

@whirm

whirm commented Nov 10, 2015

We already have memory profiling and Manhole (telnet into the process to inspect it in real time). It would be cool to have the profiling integrated into Gumby's instrumentation.py so all experiments can use it straight away.

@synctext
Member Author

synctext commented Oct 7, 2016

A simple 1-week starting experiment without Tribler, the Tor stack, or Gumby.

The goal is to first detect whether there are libtorrent bottlenecks. The experimental setup is just libtorrent on Ubuntu, seeding 50 GByte to 250 GByte of content and testing local download performance. With 1000 swarms or so, libtorrent is expected to grind to a halt at default settings.

Outcome: spend 2 weeks creating a graph that shows how performance develops as you seed more GBytes and more swarms.

Experiment:

  • seed 10 GByte, 50 swarms
  • download for 1 minute
  • record download speed (MByte/s) and progress
  • repeat: double the amount of GBytes & swarms
  • create a scalability graph: seeding size versus download speed
  • tweak until the above works :-)
  • document after 2 weeks of effort
  • move to the Gumby + Tribler Tor stack
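The doubling loop above could be driven by a small harness like this (a sketch; `measure_download_speed` is a hypothetical placeholder for the actual one-minute libtorrent download run, here simulated with a made-up declining curve):

```python
def measure_download_speed(num_swarms, total_gbytes):
    """Placeholder: seed `num_swarms` swarms totalling `total_gbytes` GByte,
    download for one minute, and return the observed speed in MByte/s.
    A real run would drive libtorrent here; we simulate a declining curve."""
    return 100.0 / (1 + num_swarms / 500.0)

def run_scalability_experiment(start_gbytes=10, start_swarms=50, doublings=5):
    """Double the seeded size and swarm count each step, collecting the
    data points for the seeding-size-versus-download-speed graph."""
    results = []
    gbytes, swarms = start_gbytes, start_swarms
    for _ in range(doublings):
        speed = measure_download_speed(swarms, gbytes)
        results.append((gbytes, swarms, speed))
        gbytes *= 2
        swarms *= 2
    return results

points = run_scalability_experiment()
```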

@synctext
Member Author

synctext commented Oct 17, 2016

Progress: first script to seed and control libtorrent from Python.
A step closer to a pull-request tester in Jenkins with a 1 TByte seeding test.
Ardhi has 3000+ Linux .iso torrents, which seems sufficient for a future test.

Measure, in an easy-to-build setting, the cost of 1 TByte of seeding (or MaxHardDiskCapacity). Run libtorrent for 1 hour with various seeding-size settings and measure the total consumed bandwidth. The goal is to identify the overhead. For instance, set the seeding upload bandwidth to just 10 KByte or so; this needs to be subtracted to obtain the DHT, PEX, and other control-protocol overhead traffic.

Docs: the limits on the number of downloading and seeding torrents are controlled via active_downloads, active_seeds, and active_limit in session_settings; change the default of 5 active seeds to 10000 :-)
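Raising those limits via the Python bindings would look roughly like this (a sketch; the setting names follow the libtorrent docs quoted above, but verify the dict-based apply_settings call against the installed libtorrent version):

```python
# Settings to lift libtorrent's default cap of 5 active seeds.
# active_limit bounds the combined total of active downloads + seeds.
HIGH_SEED_SETTINGS = {
    "active_seeds": 10000,
    "active_downloads": 100,
    "active_limit": 10100,
}

def make_seeding_session():
    """Create a libtorrent session configured for mass seeding.
    Importing here keeps the module usable without libtorrent installed."""
    import libtorrent as lt
    ses = lt.session()
    ses.apply_settings(HIGH_SEED_SETTINGS)  # dict-based settings API
    return ses
```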

@Pathemeous

  • Graph plot of startup time (time from launch until seeding first occurs)
  • Bar plot of the total amount of bandwidth used while seeding for 1 hour
  • An interesting result would be the following: a graph plot of the effective amount of bandwidth used (total minus the overhead of torrent management)

@ardhipoetra

Some of the crawled .torrent files can be found in my Dropbox; currently it holds 3658 .torrents.

I'm crawling mininova now, but I got blocked, so it may take some time to get more torrents from this site. AFAIK mininova now hosts legal torrents only.

@devos50
Contributor

devos50 commented Oct 18, 2016

@ardhipoetra great, thanks for sharing!

You might want to rate-limit your requests and maybe use HTTP proxies for the crawling process?

@ardhipoetra

In the end, I rate-limited my requests and it works.

As @devos50 requested, here is the link to the collection, zipped. You can put that in the bbq.

As for the crawler, I made the repository on https://github.com/ardhipoetra/legal-torrent-crawler
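The rate limiting suggested above can be as simple as enforcing a minimum interval between requests (a minimal sketch with an injectable clock so it can be tested without real sleeping; the linked crawler repository may do this differently, and `fetch_torrent` is hypothetical):

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # timestamp of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now

# Usage in a crawl loop (fetch_torrent is a hypothetical download helper):
# limiter = RateLimiter(min_interval=2.0)
# for url in urls:
#     limiter.wait()
#     fetch_torrent(url)
```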

@egbertbouman
Member

Currently, Tribler will create an introduction point for every torrent it is seeding. This could potentially overload the exit nodes, especially now that exit nodes are also running the PexCommunity. To minimize this problem, I was thinking about somehow limiting the number of introduction points that we create. We could do this from the TriblerTunnelCommunity, or maybe by using libtorrent's auto-management feature (which allows us to limit the number of active seeds). Using auto-management seems to make the most sense. @qstokkink @devos50 What do you think?

@qstokkink
Contributor

Sure. Why not?

@synctext
Member Author

synctext commented Apr 1, 2021

@drew2a Just a small reminder. Please set up a seedbox for a Tribler channel with lots of Creative Commons music (e.g. different from the superapp; overlap). Simple static dump, for demo purposes only. Show that we can seed lots of stuff: https://github.com/mdeff/fma

EDIT: then please use that to set up a demo channel with markdown and real content #3615

@drew2a
Contributor

drew2a commented Apr 9, 2021

For further development. An idea.

As I see it, there are two types of Tribler users:

  1. Channel creators (who want to seed content by creating a channel)
  2. Normal users (who want to download)

So, what if we developed a tool that makes it easier to create and seed a channel?

Like:

$./create_and_seed.sh <folder>

Where <folder> is a folder with the following structure:

my channel
├ sub_directory
| ├ file1
| ├ file2
| └ README.md
├ sub_directory2
| ├ file3
| └ file4
└ README.md

The behavior:

  1. Create a channel, using my channel as the channel name
  2. Create a markdown preview for a folder if a *.md file is present in that folder
  3. Start seeding the content
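A first cut of the folder-scanning side of such a tool might look like this (a sketch of steps 1-2 only; `describe_channel` is a hypothetical helper that just collects what the real tool would feed into Tribler's channel creation and seeding):

```python
import tempfile
from pathlib import Path

def describe_channel(folder):
    """Walk `folder` and return the channel name, the directories that get a
    markdown preview (those containing a *.md file), and the files to seed."""
    root = Path(folder)
    channel_name = root.name  # step 1: folder name becomes the channel name
    previews = []
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            if path.suffix == ".md":
                previews.append(path.parent)  # step 2: preview for this dir
            else:
                files.append(path)            # step 3: content to seed
    return channel_name, previews, files

# Demo on a throwaway copy of the structure from the comment above:
root = Path(tempfile.mkdtemp()) / "my channel"
(root / "sub_directory").mkdir(parents=True)
(root / "sub_directory" / "file1").write_text("data")
(root / "sub_directory" / "README.md").write_text("# preview")
(root / "README.md").write_text("# channel readme")
name, previews, files = describe_channel(root)
```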

@ichorid what do you think?

@ichorid
Contributor

ichorid commented Apr 9, 2021

@ichorid what do you think?

What are the fileX things? Torrents? Or actual files that should become individual torrents?

@drew2a
Contributor

drew2a commented Apr 13, 2021

What are the fileX things? Torrents? Or actual files that should become individual torrents?

Actual files (discussed offline).

@drew2a
Contributor

drew2a commented Apr 14, 2021

I did an experiment:

  1. Generated 1 GB of data divided into 1024 torrents (generate_test_data.py)
  2. Seeded them (seeder.py)
  3. Downloaded 3 different torrents (picked randomly) from another PC. All downloads completed within 5 to 30 seconds.

No trackers were used.
Libtorrent version: 1.2.10
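The data-generation step could be reproduced with something like this (a sketch, scaled down for the demo run; the actual generate_test_data.py from the experiment may differ):

```python
import os
import tempfile
from pathlib import Path

def generate_test_data(out_dir, num_files=1024, file_size=1024 * 1024):
    """Write `num_files` files of `file_size` random bytes each, one file per
    future torrent. 1024 files x 1 MiB reproduces the 1 GB experiment."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(num_files):
        # Random bytes so files are incompressible and unique per torrent.
        (out / f"data_{i:04d}.bin").write_bytes(os.urandom(file_size))
    return sorted(out.iterdir())

# Scaled-down demo: 4 files of 1 KiB each in a temporary directory.
paths = generate_test_data(tempfile.mkdtemp(), num_files=4, file_size=1024)
```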

@drew2a drew2a mentioned this issue Apr 26, 2021
@drew2a
Contributor

drew2a commented May 20, 2021

FMA test data were seeded for one month (1 channel, 156 torrents, 23 GB total).
The music data are still available inside Tribler.

[screenshot]

@drew2a drew2a assigned xoriole and ichorid and unassigned drew2a Sep 20, 2021
@qstokkink qstokkink removed this from the Backlog milestone Aug 23, 2024