DHT feeds #19
I'd like to revive discussion about this topic, as I think it's a critical part of the future of the BitTorrent ecosystem. Having decentralized feeds would allow content providers to share their content more freely and be less constrained by DNS-based systems. I'm willing to help with implementation if needed.

I've already implemented the skip-list idea for a standalone app (https://github.com/lmatteis/peer-tweet), but I think @the8472's implementation works better for torrents; certainly, keeping as little data as possible on the DHT makes more sense.

One question I have about this particular implementation concerns the idea of the feed being a multifile torrent whose files are .torrent files. Consider very large feeds (with possibly millions of entries): wouldn't .torrent files add unnecessary bytes? After all, what is needed is the raw metadata and the infohash. This matters because a user isn't going to download all of these torrents, only the ones they're interested in. A .torrent file, on the other hand, contains "extra info" that is only useful once the user decides to download it. So perhaps a different format could be used that works well for searching (the primary use of the feed).

@arvidn I hope we can get this discussion going again and start a BEP about it :)
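To make the size question concrete, here is a rough sketch of what a compact feed entry might look like if it carried only a name and a 20-byte infohash instead of a full .torrent. The `ih`/`name` field names and the minimal bencoder are illustrative assumptions, not part of any BEP:

```python
# Hypothetical compact feed entry: just a display name plus the
# 20-byte infohash, versus a full .torrent which also embeds piece
# hashes and a file list. Field names here are made up.

def bencode(obj) -> bytes:
    """Minimal bencoder for ints, str, bytes, lists and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode()
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # bencode requires keys sorted as raw byte strings
        items = sorted((k.encode() if isinstance(k, str) else k, v)
                       for k, v in obj.items())
        return b"d" + b"".join(bencode(k) + bencode(v)
                               for k, v in items) + b"e"
    raise TypeError(type(obj))

# A minimal entry: 20-byte infohash plus a display name.
entry = {"ih": b"\x01" * 20, "name": "example-content"}
encoded = bencode(entry)
print(len(encoded))  # a few dozen bytes, vs. kilobytes for a .torrent
```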
For some value of unnecessary, and highly dependent on use-case. If you're interested in most or all entries of the feed, you would have to fetch that data anyway. If you're only interested in a tiny fraction of a million-torrent feed, then maybe you're not the intended target of the feed itself, i.e. it's meant for aggregators which provide a nicer query interface. And at some point in the future (#29) merkle torrents could be used to cut the size down further. Also note that selective downloading is an option, and the idea is to separate the current/head part of the feed from the larger archive, which should also reduce overhead.
Search itself is not the primary use. Supplying a stream of new content is. The subscriber could be a search engine, or simply a user subscribed to all the content of a particular feed. |
I think this is a reasonable approach. I have a few comments:
To keep the file list that has to be transferred in the most common case small, I intend to allow the feed to paginate if needed by having an "older" pointer in the torrent itself. As mentioned earlier, merkle trees can help with the hashes. Piece alignment is certainly something that would aid deduplication, but there's no BEP that I can refer to. hint hint, nudge nudge
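The pagination idea above can be sketched roughly as follows. The `older` key, `fetch_torrent`, and the canned `ARCHIVE` are all hypothetical stand-ins for a real DHT lookup plus metadata exchange, not a specified format:

```python
# Sketch: paginate a feed by following an "older" pointer from the
# small head torrent back through archive pages. All names here are
# assumptions for illustration.

def fetch_torrent(infohash: bytes) -> dict:
    # Placeholder: a real client would do a DHT lookup plus BEP9
    # metadata exchange. Here we serve from a canned archive.
    return ARCHIVE[infohash]

ARCHIVE = {
    b"page1": {"entries": ["a", "b"], "older": b"page0"},
    b"page0": {"entries": ["c"]},  # oldest page: no "older" pointer
}

def iter_feed(head: bytes):
    """Walk the feed from the newest page back to the oldest."""
    ih = head
    while ih is not None:
        page = fetch_torrent(ih)
        yield from page["entries"]
        ih = page.get("older")

print(list(iter_feed(b"page1")))  # ['a', 'b', 'c']
```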
solved with #40 |
Many moons ago we discussed an approach to implementing RSS feeds over the DHT (http://libtorrent.org/dht_rss.html). I believe BEP44 spawned from that, but 44 is barely put to use, and feeds have never been specified on top of it.
crossposting from LT-discuss:
We already have all the parts that could be used to implement decentralized torrent feeds manually.
All that's needed is some glue to tie it together in a way that more than one client could understand.
I think mutable, signed torrents and push-notifications/gossiping within a swarm have been mentioned in some related thread, but those would be new features, not ones essential to get a feed working.
As arvid mentioned, the original idea was to have skip lists with a mutable head in the DHT pointing to infohashes.
But I'm a little doubtful that this would scale well. It would probably work, but it seems a little brittle and could suffer from inefficiencies.
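For reference, the skip-list scheme being discussed could be sketched like this. Everything here (the fake `DHT` dict, key names, the 1/2/4/8 skip distances) is illustrative rather than the dht_rss or BEP44 wire format; it just shows that updates touch the mutable head while reaching older entries stays logarithmic:

```python
# Sketch: immutable skip-list nodes plus a mutable head in a fake DHT.
# Each node stores an infohash and back-pointers skipping 1, 2, 4, 8
# entries, so old entries are reachable in O(log n) fetches.

DHT = {}  # stand-in for the DHT keyspace: key -> stored value

def publish(entries):
    """Store each entry as an immutable node, then repoint the head."""
    keys = []
    for i, ih in enumerate(entries):
        skips = [keys[i - d] for d in (1, 2, 4, 8) if d <= i]
        key = b"node%d" % i
        DHT[key] = {"ih": ih, "skips": skips}
        keys.append(key)
    DHT[b"head"] = keys[-1]  # the part that must be republished on update

def walk_back(steps):
    """Follow skip pointers from the head; count DHT fetches needed."""
    key, fetches = DHT[b"head"], 0
    while steps > 0:
        node = DHT[key]
        fetches += 1
        # take the largest skip that does not overshoot the target
        for j, d in ((3, 8), (2, 4), (1, 2), (0, 1)):
            if j < len(node["skips"]) and d <= steps:
                key, steps = node["skips"][j], steps - d
                break
    return key, fetches

publish([b"entry-%d" % i for i in range(10)])
print(walk_back(5))  # only a couple of fetches to go 5 entries back
```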
On the other hand the objection to using the metadata exchange is that it is heavyweight in comparison to a DHT lookup.
I think using the metadata exchange option is preferable for several reasons
a) new entries can be batched if frequent republishing would cause too much churn
b) the size of the most frequently accessed part of the feed (the newest entries) can be kept constant by growing a separate archive torrent that is accessed/needs to be refreshed less frequently
c) Trusting implementers to get republishing for mutable puts right is already dicey, in my opinion.
Trusting them to get efficient republishing of large sets of skip-list nodes right at scale seems even riskier
d) potential hiccups/failure modes would be more confined to the set of swarms that make up a feed
e) swarms can carry larger sets of data. I.e. a meta-torrent can carry full-fledged torrents that also have entries in the outer dictionary (trackers, comments, creation date, other fluff in the future)
So, my proposal would be as follows:
a) regular torrents as feed items
b) various pointers (name + infohash) to related feed torrents (archives of older torrents, subfeeds, whatever). We could abuse the file list with zero-length files for that, or put hashes into text files
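The zero-length-file trick in (b) might look something like the following. The `_feed/<name>.<hex>` path convention is made up for illustration and not specified anywhere:

```python
# Sketch: encode feed pointers as zero-length files in the torrent's
# file list, with the target name and infohash packed into the path.
# The path scheme is a hypothetical convention, not a BEP.
import re

def make_pointer(name: str, infohash: bytes) -> dict:
    """A zero-length file entry standing in for a feed pointer."""
    return {"path": ["_feed", f"{name}.{infohash.hex()}"], "length": 0}

def parse_pointers(files: list) -> dict:
    """Extract {name: infohash} from a torrent file list."""
    out = {}
    for f in files:
        if f["length"] == 0 and f["path"][0] == "_feed":
            m = re.fullmatch(r"(.+)\.([0-9a-f]{40})", f["path"][1])
            if m:
                out[m.group(1)] = bytes.fromhex(m.group(2))
    return out

files = [
    {"path": ["episode-01.mkv"], "length": 123456},  # ordinary payload
    make_pointer("archive", b"\xab" * 20),           # pointer entry
]
print(parse_pointers(files))  # maps 'archive' to its 20-byte infohash
```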
To make sharing of feeds easy, we can introduce a new magnet URN, e.g. magnet:?xt=urn:btfeed:[...]
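Parsing such a (hypothetical, unstandardized) btfeed magnet could be as simple as:

```python
# Sketch: extract the feed head's infohash from the proposed
# magnet:?xt=urn:btfeed:<hex> scheme. "btfeed" is the suggestion
# from this thread, not a registered URN.
from urllib.parse import urlparse, parse_qs

def parse_feed_magnet(uri: str) -> bytes:
    """Return the feed head's infohash from a btfeed magnet link."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    for xt in parse_qs(parsed.query).get("xt", []):
        if xt.startswith("urn:btfeed:"):
            return bytes.fromhex(xt[len("urn:btfeed:"):])
    raise ValueError("no btfeed urn found")

link = "magnet:?xt=urn:btfeed:" + "ab" * 20
print(parse_feed_magnet(link).hex())  # prints the 40-char hex infohash
```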
The rest would be implementation guidance/best practices to keep overhead low and the contents human-friendly.
All that implementers need for basic support is BEP44 + some UI/utilities for publishers or subscribers.
Standalone DHT-to-RSS adapters would also be possible.
I'm willing to implement a prototype and write a BEP draft if there is some inclination/interest from other devs to adopt it.