
DHT feeds #19

Closed
the8472 opened this issue Oct 8, 2015 · 6 comments

Comments

@the8472
Contributor

the8472 commented Oct 8, 2015

Many moons ago we discussed an approach to implement RSS feeds over the DHT: http://libtorrent.org/dht_rss.html. I guess BEP 44 spawned from that, but BEP 44 is barely put to use and feeds have never been specified on top of it.

crossposting from LT-discuss:


We already have all the parts that could be used to implement decentralized torrent feeds manually.

All that's needed is some glue to tie it together in a way that more than one client could understand.

I think mutable, signed torrents and push-notifications/gossiping within a swarm have been mentioned in some related thread, but those would be new features essential to get a feed working.

As arvid mentioned, the original idea was to have skip lists with a mutable head in the DHT pointing to infohashes.
But I'm a little doubtful that this would scale well. It would probably work, but it seems a little brittle or could suffer from inefficiencies.

On the other hand the objection to using the metadata exchange is that it is heavyweight in comparison to a DHT lookup.

I think using the metadata exchange option is preferable for several reasons:

  a) new entries can be batched if frequent republishing would cause too much churn
  b) the size of the most frequently accessed part of the feed (the newest entries) can be kept constant by growing a separate archive torrent that is accessed/refreshed less frequently
  c) trusting implementers to get republishing of mutable puts right is already dicey in my opinion; trusting them to get efficient republishing of large sets of skip-list nodes right at scale seems even riskier
  d) potential hiccups/failure modes would be more confined to the set of swarms that make up a feed
  e) swarms can carry larger sets of data, i.e. a meta-torrent can carry full-fledged torrents that also have entries in the outer dictionary (trackers, comments, creation date, other fluff in the future)


So, my proposal would be as follows

  • feeds start from a BEP 44 mutable value
  • value is an infohash
  • subscribers fetch torrent via metadata exchange
  • the torrent is a multifile torrent containing
    a) regular torrents as feed items
    b) various pointers (name + infohash) to related feed torrents (archive of older torrents, subfeeds, whatever). could abuse the filelist with zero-length files for that or put hashes into text files
  • the files in the torrent are ordered not by filesystem/alphabetical order but by ascending creation date
  • subscribers slow-poll the DHT. When the mutable value changes, subscribers switch over to the new swarm, attempting to re-use as much of the old data as possible (de-duplication should be fairly easy here)
  • subscribers assist in re-publishing the DHT keys and keep the swarm alive; publishers are not required to do active maintenance between updates

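The BEP 44 plumbing the first two bullets rely on can be sketched as follows. This is a hedged sketch, not a reference implementation: the helper names are mine, but the target derivation (SHA-1 of the public key plus optional salt) and the signable byte string follow BEP 44.

```python
import hashlib

def bep44_mutable_target(pubkey: bytes, salt: bytes = b"") -> bytes:
    # BEP 44: the DHT target of a mutable item is the SHA-1 of the
    # ed25519 public key concatenated with the (optional) salt.
    return hashlib.sha1(pubkey + salt).digest()

def bep44_signable(value_bencoded: bytes, seq: int, salt: bytes = b"") -> bytes:
    # The byte string the publisher signs: the bencoded "salt" (if any),
    # "seq" and "v" entries concatenated without an enclosing dict,
    # e.g. 4:salt6:foobar3:seqi4e1:v12:Hello world!
    out = b""
    if salt:
        out += b"4:salt" + str(len(salt)).encode() + b":" + salt
    out += b"3:seqi" + str(seq).encode() + b"e"
    out += b"1:v" + value_bencoded
    return out

# For this proposal the value would be the feed torrent's infohash,
# bencoded as a 20-byte string:
feed_infohash = bytes(20)  # placeholder
signable = bep44_signable(b"20:" + feed_infohash, seq=1)
```

A subscriber polling the feed would do a DHT get on bep44_mutable_target(pubkey), verify the ed25519 signature over these signable bytes, and only accept values with a higher seq than the one it last saw.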
To make sharing of feeds easy we could introduce a new magnet URN, e.g. magnet:?xt=urn:btfeed:[...]
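Parsing such a magnet would be trivial; here is a sketch, where urn:btfeed: is the proposed (not registered) namespace and I'm assuming the payload is the hex-encoded ed25519 public key of the feed:

```python
from urllib.parse import urlparse, parse_qs

def parse_feed_magnet(uri: str) -> bytes:
    # Extract the feed's public key from the hypothetical
    # magnet:?xt=urn:btfeed:<hex pubkey> form.
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet link")
    xt = parse_qs(parsed.query)["xt"][0]
    prefix = "urn:btfeed:"
    if not xt.startswith(prefix):
        raise ValueError("not a btfeed urn")
    return bytes.fromhex(xt[len(prefix):])
```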

The rest would be implementation guidance/best practices to keep overhead low and the contents human-friendly.

All that implementers need for basic support is BEP44 + some UI/utilities for publishers or subscribers.

Standalone DHT-to-RSS adapters would also be possible.


I'm willing to implement a prototype + write BEP draft if there is some inclination/interest by other devs to adopt.

@lmatteis
Contributor

I'd like to revive discussion about this topic as I think it's a critical part for the future of the BitTorrent ecosystem. Having decentralized feeds would allow content providers to more freely share their content, and be less constrained to DNS-based systems.

I'm willing to help with implementation if needed. I've already implemented the Skip List idea for a standalone app (https://github.com/lmatteis/peer-tweet) but I think @the8472's implementation works better for torrents - certainly having as little data as possible on the DHT makes more sense.

One question I have about this particular implementation is the idea of having the feed be a download of a multifile torrent. Consider very large feeds (with possibly millions of entries); wouldn't .torrents add unnecessary bytes? After all, what is needed is the raw metadata and the infohash. This is important because the user isn't going to download all of these torrents, only the ones he's interested in. A .torrent, on the other hand, contains "extra info" because its user probably wants to download it.

So perhaps a different format can be used that works well for searching (the primary use of the feed).
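To illustrate the overhead point: the infohash a feed entry needs is just the SHA-1 of the raw bencoded info dictionary inside the .torrent, so for pure indexing purposes everything else in the file is extra bytes. A sketch (minimal bencode decoder plus a naive info-dict locator, assuming well-formed .torrent files):

```python
import hashlib

def bdecode(data: bytes, i: int = 0):
    # Minimal bencode decoder; returns (value, index past the value).
    c = data[i:i + 1]
    if c == b"i":                       # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                       # list: l<items>e
        i, out = i + 1, []
        while data[i:i + 1] != b"e":
            v, i = bdecode(data, i)
            out.append(v)
        return out, i + 1
    if c == b"d":                       # dict: d<key><value>...e
        i, out = i + 1, {}
        while data[i:i + 1] != b"e":
            k, i = bdecode(data, i)
            v, i = bdecode(data, i)
            out[k] = v
        return out, i + 1
    colon = data.index(b":", i)         # string: <len>:<bytes>
    n = int(data[i:colon])
    return data[colon + 1:colon + 1 + n], colon + 1 + n

def infohash(torrent: bytes) -> bytes:
    # SHA-1 over the raw bencoded info dict -- the 20 bytes a feed
    # entry actually needs. Naively locates the first b"4:info" key,
    # which is fine for ordinary top-level .torrent layouts.
    start = torrent.index(b"4:info") + len(b"4:info")
    _, end = bdecode(torrent, start)
    return hashlib.sha1(torrent[start:end]).digest()
```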

@arvidn I hope we can get this discussion going again and start a BEP about it :)

@the8472
Contributor Author

the8472 commented Jul 23, 2016

wouldn't .torrents add unnecessary bytes?

For some value of unnecessary and highly dependent on use-case. If you're interested in most or all entries of the feed you would have to fetch that data anyway. If you're only interested in a tiny fraction of a million-torrent feed then maybe you're not the intended target of the feed itself, i.e. it's meant for aggregators which provide a nicer query interface.

And at some point in the future (#29) merkle torrents could be used to cut it down further.

Also note that selective downloading is an option, and the idea is to separate the current/head part of the feed from a larger archive, which should also reduce overhead.

@the8472
Contributor Author

the8472 commented Jul 23, 2016

So perhaps a different format can be used that works well for searching (the primary use of the feed).

Search itself is not the primary use. Supplying a stream of new content is. The subscriber could be a search engine, or simply a user subscribed to all the content of a particular feed.

@arvidn
Contributor

arvidn commented Jul 24, 2016

I think this is a reasonable approach. I have a few comments:

  1. If the feed ends up containing a lot of items, the file-list itself could become large. With this proposal, it would be transferred via the metadata transfer extension. This may be less efficient/reliable than transferring it as bittorrent payload. It could be solved by (for example) storing the .torrent files in a tar-file, or something similar. However, the simplicity of plain .torrent files is clearly appealing.
  2. libtorrent currently only supports deduplicating whole files whose piece size and piece alignment are identical. The general recommendation I've been giving people who want to use the "mutable torrent" feature is to pad large files to make them piece-aligned. However, it can be argued that pad files are no better or less complex than supporting unaligned deduping (which uTorrent does). On the other hand, .torrent files are presumably relatively small, and only the penultimate file would be redundantly re-downloaded, so it might not be a big deal.
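For what it's worth, the pad-file arithmetic behind point 2 is simple. A sketch, assuming a 16 KiB piece size for illustration (helper name is mine):

```python
def pad_sizes(file_sizes, piece_length=16384):
    # For each file except the last, the size of the pad file that
    # makes the *next* file start on a piece boundary, so unchanged
    # files hash to identical, deduplicable pieces across updates.
    pads = []
    offset = 0
    for size in file_sizes[:-1]:
        offset += size
        pad = (-offset) % piece_length
        pads.append(pad)
        offset += pad
    return pads
```

With pads inserted this way, each real file begins at a multiple of the piece length, which is the precondition for libtorrent's whole-file dedup described above.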

@the8472
Contributor Author

the8472 commented Jul 24, 2016

To keep the file list that has to be transferred in the most common case small, I intend to allow the feed to paginate as needed by having an "older" pointer in the torrent itself. As mentioned earlier, merkle trees can help with the hashes.
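Walking that pagination could look like this. A sketch where fetch_feed and the feed layout ("entries" plus an optional "older" infohash) are assumptions of mine, not part of any spec:

```python
def walk_feed(head_infohash, fetch_feed):
    # Iterate all feed entries, newest first, following the
    # hypothetical "older" pointers into archived feed torrents.
    # fetch_feed is assumed to resolve an infohash (via metadata
    # exchange) to a dict with "entries" and an optional "older".
    seen = set()
    ih = head_infohash
    while ih and ih not in seen:   # seen-set guards against cycles
        seen.add(ih)
        feed = fetch_feed(ih)
        yield from feed["entries"]
        ih = feed.get("older")
```

A subscriber interested only in new items stops after the head torrent; an aggregator keeps following "older" until the chain ends.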

Piece alignment is certainly something that would aid deduplication, but there's no BEP that I can refer to. hint hint, nudge nudge

@the8472
Contributor Author

the8472 commented Jan 13, 2017

solved with #40

@the8472 the8472 closed this as completed Jan 13, 2017