This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

🌟 adding Torrent support to IPFS #779

Closed
daviddias opened this issue Mar 8, 2017 · 19 comments
Labels

  • exp/wizard: Extensive knowledge (implications, ramifications) required
  • ipld
  • P2: Medium: Good to have, but can wait until someone steps up
  • status/ready: Ready to be worked

Comments

@daviddias
Member

I've started working on enabling Torrent support for js-ipfs, very much in the same way that we have support for: dag-pb, dag-cbor, eth-blocks, eth-tx, zcash (go-ipfs only), git (go-ipfs only) and bitcoin (go-ipfs only).

The end goal is to expose two top level commands to add and retrieve files that are Torrents, from the IPFS or BitTorrent network (through a bridge and in the future, by connecting directly). The commands being:

  • jsipfs torrent add
  • jsipfs torrent cat

However, I stumbled upon a question on which we will have to make a decision, and I would like to get feedback before going at full speed. In BitTorrent, torrent files are not referenced by a cryptographic hash due to their ephemeral and mutable nature (in fact, decoding and encoding is not even always idempotent by spec). The only thing that has a cryptographic identifier is the info field in the torrent file.
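To make the point about the info field concrete, here is a minimal sketch (not the js-ipfs implementation) of how a v1 infoHash is derived: the SHA-1 of the bencoded info dictionary. The `bencode` and `infoHash` helpers, the tracker URLs and the sample torrent are all assumptions made up for illustration; only Node's built-in `crypto` module is used.

```javascript
const crypto = require('node:crypto')

// Minimal bencode encoder (integers, byte strings, lists, dictionaries).
function bencode (value) {
  if (typeof value === 'number') {
    if (!Number.isInteger(value)) throw new TypeError('bencode only supports integers')
    return Buffer.from(`i${value}e`)
  }
  if (typeof value === 'string') value = Buffer.from(value, 'utf8')
  if (Buffer.isBuffer(value)) {
    return Buffer.concat([Buffer.from(`${value.length}:`), value])
  }
  if (Array.isArray(value)) {
    return Buffer.concat([Buffer.from('l'), ...value.map(bencode), Buffer.from('e')])
  }
  if (value !== null && typeof value === 'object') {
    // Dictionary keys must be sorted as raw byte strings.
    const keys = Object.keys(value).sort((a, b) => Buffer.compare(Buffer.from(a), Buffer.from(b)))
    const parts = keys.flatMap((key) => [bencode(key), bencode(value[key])])
    return Buffer.concat([Buffer.from('d'), ...parts, Buffer.from('e')])
  }
  throw new TypeError('unsupported bencode value')
}

// The v1 infoHash is the SHA-1 digest of the bencoded info dictionary only.
function infoHash (info) {
  return crypto.createHash('sha1').update(bencode(info)).digest('hex')
}

// Hypothetical single-file torrent; `pieces` holds one fake 20-byte hash.
const torrent = {
  announce: 'http://tracker.example/announce',
  info: {
    length: 5,
    name: 'hello.txt',
    'piece length': 16384,
    pieces: Buffer.alloc(20, 1)
  }
}

const before = infoHash(torrent.info)
torrent.announce = 'http://other-tracker.example/announce' // mutable metadata outside `info`
const after = infoHash(torrent.info)
console.log(before === after) // true
```

Changing anything outside `info` (trackers, comments, creation date) leaves the identifier untouched, which is why the Torrent file as a whole cannot be addressed by that hash.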

I started implementing the IPLD format for a Torrent file, but I'm guessing that most people will want to fetch their torrent through the infoHash that they get from something like a magnet URI. The crux is that there is never a file for the info field alone: as soon as an infoHash query is performed, a Torrent file is retrieved, raising the question of:

Should dag.get(<infoHash>/somePath) resolve through the retrieved Torrent file or only over the info field?

  • Resolve through the Torrent file - This is weird to the IPLD resolver, as it would be resolving an immutable pointer to something that has more fields
  • Resolve only within the info field - This would force us to make the info field a full standalone object that can be transferred independently (the solution I'm leaning towards). This option would result in two multicodecs for Torrents, torrent-file and torrent-info.

Thoughts? //cc @jbenet @whyrusleeping @nicola

@ghost

ghost commented Mar 8, 2017

Resolve only within the info field - This would force us to make the info field a full standalone object that can be transferred independently (the solution I'm leaning towards). This option would result in two multicodecs for Torrents, torrent-file and torrent-info.

This sounds like the pragmatic way for me too -- we'll likely get a better idea of what to do with the whole torrent in the process of working on this.

Given that the torrent file itself is not already content-addressed, it's also the "correct" way I think. Magnet URIs address the info hash anyway.

@ghost

ghost commented Mar 8, 2017

{
  "infoHash": "d2474e86c95b19b8bcfdb92bc12c9d44667cfa36",
  "infoHashBuffer": {"/": "$infoHashAsCID"},
  "name": "Leaves of Grass by Walt Whitman.epub"
}

@daviddias
Member Author

daviddias commented Mar 8, 2017

@lgierth I only received your comments after I posted, it seems that we had this chat while you were writing those :)

Notes from a chat with @jbenet and @whyrusleeping

  • Instead of having ipld-torrent-file and ipld-torrent-info, we just need ipld-bencode to be able to resolve through bencode encoded objects. After rethinking this, I remembered why we need ipld-torrent-file and ipld-torrent-info, we need them to enable the resolver to resolve through paths inside these objects (i.e. dag.get(infoHash/pieces/0) should return the piece and not the string that is the sha1 of the piece).
  • When importing a Torrent, two objects need to be created, one for the info and one for the Torrent file itself.
  • New command added: torrent import <torrent-file, magnetic-uri, infohash>
  • jsipfs torrent will be available through a module called ipfs-torrent that exposes both a CLI and a module (like ipfs-unixfs-engine).

This leads to the following steps

1. Implement the IPLD Formats to support torrents

2. Implement a blockstore that uses webtorrent as its storage driver

  • torrent-pull-blob-store
  • confirm that we can dag.get(<torrentHash or infoHash>) and traverse through those objects

3. Implement the ipfs-torrent service (like ipfs-unixfs-engine)

  • module
    • .import (adds the torrent file and creates an infoHash object too)
      • import by magnet URI
      • import by infoHash
    • .add
      • single files support
      • directories support
    • .cat (single files)
    • .get
  • cli
    • spawn a js-ipfs daemon or connect to a remoteDaemon

@jbenet
Member

jbenet commented Mar 8, 2017

\o/

@dignifiedquire
Member

@diasdavid maybe wait with the torrent blob store for the datastore refactor?

@daviddias
Member Author

@dignifiedquire I see the value, but I won't block Torrent support on the datastore refactor; it is not a dependency.

@daviddias
Member Author

For the record, here is the real structure of both the Torrent file and the info field - https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure

@daviddias
Member Author

Bringing this one back (🎪 )

Instead of having ipld-torrent-file and ipld-torrent-info, we just need ipld-bencode to be able to resolve through bencode encoded objects. After rethinking this, I remembered why we need ipld-torrent-file and ipld-torrent-info, we need them to enable the resolver to resolve through paths inside these objects (i.e. dag.get(infoHash/pieces/0) should return the piece and not the string that is the sha1 of the piece).

It turns out that we might actually just need to do the bencode, because the format, as described in -- https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure -- prescribes that the SHA1 hashes of the pieces all be concatenated, which means that there won't be any <infohash>/info/pieces/<insert piece number>, unless we apply a transformation to the bencoded data in the first place.

This means that we won't be able to use the IPLD resolver to traverse through it without transforming the data, as that pieces field will just be a very long byte array value.

@daviddias daviddias added the status/ready Ready to be worked label Mar 9, 2017
@daviddias daviddias self-assigned this Mar 9, 2017
@ghost

ghost commented Mar 10, 2017

It's pretty ironic, but we can exploit the fact that it prescribes SHA1 and split every 40 bytes.

@daviddias
Member Author

20 bytes*. @lgierth we can indeed, but that falls into the 'Transformations' category; as far as IPLD-compatible formats go, we are strict about not messing with the data.
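For reference, the split being discussed could look roughly like the sketch below. It assumes the `pieces` value has already been decoded from bencode into a Node `Buffer`; `splitPieces` is a hypothetical helper name, not part of any existing module.

```javascript
const SHA1_LENGTH = 20 // bytes per raw SHA-1 digest (40 characters once hex-encoded)

// Split the concatenated `pieces` byte string into one view per piece hash.
// subarray() does not copy, so the underlying bytes are left untouched.
function splitPieces (pieces) {
  if (pieces.length % SHA1_LENGTH !== 0) {
    throw new RangeError('pieces length must be a multiple of 20 bytes')
  }
  const hashes = []
  for (let offset = 0; offset < pieces.length; offset += SHA1_LENGTH) {
    hashes.push(pieces.subarray(offset, offset + SHA1_LENGTH))
  }
  return hashes
}

// Three fake piece hashes, concatenated the way the metainfo format stores them.
const pieces = Buffer.concat([
  Buffer.alloc(20, 0xaa),
  Buffer.alloc(20, 0xbb),
  Buffer.alloc(20, 0xcc)
])
const hashes = splitPieces(pieces)
console.log(hashes.length) // 3
console.log(hashes[1].toString('hex')) // 'bb' repeated 20 times
```

Whether this counts as a transformation or as ordinary resolver behaviour is exactly the open question in this thread; the sketch only shows that the serialized bytes themselves do not have to change.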

@daviddias daviddias changed the title from "Adding Torrent support to IPFS" to "🌟 adding Torrent support to IPFS" Mar 21, 2017
@daviddias daviddias mentioned this issue Mar 22, 2017
22 tasks
@kumavis
Contributor

kumavis commented Mar 29, 2017

@diasdavid I don't think splitting on 20 bytes for each piece id is any different than biting off the first N bytes for the first parameter of any binary serialization

I would say it's not a transformation if the serialization doesn't need to change

Our thinking seems to diverge here, based on previous discussions around ethereum resolvers

@daviddias
Member Author

@kumavis agreed that there might be space to be a little less strict with the separation of local resolver vs transformation. Note: I intuitively did the same as you with dag-pb https://github.com/ipld/js-ipld-dag-pb/blob/master/src/resolver.js#L44-L47 .

I'll be with @nicola next week and revisit this question for IPLD transformations. Let's continue this thread on the IPLD repo ipld/ipld#13.

@kumavis
Contributor

kumavis commented Apr 3, 2017

I think ipld/ipld#13 is slightly more complicated (pre-process with hash, split into halfbytes).

splitting the concatenated SHA1 refs still falls under (consume path part, return result) which is no more of a transformation than any IPFS resolver performs.
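A rough sketch of the "consume path part, return result" idea, applied to a decoded info dictionary. This is not the js-ipld resolver interface; `resolve` and the sample `info` object are assumptions for illustration, and the function returns the raw 20-byte digest where a real resolver would presumably return a link to the piece.

```javascript
const DIGEST_LENGTH = 20 // bytes per SHA-1 piece hash

// Walk `path` one segment at a time. The only special case is `pieces/<n>`,
// which indexes into the concatenated digests instead of returning all of them.
function resolve (info, path) {
  const parts = path.split('/').filter(Boolean)
  let value = info
  while (parts.length > 0) {
    const part = parts.shift()
    const isDict = value !== null && typeof value === 'object' && !Buffer.isBuffer(value)
    if (!isDict || !Object.prototype.hasOwnProperty.call(value, part)) {
      throw new Error(`cannot resolve "${part}"`)
    }
    value = value[part]
    if (part === 'pieces' && parts.length > 0) {
      // Consume the next segment as a piece index into the raw byte string.
      const index = Number(parts.shift())
      const start = index * DIGEST_LENGTH
      if (!Number.isInteger(index) || index < 0 || start + DIGEST_LENGTH > value.length) {
        throw new RangeError('piece index out of range')
      }
      value = value.subarray(start, start + DIGEST_LENGTH)
    }
  }
  return value
}

// Hypothetical decoded info dictionary with two fake piece hashes.
const info = {
  name: 'hello.txt',
  'piece length': 16384,
  pieces: Buffer.concat([Buffer.alloc(20, 0x11), Buffer.alloc(20, 0x22)])
}

console.log(resolve(info, 'name')) // hello.txt
console.log(resolve(info, 'pieces/1').toString('hex')) // '22' repeated 20 times
```

The stored bytes are only sliced, never rewritten, which is the sense in which this is argued to be no more of a transformation than other resolvers perform.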

@jeremyBanks

jeremyBanks commented Aug 29, 2017

I wanted to note the release of The BitTorrent Protocol Specification v2. I don't expect it to be fully supported soon, but it's probably worth being aware of it when designing v1 support. My understanding may not be entirely correct, but here are the key points as I understand them:

v2 torrents use different structures than v1 in the info dictionary and metainfo .torrent files. v2 torrents are identified using SHA-2-256 hash of the info dictionary, truncated to 20 bytes to match the length of v1's SHA-1 hashes. It's possible to create hybrid torrents that contain both v1 and v2 structures, and can be identified by either hash.

Because a different hash function is used, v1 and v2 torrents' IPFS paths can be distinguished (because the hash function is included in their multihash):

/ipfs/f 017b 11 14123456fc77d23aca05a8b58066bb55fe06c72f8e - SHA-1, v1
/ipfs/f 017b 12 14cd5877ccec0ebc8c231ecc70265ce239a90bdb9e - truncated SHA-2-256, v2

EDIT: the following is wrong, see my next comment.

BitTorrent magnet links do not have this information; v1 and v2 magnet links cannot be distinguished. I think you need to connect to the torrent swarm and download the metadata before you can check which version and hash algorithm were used.

So it may not be strictly possible to map BitTorrent magnet URLs (e.g. ipfs/ipfs-companion#256) to a specific IPFS path, because the hash algorithm will not be known.

@sesam

sesam commented Oct 12, 2017

ping @arvidn Maybe you know if magnet: links uniquely identify content, or if it needs network discovery, and if this is considered a feature or bug for v2?

@jeremyBanks

jeremyBanks commented Oct 12, 2017

What I wrote above is wrong! I apologize for the misinformation. >_<

The updated BEP-9 does in fact use a multihash under a different key to identify v2 torrent data. I thought that this was cut out before the final version. (The idea of using multihash elsewhere in the protocol was cut; I didn't realize it remained here.) So I think the direct mapping is like:

SHA-1, v1
/ipfs/f017b1114123456fc77d23aca05a8b58066bb55fe06c72f8e
magnet:?xt=urn:btih:123456fc77d23aca05a8b58066bb55fe06c72f8e

truncated SHA-2-256, v2
/ipfs/f017b1214cd5877ccec0ebc8c231ecc70265ce239a90bdb9e
magnet:?xt=urn:btmh:1214cd5877ccec0ebc8c231ecc70265ce239a90bdb9e

Hybrid torrents still have two possible addresses, but that shouldn't be a problem.
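The mapping described in this comment can be sketched as plain string concatenation. The byte values are the ones used in this thread (CIDv1 `01`, codec `7b`, multihash codes `11` for SHA-1 and `12` for SHA-2-256, digest length `14` for 20 bytes); the function names are hypothetical, and the `btmh` form simply follows the comment above rather than a checked reading of the BEP.

```javascript
const CID_VERSION = '01'
const CODEC = '7b' // draft multicodec entry used in the paths above
const HASH_CODES = { sha1: '11', 'sha2-256': '12' }
const DIGEST_LENGTH = '14' // 20 bytes, hex-encoded

// Build the hex multihash: <hash code><digest length><digest>.
function toMultihashHex (digestHex, hashName) {
  const code = HASH_CODES[hashName]
  if (!code) throw new Error(`unknown hash function: ${hashName}`)
  if (!/^[0-9a-f]{40}$/.test(digestHex)) {
    throw new Error('expected a 20-byte lowercase hex digest')
  }
  return code + DIGEST_LENGTH + digestHex
}

// The leading "f" is the multibase prefix for base16.
function toIpfsPath (digestHex, hashName) {
  return '/ipfs/f' + CID_VERSION + CODEC + toMultihashHex(digestHex, hashName)
}

// v1 magnet links carry the bare SHA-1 digest; v2 links carry a multihash.
function toMagnet (digestHex, hashName) {
  return hashName === 'sha1'
    ? `magnet:?xt=urn:btih:${digestHex}`
    : `magnet:?xt=urn:btmh:${toMultihashHex(digestHex, hashName)}`
}

console.log(toIpfsPath('123456fc77d23aca05a8b58066bb55fe06c72f8e', 'sha1'))
// /ipfs/f017b1114123456fc77d23aca05a8b58066bb55fe06c72f8e
console.log(toIpfsPath('cd5877ccec0ebc8c231ecc70265ce239a90bdb9e', 'sha2-256'))
// /ipfs/f017b1214cd5877ccec0ebc8c231ecc70265ce239a90bdb9e
console.log(toMagnet('123456fc77d23aca05a8b58066bb55fe06c72f8e', 'sha1'))
// magnet:?xt=urn:btih:123456fc77d23aca05a8b58066bb55fe06c72f8e
```

Since both directions are pure string operations on the digest, no swarm lookup is needed to go from a magnet link to an IPFS path under this scheme.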

@arvidn

arvidn commented Oct 12, 2017

yeah, the hash in the magnet link definitely identifies the content. However, it also identifies some other metadata such as piece size, file names, etc. So even with bittorrent v1, it's possible to have two separate magnet links refer to exactly identical content (but with different piece sizes for instance).

@daviddias daviddias added the P1 High: Likely tackled by core team if no one steps up label Oct 18, 2017
@daviddias daviddias removed their assignment Jun 2, 2018
@daviddias daviddias added exp/wizard Extensive knowledge (implications, ramifications) required P2 Medium: Good to have, but can wait until someone steps up labels Jun 2, 2018
@polkovnikov

Great feature (adding torrent support)! What's the current status (no activity on this thread for 4 years)?

Also does anyone know if there are similar torrent supporting efforts going on in go-ipfs?

@achingbrain
Member

js-ipfs is being deprecated in favor of Helia. You can follow the migration plan here #4336 and read the migration guide.

This piece of work was never completed - there are 0x7b and 0x7c draft multicodec table entries for torrent file info and the files themselves, but someone would need to write a module that extends Helia to add support, a la @helia/unixfs and @helia/ipns.


9 participants