
How it works

gr3yc4t edited this page Jun 18, 2018 · 4 revisions
  • Parse the .torrent file
  • Extract peers from trackers
  • Peer wire protocol

Metainfo file parsing

First, the .torrent file, also called the metainfo file, must be parsed to extract its information. Everything in the metainfo file is bencoded (bencoding is a compact encoding for integers, byte strings, lists, and dictionaries).
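As a rough illustration of the format: bencoding writes integers as i<number>e, byte strings as <length>:<bytes>, lists as l...e, and dictionaries as d...e. The sketch below decodes only the two scalar types. The function names are hypothetical and are not those of the library used by this client.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decodes a bencoded integer such as "i42e" starting at pos,
// and advances pos past the closing 'e'.
std::int64_t decode_int(const std::string& data, std::size_t& pos) {
    assert(pos <= data.size());
    if (pos >= data.size() || data[pos] != 'i')
        throw std::runtime_error("expected 'i'");
    const std::size_t end = data.find('e', pos);
    if (end == std::string::npos)
        throw std::runtime_error("unterminated integer");
    const std::int64_t value = std::stoll(data.substr(pos + 1, end - pos - 1));
    pos = end + 1;
    return value;
}

// Decodes a bencoded byte string such as "4:spam" starting at pos,
// and advances pos past the last byte of the string.
std::string decode_string(const std::string& data, std::size_t& pos) {
    assert(pos <= data.size());
    const std::size_t colon = data.find(':', pos);
    if (colon == std::string::npos)
        throw std::runtime_error("missing ':'");
    const std::size_t len = std::stoul(data.substr(pos, colon - pos));
    if (colon + 1 + len > data.size())
        throw std::runtime_error("string runs past end of input");
    std::string value = data.substr(colon + 1, len);
    pos = colon + 1 + len;
    return value;
}
```

A full decoder would add lists and dictionaries by calling these recursively until the matching e is reached.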

We used a library to load all the information into a tree-like data structure, and then copied the relevant fields into the Torrent struct for easier access.

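#include <cstddef>
#include <string>
#include <vector>
#include <boost/dynamic_bitset.hpp>

struct Torrent {
  std::vector<std::string> trackers;
  std::string name;
  int piece_length;                  // size of each piece in bytes
  std::string pieces;                // SHA1 hash values of all the pieces
  std::size_t num_pieces;
  std::vector<TorrentFile> files;    // TorrentFile is defined elsewhere in the client
  boost::dynamic_bitset<> bitfield;  // pieces already downloaded
  bool is_single = false;            // true if single file torrent
};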

Before making requests to trackers, the client calculates the bitfield. The bitfield is a bitmap in which each bit represents one piece: 1 if the piece has already been downloaded, 0 if it is missing. It is used during communication with peers.

The metainfo file contains the SHA1 hash value of every piece. Once the client has downloaded a piece completely, it computes the piece's SHA1 hash and compares it with the one in the metainfo file; if they are equal, the corresponding bit in the bitfield is set to 1.
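A minimal sketch of this check, assuming the expected hashes are stored as concatenated 20-byte SHA1 digests (as in the metainfo format). The SHA1 implementation used by the client is not shown on this page, so the hash function is passed in as a parameter. `verify_piece` and `Sha1Fn` are hypothetical names, and `std::vector<bool>` stands in for `boost::dynamic_bitset<>` only to keep the example free of dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

constexpr std::size_t kSha1Size = 20;  // a SHA1 digest is 20 bytes long

// Hypothetical hash callback: returns the raw 20-byte SHA1 digest of its input.
using Sha1Fn = std::function<std::string(const std::string&)>;

// Compares the digest of a downloaded piece with the expected digest stored
// in `pieces`, and sets the bit for `index` only if the two match.
bool verify_piece(const std::string& pieces, std::size_t index,
                  const std::string& piece_data, const Sha1Fn& sha1,
                  std::vector<bool>& bitfield) {
    assert(index < bitfield.size());
    if ((index + 1) * kSha1Size > pieces.size())
        return false;  // no expected digest stored for this index
    const bool ok =
        pieces.compare(index * kSha1Size, kSha1Size, sha1(piece_data)) == 0;
    if (ok)
        bitfield[index] = true;
    return ok;
}
```

Running the same check over data already on disk at startup is one way to compute the initial bitfield before contacting trackers.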

Peer wire protocol

For each peer in the peer list extracted from trackers, a new thread is spawned that runs the peer wire protocol with that peer.

The first part of the communication with a peer is the handshake.
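This page does not spell out the handshake layout. In the BitTorrent v1 peer protocol it is a 68-byte message: a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info hash, and the 20-byte peer ID. A sketch of building it, where `build_handshake` is a hypothetical helper name and not necessarily how this client does it:

```cpp
#include <cassert>
#include <string>

// Builds the 68-byte handshake: <pstrlen><pstr><reserved><info_hash><peer_id>.
std::string build_handshake(const std::string& info_hash,
                            const std::string& peer_id) {
    assert(info_hash.size() == 20 && peer_id.size() == 20);
    const std::string pstr = "BitTorrent protocol";
    std::string msg;
    msg.push_back(static_cast<char>(pstr.size()));  // pstrlen = 19
    msg += pstr;
    msg.append(8, '\0');  // reserved bytes, all zero here
    msg += info_hash;     // SHA1 of the bencoded info dictionary
    msg += peer_id;       // 20-byte ID chosen by the client
    return msg;
}
```

The peer answers with a handshake in the same layout, and the connection is normally dropped if the info hash it returns does not match.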

If the handshake is successful, peers usually send their own bitfield next, although this is not mandatory.

All of the remaining messages after this initial phase have this form:

<length prefix><message ID><payload>

where length prefix is a four-byte big-endian value, message ID is a single byte, and payload depends on the type of message. The length prefix counts the bytes that follow it (message ID plus payload).
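The framing can be sketched as follows. The helper names are hypothetical. The message IDs used in the usage note (2 for interested, 4 for have) come from the BitTorrent protocol specification, not from this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends a 32-bit value in big-endian (network) byte order.
void put_u32(std::string& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Reads a big-endian 32-bit value starting at `pos`.
std::uint32_t get_u32(const std::string& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
    return v;
}

// Frames a message as <length prefix><message ID><payload>.
// The length prefix counts the ID byte plus the payload bytes.
std::string frame_message(std::uint8_t id, const std::string& payload) {
    std::string msg;
    put_u32(msg, static_cast<std::uint32_t>(1 + payload.size()));
    msg.push_back(static_cast<char>(id));
    msg += payload;
    return msg;
}
```

For example, `frame_message(2, "")` yields the 5-byte interested message, and a have message is `frame_message(4, payload)` with a 4-byte big-endian piece index as payload. On the receiving side, a reader would first read 4 bytes, decode them with `get_u32`, and then read exactly that many further bytes.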

A message can be one of the following:

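  • keep alive: a client may drop the connection if it receives no message from the peer within a timeout. Keep alive messages are used to keep the connection open. Unlike the other messages, a keep alive has a length prefix of zero and carries no message ID or payload.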

  • choke: the connection is choked. This means that the sender will not accept any requests from the peer, but it may unchoke the connection in the future. A client usually sends this message when it has many established connections and wants to prevent congestion.

  • unchoke: the connection is unchoked. The receiver can send requests to the sender.

  • interested: the sender is interested in downloading some pieces from the receiver. This may influence the choke/unchoke status.

  • not interested: the sender is not interested in downloading any pieces from the receiver.

  • have: the payload of this message is the index of a piece that has just been downloaded successfully. It is used to update the stored bitfield of the peer. Some peers send a bitfield with missing pieces even when they have all the data, and then announce the remaining pieces with have messages; they claim this helps against ISP filtering of the BitTorrent protocol.

  • bitfield: the payload is the bitfield of the peer. The high bit in the first byte corresponds to piece index 0. Spare bits at the end are set to zero.

  • request: this message is used to ask for a block of data. The payload takes the form <index><begin><length>, where index is the piece index, begin is the byte offset within the piece, and length is the length of the requested block of data. A peer may drop the connection if length is too large, so our client limits the block length to 2^14 bytes (16 KiB).

  • piece: this message contains a block of data and is the response to a request message. The payload takes the form <index><begin><block>, where index is the piece index, begin is the byte offset within the piece, and block is the block of data. The block does not have a fixed length; it can be calculated from the <length prefix> of the message, which also counts the message ID (1 byte), index (4 bytes), and begin (4 bytes):

block_length = <length prefix> - 9

  • cancel: this message is used to cancel block requests. It has the same payload as a request message. It can be used when a client requests the same block of data from multiple peers: when the first response arrives, it sends a cancel message to the other peers.

  • port: this is used by some clients to update a local routing table when a DHT tracker is supported. This client does not support that feature, but it still reads port messages so that the socket buffer stays clean.
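A few of the payload rules above can be sketched in code: the bit order of the bitfield, the layout of a request, and the block length of a piece message. All function names are hypothetical. The request message ID (6) and the 4-byte width of index, begin, and length come from the BitTorrent protocol specification, and the 16 KiB cap is the limit this client applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::uint32_t kMaxBlockLength = 1u << 14;  // 16 KiB request limit

// Returns true if piece `index` is set in a raw bitfield payload.
// The high bit of byte 0 is piece 0, so piece i lives in byte i / 8
// at bit position 7 - (i % 8).
bool has_piece(const std::string& bitfield, std::size_t index) {
    const std::size_t byte = index / 8;
    if (byte >= bitfield.size())
        return false;
    const unsigned value = static_cast<unsigned char>(bitfield[byte]);
    return ((value >> (7 - index % 8)) & 1u) != 0;
}

// Appends a 32-bit value in big-endian byte order.
void append_u32(std::string& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((v >> shift) & 0xFF));
}

// Builds a complete request message: length prefix 13, message ID 6,
// then <index><begin><length> as three 32-bit big-endian integers.
std::string build_request(std::uint32_t index, std::uint32_t begin,
                          std::uint32_t length) {
    assert(length <= kMaxBlockLength);
    std::string msg;
    append_u32(msg, 13);  // 1 (ID) + 3 * 4 (payload)
    msg.push_back(static_cast<char>(6));
    append_u32(msg, index);
    append_u32(msg, begin);
    append_u32(msg, length);
    return msg;
}

// Block size of a piece message: the length prefix minus the message ID
// (1 byte), index (4 bytes), and begin (4 bytes).
std::uint32_t block_length(std::uint32_t length_prefix) {
    assert(length_prefix >= 9);
    return length_prefix - 9;
}
```

A cancel message would reuse the layout of `build_request` with a different message ID, since its payload is the same.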
