Routing in content-oriented or content-centric networks means that any content published on the network needs a routable address. While the number of routable addresses (i.e., IP addresses) has traditionally been bounded by the number of physical machines/end-nodes, switching to addressable content means that the number of routable addresses grows to match the number of unique content items (i.e., any file) published on the network. Content-addressable networks therefore face a routing-scalability challenge: the number of addressable elements rises by several orders of magnitude compared to today's host-addressable Internet.
In the case of IPFS and libp2p, content routing is done primarily by means of a Distributed Hash Table (DHT). Although DHTs are known to scale with the number of nodes, the decentralised and totally unmanaged structure of the IPFS and libp2p systems presents challenges when it comes to dialability of nodes and look-up latency in the underlying network.
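For reference, content routing in go-libp2p today boils down to publishing and resolving provider records through the Kademlia DHT. The sketch below illustrates the two operations involved, assuming the go-libp2p-kad-dht API (exact import paths and signatures vary between releases); it is illustrative background, not part of the RFP deliverables.

```go
package routing

import (
	"context"

	"github.com/ipfs/go-cid"
	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
)

// provideAndResolve shows the two sides of DHT content routing:
// announcing that this peer holds a block, and finding peers that do.
func provideAndResolve(ctx context.Context, h host.Host, c cid.Cid) ([]peer.AddrInfo, error) {
	// Wrap the libp2p host with the Kademlia DHT used for content routing.
	kad, err := dht.New(ctx, h)
	if err != nil {
		return nil, err
	}
	// Publish a provider record: "this peer can serve block c".
	// This is the per-block provide whose volume proposals need to cope with.
	if err := kad.Provide(ctx, c, true); err != nil {
		return nil, err
	}
	// Resolve content: walk the DHT towards the key and collect provider records.
	return kad.FindProviders(ctx, c)
}
```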
Solutions proposed for this RFP should address our published open problem on Routing at Scale. Please review that problem in detail for requirements and constraints.
In this RFP, we are looking for approaches that target the structure of the Distributed Hash Table and design an enhanced version of it that uses multiple layers or dimensions. Each layer of the DHT is either embedded in the underlying network topology (e.g., by geolocation or hop count), based on a specific sharding strategy, or organised around a specific topic to exploit the power of social interactions. Topological embedding is important here in order to take advantage of locality of interest and to reduce the number of network hops needed to resolve content. Approaches should be resilient to high churn, provide low look-up time, and scale to tens of millions of users.
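To make the intent concrete, here is one possible shape of a locality-first, layered look-up in Go. The `Layer` interface and `findProvidersLayered` are hypothetical and purely illustrative of the behaviour described above; they are not existing libp2p APIs, and proposals are free to structure the layers entirely differently.

```go
package layered

import (
	"context"
	"fmt"

	"github.com/ipfs/go-cid"
	"github.com/libp2p/go-libp2p/core/peer"
)

// Layer is a hypothetical handle on one DHT layer, e.g. a geographic region,
// a shard, or a topic overlay. It is not an existing libp2p interface.
type Layer interface {
	Name() string
	FindProviders(ctx context.Context, key cid.Cid) ([]peer.AddrInfo, error)
}

// findProvidersLayered queries layers from most local to most global and stops
// at the first layer that returns providers, exploiting locality of interest
// to keep look-up latency and hop count low.
func findProvidersLayered(ctx context.Context, layers []Layer, key cid.Cid) ([]peer.AddrInfo, error) {
	var lastErr error
	for _, l := range layers {
		provs, err := l.FindProviders(ctx, key)
		if err != nil {
			lastErr = err
			continue // this layer failed or timed out; widen the search
		}
		if len(provs) > 0 {
			return provs, nil // found nearby; the global layer is never touched
		}
	}
	if lastErr != nil {
		return nil, lastErr
	}
	return nil, fmt.Errorf("no providers for %s in any layer", key)
}
```

A design along these lines would still need a publication strategy that decides which layers receive provider records, which is where the propagation-overhead and load-balancing metrics of Objective 2 come in.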
- Objective 1: Design of multi-level DHT
  - Review the related literature, produce a short survey, and identify solutions whose concepts are worth borrowing.
- Design overall system architecture adapted for the case of libp2p (and IPFS as a use case).
- Objective 2: Performance and scalability evaluation
- Demonstrate the performance of your design through formal methods, calculations (e.g. Python Notebook, ObservableHQ, etc.), or a simulation environment (e.g. PeerSim, ns-3) with regard to the following metrics:
    - Look-up time (from when a node first requests a piece of content to when it finds a provider).
- Publication propagation time (time needed for published content to be discoverable by other nodes).
- Publication propagation overhead (signalling traffic generated when publishing a new content item).
- Load balancing (load asymmetry between DHT nodes at different layers).
    - Performance improvement for non-uniform content popularity (i.e., for realistic content distributions where the exponent of the Zipf distribution is in the range 0.7-1.5).
- Compare with alternatives and state of the art.
  - The performance evaluation should take into account the following network characteristics:
- Significant network size variation, with weekly cycles between 150K and 250K nodes.
    - A large fraction (75%+) of nodes are undialable due to NATs/firewalls.
    - A 100GB file added to IPFS is represented as a graph of roughly 1 million blocks. For random access to be possible, roughly 2 million provides (DHT puts) need to happen every republish interval (default: 24 hours). A back-of-envelope sketch of the resulting load appears after the objectives list.
- Objective 3: Validation in testbed environment
  - Develop a prototype Go implementation that integrates with go-libp2p.
  - Evaluate performance in a testbed environment (provided by Protocol Labs). The testbed can simulate/emulate realistic network conditions with a configurable network size.
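To give a feel for the scale the network stats above imply, the snippet below does the back-of-envelope arithmetic referenced there. The replication factor and the mid-point network size are assumptions chosen for illustration, not figures from the RFP.

```go
package main

import "fmt"

func main() {
	// Figures marked "RFP" come from the network stats above; the rest are
	// illustrative assumptions, not measured values.
	const (
		providesPerFile   = 2_000_000.0 // RFP: DHT puts per 100GB file per interval
		republishInterval = 24.0 * 3600 // RFP: default interval of 24 hours, in seconds
		networkSize       = 200_000.0   // assumption: mid-point of the 150K-250K cycle
		dialableFraction  = 0.25        // RFP: ~75% of nodes are undialable
		replication       = 20.0        // assumption: copies kept of each provider record
	)

	dialableNodes := networkSize * dialableFraction
	putsPerSecond := providesPerFile / republishInterval
	recordsPerDialableNode := providesPerFile * replication / dialableNodes

	fmt.Printf("provides per second for one 100GB file: %.1f\n", putsPerSecond)
	fmt.Printf("dialable nodes: %.0f\n", dialableNodes)
	fmt.Printf("provider records per dialable node (one file): %.0f\n", recordsPerDialableNode)
}
```

Even a single large file keeps a steady stream of puts flowing, and only the dialable quarter of the network can absorb the resulting records, which is why publication overhead and load balancing are explicit metrics in Objective 2.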
Rolling: we will review applications in batches corresponding to calendar months. The call closes on 30 June 2020, or earlier if the award is made.
- One full-time Post-Doctoral researcher or two PhD students. Applicants can be based at one or multiple institutions if proposers want to build a collaborative project.
- Experience with: distributed systems, P2P systems, DHTs, Internet routing protocols, content-addressable networks, Information-/Content-Centric Networks (ICN/CCN).
We expect the technical depth of the work to be at the PhD level, but smaller grants are also available to sponsor MSc-level work.
Up to $70,000 per proposal. Possibility of up to 20% paid in FIL.
60% upon award and 40% on completion (adjustable to accommodate institutional requirements).
David Dias (@daviddias). We encourage you to reach out to rfp@protocol.ai if you’re considering applying or have any questions.
Results are to be released as open source under the Permissive License Stack (dual-licensed Apache 2.0 + MIT).