Persist routing table #254

Open
anacrolix opened this issue Feb 8, 2019 · 21 comments
Labels: P3 Low: Not priority right now · status/in-progress

Comments

@anacrolix
Contributor

Standard DHT implementations don't require bootstrapping on every instance start-up. It should be possible to import a previous instance's routing table and avoid having to bootstrap from a fixed set of bootstrap nodes. Note that regularly refreshing the routing table is still common (and may also be called bootstrapping).

@anacrolix anacrolix self-assigned this Feb 8, 2019
@raulk
Member

raulk commented Feb 8, 2019

Thoughts.

Persistence medium

  • Option A: persist in the peerstore. With the current API, we'd have to attach metadata to a peer indicating whether it belongs in the routing table. This is fine for writes, but reading it back on startup is currently costly: we'd have to add a PeersWithMetadata() method, get a chan of peers, and query the metadata field for each one. Also, from the viewpoint of kad-dht, it doesn't know whether the runtime peerstore is memory-backed or ds-backed. It doesn't make sense to persist the routing table to a mem-backed peerstore, so we'd also need another peerstore method that reports its storage medium.
  • Option B: use go-datastore directly, with the ability to pick the data model that suits us (see the sketch below).
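
A minimal sketch of what Option B could look like: store only the routing table membership (peer IDs) under a single datastore key, leaving addresses in the peerstore. The key name, the JSON encoding and the function names are illustrative rather than a proposed format, and the go-datastore signatures used are the current context-taking ones.

package rtstore

import (
	"context"
	"encoding/json"

	ds "github.com/ipfs/go-datastore"
	"github.com/libp2p/go-libp2p/core/peer"
)

// Illustrative key; not an agreed-upon location.
var rtKey = ds.NewKey("/dht/routing-table")

// SaveRoutingTable stores the member peer IDs as base58 strings.
func SaveRoutingTable(ctx context.Context, d ds.Datastore, members []peer.ID) error {
	ids := make([]string, 0, len(members))
	for _, p := range members {
		ids = append(ids, p.String())
	}
	b, err := json.Marshal(ids)
	if err != nil {
		return err
	}
	return d.Put(ctx, rtKey, b)
}

// LoadRoutingTable returns the previously stored members, or nil if none exist.
func LoadRoutingTable(ctx context.Context, d ds.Datastore) ([]peer.ID, error) {
	b, err := d.Get(ctx, rtKey)
	if err == ds.ErrNotFound {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var ids []string
	if err := json.Unmarshal(b, &ids); err != nil {
		return nil, err
	}
	out := make([]peer.ID, 0, len(ids))
	for _, s := range ids {
		p, err := peer.Decode(s)
		if err != nil {
			continue // skip unparseable entries rather than failing the whole load
		}
		out = append(out, p)
	}
	return out, nil
}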

Seeding strategy

We need to decide how we're going to seed the routing table on start, because it'll affect what metadata we persist about peers in the routing table, and hence the data model.

Brainstorming options:

  • Take a random sample of peers that we knew to be in the routing table during the previous run, as long as they were active within a certain time threshold (i.e. we don't want to reuse peers if the last run was 1 year ago), falling back to well-known bootstrap peers if none or too few peers fitting the criteria are found.
  • Score peers by latency, helpfulness, and bucket (i.e. peers in closer k-buckets are more valuable), and seed the routing table from the top-scoring peers. This mitigates some attack vectors, but adds others.

@Stebalien
Member

I think we should (a) keep the address information in the peer store and (b) persist a single blob containing the routing table (peer IDs only). Really, we can probably just persist the entire routing table and reconnect to all of it on start; no need to score. Thoughts?

@raulk
Member

raulk commented Feb 8, 2019

go-ethereum seeds with a randomised sample instead of rematerializing the table verbatim, as a way to recover from borked tables. I think it’s a sensible strategy, it’s more secure, and it works.

@Stebalien
Member

Randomized sample from the routing table? Connecting to a subset shouldn't be any more secure than connecting to all, unless I'm missing something.

@raulk
Member

raulk commented Feb 8, 2019

In a scenario where your routing table is borked (e.g. all nodes keep their TCP connections alive but they ignore queries) and you restart, if you materialise the same table you will be stuck again. That’s why it’s more secure to seed a subset, and perform a random walk from those, to fill empty slots with fresh nodes.

@Stebalien
Member

Won't that subset just ignore queries?

@raulk
Member

raulk commented Feb 9, 2019

@Stebalien you’re right. I don’t know, it feels wrong not to introduce a randomisation component over restarts, maybe mixing in well-known/reputable nodes as well during seeding (for now, our bootstrappers; later, nodes we’ve calculated a high reputation for over time). Randomisation is a good starting point to defend against some attacks in a Byzantine scenario.

@raulk
Member

raulk commented Feb 9, 2019

Basically, with a mix of randomised past nodes + a few bootstrappers (to keep it simple), we should be able to recover from a borked table after a restart, albeit slowly, and without manual intervention.

@Stebalien
Member

Mixing in some bootstrappers sounds like a good idea.

@da2x

da2x commented Feb 22, 2019

From looking at BitTorrent client implementations, the only criterion for storing a node for future use is this: you’ve successfully established an outgoing connection to it (it’s reachable).

On the next startup, a number of nodes are selected at random (e.g. 10). If any of them is unreachable, replace it with another randomly selected node. If you run out of nodes, fall back to the bootstrapping nodes.

If you want to be smart about it then you can try selecting some nodes that have a short prefix length from your own public IP and some nodes from a diverse set of subnets. However, I do believe blindly choosing nodes at random is a perfectly valid strategy. Nodes are expected to leave and join the network all the time so it’s probably just a waste of time to try to carefully select the perfect set of nodes.
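
A rough sketch of that procedure in go-libp2p terms, assuming the previously stored nodes are available as []peer.AddrInfo; reachability is approximated by a dial with a short timeout, and the target count and timeout are arbitrary choices, not part of any spec.

package seeding

import (
	"context"
	"math/rand"
	"time"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
)

// seedFromStored dials randomly ordered stored nodes until `want` of them are
// connected, replacing unreachable ones simply by moving on to the next
// candidate, and falls back to the bootstrap peers if the stored ones run out.
func seedFromStored(ctx context.Context, h host.Host, stored, bootstrap []peer.AddrInfo, want int) {
	rand.Shuffle(len(stored), func(i, j int) { stored[i], stored[j] = stored[j], stored[i] })

	connected := 0
	for _, ai := range append(stored, bootstrap...) {
		if connected >= want {
			return
		}
		dctx, cancel := context.WithTimeout(ctx, 10*time.Second)
		err := h.Connect(dctx, ai) // an error here means "unreachable": skip it
		cancel()
		if err == nil {
			connected++ // reachable peers end up back in the routing table
		}
	}
}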

@Mikaela

Mikaela commented Feb 22, 2019

As a Yggdrasil and Cjdns user, it would be nice if I automatically got peers from 0200::/7 and fc00::/8, so that not all my traffic would go through clearnet.

If I was always given ten random clearnet peers instead of ever using bootstrap peers, would I get peers from those networks at all?

@yangm97

yangm97 commented Feb 22, 2019

@Mikaela AFAIK, at the moment one way to do that is for the application to talk to the cjdns/yggdrasil API directly to fetch nodes.

But assuming you're not peering over clearnet and the nodes around you are cjdns/ygg native, you could discover them via IPv6 loopback multicast, then switch to cjdns/ygg for data exchange once you discover the IPs these nodes are advertising (and save them, because ygg/cjdns IPs don't change when nodes move around).

@anacrolix
Contributor Author

> From looking at BitTorrent client implementations, the only criterion for storing a node for future use is this: you’ve successfully established an outgoing connection to it (it’s reachable).

Right on. You just store your routing table wholesale.

> On the next startup, a number of nodes are selected at random (e.g. 10). If any of them is unreachable, replace it with another randomly selected node. If you run out of nodes, fall back to the bootstrapping nodes.

Again, essentially standard routing table treatment. Your point about falling back to bootstrap nodes is the missing link in this implementation. Any time the routing table becomes unviable (or a query stalls because of it), a callback that returns bootstrap nodes can be invoked to get things rolling again, like StartingNodes here (a sketch of that shape follows at the end of this comment).

> If you want to be smart about it then you can try selecting some nodes that have a short prefix length from your own public IP and some nodes from a diverse set of subnets. However, I do believe blindly choosing nodes at random is a perfectly valid strategy. Nodes are expected to leave and join the network all the time so it’s probably just a waste of time to try to carefully select the perfect set of nodes.

I agree. It's sufficient to just apply your routing table management policy at all times, both before persisting and after restoring. By definition, the routing table already holds the best available peers, appropriately distributed throughout the keyspace.
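
The fallback callback mentioned above can have a very small surface. A sketch of the shape, with illustrative names loosely mirroring anacrolix/dht's StartingNodes:

// StartingNodes is queried whenever the routing table becomes unviable
// (or a query stalls because of it) to obtain fresh bootstrap peers.
type StartingNodes func() ([]peer.AddrInfo, error)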

@raulk
Member

raulk commented Feb 28, 2019

Let's try not to hardcode any behaviour, and let's make these strategies modular. That's the ethos of libp2p.

Unlike a specific BitTorrent client, libp2p is a library for a multitude of use cases, and no one size fits all.

I'm thinking of two interfaces you can inject via Options:

type Loader interface {
        // Load receives a snapshot of the members of the previous routing table
        // (possibly 0-length), and the peerstore (so it can verify whether we
        // still hold addresses for past members).
        //
        // It selects the candidates to add to the routing table, and it may use
        // any heuristic, such as past observed latencies, failure rate,
        // helpfulness score, etc. (recorded as metadata in the peerstore).
        Load(previous []peer.ID, pstore pstore.Peerstore) (candidates []peer.ID)
}

type Seeder interface {
        // Seed receives the routing table to populate, the candidates the Loader
        // picked, and the fallback peers (static bootstrap peers).
        //
        // Seed must ensure connectivity to any peers it adds, and in that process
        // it can use metrics such as current, real latency to filter peers.
        //
        // In the event that candidates are unviable, it can fall back to the
        // static bootstrap peers to salvage the situation.
        Seed(host libp2p.Host, table kb.RoutingTable, candidates, fallback []peer.ID)
}

We should have default strategies for when the user doesn't pass these options in the constructor, like the ones that have been cited.

We can also define common strategies in exported vars, so that the user can combine them as if they were recipes.
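
For example, a default Loader (written against the interface sketched above, with current import paths) could simply keep the previous members that still have addresses in the peerstore, in random order. The package and type names are illustrative.

package dhtopts // illustrative package name

import (
	"math/rand"

	"github.com/libp2p/go-libp2p/core/peer"
	pstore "github.com/libp2p/go-libp2p/core/peerstore"
)

// RandomLoader keeps previous routing table members that still have known
// addresses, shuffled so restarts don't always replay the same table.
type RandomLoader struct{}

func (RandomLoader) Load(previous []peer.ID, ps pstore.Peerstore) []peer.ID {
	var candidates []peer.ID
	for _, p := range previous {
		if len(ps.Addrs(p)) > 0 {
			candidates = append(candidates, p)
		}
	}
	rand.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	return candidates
}

A matching Seeder recipe could then layer the connectivity checks and bootstrap fallback described earlier on top.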

@anacrolix
Contributor Author

After creating a node, it should be possible to seed the routing table with entries. It's already possible to access a node's routing table; we only need to determine what values are required to reconstitute a peer's entry, and expose that. That would include the peer ID, and its known addresses if they're not handled externally. Adding nodes to the routing table directly via a method on the DHT should only be required if the routing table itself does not have sufficient context to receive such an entry type.

Correspondingly, the DHT user should be able to retrieve a snapshot of the routing table before it is closed, or on closing, to persist as they see fit. Again, the only question here is what data is required from the DHT to persist a routing table entry and restore it later. Deferring address management and persistence to the peerstore, or whichever component is responsible for addresses, makes things a little less convenient, but possibly more flexible.

@anacrolix
Contributor Author

I think #283 should be done first, as it may affect what information needs to be persisted.

@anacrolix
Contributor Author

#283 also makes reconstituting a routing table more robust. Currently, if we did this, we'd have to connect to each peer we add in order to maintain the routing table invariant that all entries are connected. That also highlights how flawed the assumption is: if we can't connect to any peers while adding them to the routing table, due to intermittent connectivity issues, we can't restore the routing table.

@anacrolix
Contributor Author

I'm suggesting, more or less:

IpfsDHT.SetRoutingTable([]PeerInfo)
IpfsDHT.GetRoutingTable() []PeerInfo

All other behaviours are at the whim of the consumer.
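
For illustration, hypothetical usage of those two accessors (they don't exist yet), persisting the snapshot to a JSON file on shutdown and feeding it back in on the next start. peer.AddrInfo stands in for PeerInfo, and the file name is arbitrary.

package example

import (
	"encoding/json"
	"os"

	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/peer"
)

// saveTable persists the snapshot returned by the proposed accessor.
func saveTable(d *dht.IpfsDHT) error {
	blob, err := json.Marshal(d.GetRoutingTable()) // hypothetical accessor; AddrInfo marshals to JSON
	if err != nil {
		return err
	}
	return os.WriteFile("routing-table.json", blob, 0o600)
}

// restoreTable feeds the snapshot back in via the proposed accessor.
func restoreTable(d *dht.IpfsDHT) error {
	blob, err := os.ReadFile("routing-table.json")
	if err != nil {
		return err
	}
	var entries []peer.AddrInfo
	if err := json.Unmarshal(blob, &entries); err != nil {
		return err
	}
	d.SetRoutingTable(entries) // hypothetical accessor
	return nil
}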

@hsanjuan
Contributor

Note that the DHT may not be the only service that would benefit from remembering peers and reconnecting to them. I kind of see that as a separate, non-dht issue that should be handled by the libp2p host directly.

  1. The peerstore should be persisted (host option).
  2. Hosts should have an option specifying what to do with known peers in the peerstore on boot (ranging from connecting to every peer, to connecting to N peers, to none). This could be part of a host.Bootstrap() method that is called once all services have been added, since they depend on connect notifications. It would allow the DHT to bootstrap itself from scratch if necessary, but also other services (see the sketch below).
  3. The DHT may then choose to persist its own routing table and connect to those peers as well, regardless of what the general host bootstrap process is.

tl;dr: Don't make peerstore persistence a dht-only feature.
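
To make point 2 concrete, a host-level helper could look roughly like this. It is a hypothetical free function, not an existing host method, and it only relies on existing host/peerstore calls.

package hostboot

import (
	"context"
	"math/rand"

	"github.com/libp2p/go-libp2p/core/host"
)

// bootstrapFromPeerstore connects to up to n randomly chosen peers that the
// (persisted) peerstore still has addresses for. Any service, including the
// DHT, then picks these connections up through the usual notifications.
func bootstrapFromPeerstore(ctx context.Context, h host.Host, n int) {
	peers := h.Peerstore().PeersWithAddrs()
	rand.Shuffle(len(peers), func(i, j int) { peers[i], peers[j] = peers[j], peers[i] })

	connected := 0
	for _, p := range peers {
		if connected >= n {
			return
		}
		if p == h.ID() {
			continue // skip ourselves
		}
		if err := h.Connect(ctx, h.Peerstore().PeerInfo(p)); err == nil {
			connected++
		}
	}
}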

@anacrolix anacrolix removed their assignment Aug 5, 2019
@Stebalien
Member

Juan brought up that simply persisting the routing table and reviving from that could be dangerous. If an attacker can ever "take over" a peer's routing table and control their view of the network, recovering from this situation would be impossible without using the centralized bootstrappers.

The proposed solution is to look at historical data. That is, we should bootstrap off of peers that we've seen repeatedly over a long period of time.

A possible solution (probably not optimal) is to keep bucketed counts of how frequently we've seen and used each peer over time:

  • < 1 hour ago
  • < 1 day ago
  • < 1 week ago
  • < 1 month ago.
  • ...

We'd prefer peers that:

  • Are continuously online (i.e., are present in all recent buckets).
  • Are frequently useful (i.e., have high counts).
  • Have an historical presence (present in older buckets as well as new buckets).

This solution is probably quite a bit more complicated than it needs to be, but I hope it illustrates what I think we need.
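
A rough sketch of the bookkeeping this implies: one counter per age bucket per peer, and a score that rewards presence across buckets as well as high counts. The bucket boundaries and weights are arbitrary illustrations, not a proposal.

package seedhist

import (
	"time"

	"github.com/libp2p/go-libp2p/core/peer"
)

// Age buckets, oldest last; the "..." above could extend this list.
var buckets = []time.Duration{
	time.Hour,
	24 * time.Hour,
	7 * 24 * time.Hour,
	30 * 24 * time.Hour,
}

// history[p][i] counts observations of peer p whose age fell into bucket i.
type history map[peer.ID][]int

// record files one observation of p (seen/used at time `seen`) into the
// matching age bucket.
func (h history) record(p peer.ID, seen time.Time) {
	counts, ok := h[p]
	if !ok {
		counts = make([]int, len(buckets))
		h[p] = counts
	}
	age := time.Since(seen)
	for i, limit := range buckets {
		if age < limit {
			counts[i]++
			return
		}
	}
	// Older than the last bucket: drop the observation (or add more buckets).
}

// score prefers peers present in many buckets (continuously online, with a
// historical presence) and with high counts (frequently useful).
func (h history) score(p peer.ID) int {
	s := 0
	for _, c := range h[p] {
		if c > 0 {
			s += 10 // presence in this bucket
		}
		s += c // frequency of usefulness
	}
	return s
}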

@master255

master255 commented Oct 30, 2023

Temporary solution:

// Collect up to ~20 peer addresses from the peerstore (skipping ourselves)
// and serialise them; peer.AddrInfo marshals to JSON directly.
var peersAddrs []peer.AddrInfo
var i int8
for _, peerId := range m.host.Network().Peerstore().PeersWithAddrs() {
	if m.host.ID() == peerId {
		continue
	}
	peersAddrs = append(peersAddrs, m.host.Network().Peerstore().PeerInfo(peerId))
	i++
	if i > 20 {
		break
	}
}
bytes, _ := json.Marshal(peersAddrs)
....

// On the next start, unmarshal the saved addresses and reconnect to them.
var peersAddrs []peer.AddrInfo
err := json.Unmarshal(bytes, &peersAddrs)
....

And I found that Network().Peerstore().PeersWithAddrs() contains more information about peers than kademlia.RoutingTable().ListPeers(). So it is better to use it.
