Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design blueprint for signed peer records #761

Open
11 tasks
raulk opened this issue Dec 11, 2019 · 7 comments
Open
11 tasks

Design blueprint for signed peer records #761

raulk opened this issue Dec 11, 2019 · 7 comments
Assignees

Comments

@raulk
Copy link
Member

raulk commented Dec 11, 2019

This issue outlines the design blueprint to implement Signed Peer Records, to serve a guidance for the implementation efforts underway.

Record crafting and dissemination:

  • Emit LocalAddressesUpdated event on the eventbus when we detect changes in our local address set.
    • Refer to how identify push does this -- unfortunately it polls and compares with a timer; plus points if we can make this push-based.
  • Build a "signed peer record crafter" component that subscribes to this event type. When receiving an event, it fetches host.Addrs(), rebuilds the record, and emits a new event LocalPeerRecordUpdated, with a reference (pointer) or inline record.
    • The emitter uses the Stateful() option, which delivers the last seen message to any new subscribers.
  • Placement. Where do we place this component? I'm thinking inside the IDService.
    • Pros: logically it fits there, if we regard the ID service to be that, a system-wide service that manages self identity and peer identification.
    • Cons: the IDService() accessor is not available on host.Host, it's a construct of BasicHost.
      • I think this is fine, because no other component will need to explicitly access it. They'll receive the record via the bus. Decoupled dependency \o/
  • Adapt identify push and identify delta to push the new record out as soon as its available.
    • I don't think we need new protocol versions; just enhancing the protobuf with the new fields is enough.
    • Protobuf is designed to ignore unrecognised fields.

Peer record tracking:

  • Peerstore: does it need to track and store signed peer records? Various use cases:
    • forwarding records for other peers:
      • we need to store the raw serialised record, so we can forward it with no recomputation involved.
      • storing the processed data and signature separately and assuming that we'll arrive to the same payload when re-serialising the record is a shot in the dark.
    • a protocol/user querying addresses for a peer -- whether the provenance is signed routing record or not should be transparent.
    • dialling a peer
  • New peerstore interface CertifiedAddrBook with a method to ingest signed peer records with a TTL, and a method to fetch the raw bytes. On ingest, it overwrites all addresses with the incoming ones.
  • Inserting standalone addresses (AddAddrs) for peers that support certified records should fail.

Integration with the DHT:

  • Wire-level change only. New protocol dht/1.1 indicates that peer supports signed peer records.
  • Local options: Strict or Lenient.
    • If Strict, local peer only registers dht/1.1 protocol and therefore only interfaces with peers supporting signed peer records.
    • If Lenient, local peer registers both dht/1.0 and dht/1.1 protocols, the latter taking precedence over the former in protocol negotiation.
  • On dht/1.1, when responding to routing queries, we use the CertifiedAddrBook interface. (we must abort initialisation if the injected Peerstore does not implement that interface)
  • Maintain one routing table per dht protocol, where dht/1.0 includes dht/1.1 members, but not viceversa. This solution assumes that dht/1.1 peers also support dht/1.0. That will be true for a transition period. In the future, we'll deprecate dht/1.0.
@yusefnapora
Copy link
Contributor

Sweet, let me just echo back a couple points to make sure I get the implications.

Record generation & dissemination

  • Building a signed record from the host's addresses should happen dynamically, in response to an event indicating the addresses have changed.
  • We should emit the new signed records on the event bus.
  • This should also trigger an identify/push with the new signed record.
    • Maybe an identify/delta also? Right now delta only deals with protocol changes. Should it also include address changes?

Peerstore

  • A new CertifiedAddrBook interface is better than a breaking change to AddrBook, and consumers should type assert on the Peerstore to see if it implements CertifiedAddrBook.
  • We don't need the CertifiedAddrs method on CertifiedAddrBook, since the provenance of an addr should be transparent.
  • We should either have certified addresses for a peer or non-certified, but never both.

DHT

  • Use a new protocol version (dht/1.1) for signed records, with an optional strict mode to only register the new protocol.
  • dht/1.1 will only send and accept signed records, and will require that the peerstore supports the CertifiedAddrBook interface.
  • Each protocol version gets its own routing table.

About this point:

Maintain one routing table per dht protocol, where dht/1.0 includes dht/1.1 members, but not viceversa. This solution assumes that dht/1.1 peers also support dht/1.0. That will be true for a transition period. In the future, we'll deprecate dht/1.0.

Is it necessary to make any assumptions here? It seems like dht/1.1 peers that support dht/1.0 will advertise both protocols in identify. We could just pull all peers that support 1.0 in the "old dht" routing table and peers that support 1.1 in the "new dht" table. Peers that support both will end up in both tables (assuming there's space in their bucket).

@raulk
Copy link
Member Author

raulk commented Dec 11, 2019

Maybe an identify/delta also? Right now delta only deals with protocol changes. Should it also include address changes?

Ideally yes, but I don't want to overcomplicate your work. If it's simple, go for it.

Use a new protocol version (dht/1.1) for signed records, with an optional strict mode to only register the new protocol.

We should register both protocols by default (lenient mode is the default). If strict mode is chosen, we only register dht/1.1. In the future, once most of the network has upgraded, we flip the default to strict.

Is it necessary to make any assumptions here? It seems like dht/1.1 peers that support dht/1.0 will advertise both protocols in identify. We could just pull all peers that support 1.0 in the "old dht" routing table and peers that support 1.1 in the "new dht" table. Peers that support both will end up in both tables (assuming there's space in their bucket).

Yes, that would be ideal. The caveat is that current master is racy when it comes to identify:

  1. subscribes to network notifications.
  2. upon a Connected notifications, it checks to see if identify has finished and protocols have been populated in the peerstore. <= if we fall here, we'll know exactly which protos the peer supports.
  3. if blank, we fall back to opening a stream. <= if we fall here, we won't know exactly which protos the peer supports.

The stabilize branch actually waits for peer identification consuming events from the eventbus: https://github.com/libp2p/go-libp2p-kad-dht/blob/stabilize/subscriber_notifee.go#L41

PR here: libp2p/go-libp2p-kad-dht#365

I would be ecstatic if you cherry-pick those changes into master (as well as the corresponding ones in the stabilize branch of go-libp2p), so we can bring this feature in and rely on it:

#660
#659

@yusefnapora
Copy link
Contributor

It looks like the reason that BasicHost is polling to detect address changes is that there are several async processes that could potentially alter our address set:

  • the NATManager discovers a new NAT mapping for one of our listen addresses
  • we receive new observed addresses via identify
  • our AutoRelay status changes, potentially adding / removing relay addresses
  • maybe others?

The AutoRelay and observed address changes we could issue events for, but it looks like the NatManager and go-libp2p-nat expect to be polled. E.g. from the docs for nat.NewMapping:

May not succeed, and mappings may change over time; NAT devices may not respect our port requests, and even lie. Clients should not store the mapped results, but rather always poll our object for the latest mappings.

So some kind of polling seems necessary to track changing NAT mappings that are outside of our control. We could potentially move the polling into the NATManager and have it send out events when it detects changes, but I'm not sure that's better than what we have now. I guess it would remove the polling in cases where you don't have a NATManager configured.

@yusefnapora
Copy link
Contributor

I've been working on making the CertifiedAddrBook a standalone interface, and it turns out that you can't actually do the type assertion on a Peerstore implementation to see if it supports CertifiedAddrBook. Which is to say, this always fails, even if the AddrBook implementation does have the required methods:

cab, ok := someHost.Peerstore().(peerstore.CertifiedAddrBook)
// ok is always false

After some poking around, I realized that this is because the peerstore struct in go-libp2p-peerstore doesn't pull in the CertifiedAddrBook interface. It's defined like this:

type peerstore struct {
	pstore.Metrics

	pstore.KeyBook
	pstore.AddrBook
	pstore.ProtoBook
	pstore.PeerMetadata
}

Even if the AddrBook implementation does technically implement everything required by CertifiedAddrBook, the compiler won't pick up on that, since it only hoists the methods from AddrBook into the peerstore type, not every method on the AddrBook implementation.

You can make it work by adding the CertifiedAddrBook interface to the struct, but then you have to do the type assertion in NewPeerstore to initialize it:

func NewPeerstore(kb pstore.KeyBook, ab pstore.AddrBook, pb pstore.ProtoBook, md pstore.PeerMetadata) pstore.Peerstore {
	cab := ab.(pstore.CertifiedAddrBook)
	return &peerstore{
		KeyBook:      kb,
		AddrBook:     ab,
                CertifiedAddrBook: cab,
		ProtoBook:    pb,
		PeerMetadata: md,
		Metrics:      NewMetrics(),
	}
}

This will fail if ab isn't actually a CertifiedAddrBook, of course, so we're now effectively requiring all AddrBook implementations to also be CertifiedAddrBooks. To me this means that it's not really a separate interface after all, except in the superficial sense of having a different name.

I think we might as well just make the breaking API change and add the methods to AddrBook after all as @vyzo was saying in the -core PR. What do you think @raulk?

@yusefnapora
Copy link
Contributor

Thinking some more about the DHT changes. Is it a problem that the DHT can be configured with arbitrary protocol ID strings? I dug around a bit and found at least one project (0x-mesh) that's using a custom protocol ID for their DHT, which it seems like we encourage by letting people pass in their own ID strings in a config option.

If we're making signed record support conditional on a specific protocol ID string, that will only work if people are using our default protocol IDs. Instead of checking against a fixed string, we could potentially add another option, e.g. SignedAddressCapableProtocols(ids ...protocol.ID) so that, if you are using custom protocol IDs, you could tell us which of them support signed addresses. Or possibly we could define a Capabilities type and let you map protocol IDs to Capabilities, with signed addrs being one of the capabilities.

That seems a bit awkward, but it would let us enable capabilities for a particular protocol ID without having to hard-code the exact ID string, and it would let downstream users customize their protocol IDs and still opt-in to the new capabilities.

The other alternative that I can think of is to not make the behavior conditional on the protocol ID, and instead advertise signed record support in the DHT protobuf message. I've been thinking about how that could work and it isn't too terrible, but there are some edge cases to consider.

For now I'm going to stick with defining a new protocol ID and ignore the custom protocol problem, but I'll try not to box myself in too much.

@yusefnapora
Copy link
Contributor

@raulk I've been thinking some more about the proposal to have separate routing tables, either completely split between dht/1.0 and dht/1.1 peers, or with one table having dht/1.0 and dht/1.1 and the other having just dht/1.1 peers.

Either way seems like it will lead to "lenient mode" peers having an artificially restricted view of the network. With separate routing tables, two lenient peers interacting with each other will never share contact info for any "old-school" v1.0 peers, even though both of them are capable of interacting with the older peers.

This could easily break the lookup path and lead to surprising results. For example, let's say that I'm a lenient peer and the content I want is stored only on an old-school dht/1.0 peer. I look at both of my routing tables (since I have no way of knowing which kind of peer will be closest to the content), and it turns out that all of the closest peers that I know about are dht/1.1 peers that happen to be in lenient mode. They all know about the peer that's actually got the goods, but only in their dht/1.0 routing table. When I use dht/1.1 to ask them, they all come up empty and say they don't know any closer peers. Whereas if an older dht/1.0 peer had sent the same query to those same intermediate peers, they would return the target peer, since the intermediate peers would be pulling from the dht/1.0 routing table.

Even if lenient peers query the closest peers in both routing tables, it's still problematic. Some of the peers in the dht/1.0 table might also be lenient peers, or they might be old-school peers that would unknowingly direct me to lenient peers. If I then try to query them, the dht/1.1 protocol will always "win" the protocol negotiation, so they'll respond to me with information from a completely different routing table than the one that originally led me to them in the first place. This seems like it's bound to lead to weird lookup failures that are hard to diagnose.

Maybe this could work as an alternative:

  • We keep a single routing table
    • Strict mode peers will only add dht/1.1 capable peers to their table
    • Lenient peers add both dht/1.0 and dht/1.1 peers to their table
  • We add a flag to the RPC message for dht/1.1 to indicate whether we're in strict mode
  • When a lenient peer is speaking to a dht/1.1 peer, the peers it sends back depend on whether the other dht/1.1 peer is in strict mode:
    • if we are talking to a strict peer, we only send back dht/1.1 peers
    • if we're talking to a lenient peer, we send back all peers (dht/1.0 and dht/1.1)
  • When a lenient peer is speaking to a dht/1.0 peer, it will send only peers that support dht/1.0, which could be older peers that only speak dht/1.0 or other lenient mode peers.

I think this results in:

  • lenient peers can see and route between all other peers
  • old-school dht/1.0-only peers can see and route between other old-school peers and lenient peers
  • strict mode peers can only see and route between dht/1.1 peers (both strict and lenient). They never learn about old-school peers, because lenient peers will filter them out of their responses.

Does that make sense?

@yusefnapora
Copy link
Contributor

yusefnapora commented Dec 18, 2019

I realized a lot of my worries above might be unfounded. I was thinking that lenient peers would always do a protocol negotiation dance where they'd first try dht/1.1 and then fallback to dht/1.0, which would always result in dht/1.1 winning if the other peer supports both. Instead, we could explicitly open a dht/1.0 stream for peers that are in the v1.0 routing table.

I think that would allow separate routing tables to work without the weird lookup problems I was worried about. That said, I'd still prefer to have a single table if we can make it work, just from an implementation standpoint. There are a lot of places in the code where we assume a single table, and we'd be adding a decent amount of overhead by having to periodically random-walk over both.

Edit after some digging: it looks like dialing a specific protocol will take some non-trivial modifications to the messageSender thing that we're using to pool and reuse streams. Whenever it opens a new stream, it always tries all the protocols we support. The stream pool in general makes this difficult if we ever want to use different protocols for the same (lenient) peer, since we'd need to keep track of which protocol was actually used for the pooled stream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants