This repository has been archived by the owner on Sep 6, 2022. It is now read-only.

peerstore: model address labels #123

Draft
wants to merge 2 commits into
base: master


raulk
Member

raulk commented Mar 2, 2020

This PR introduces an AddressProvenance data type that acts like an unsealed enum. This means that users can define their own values and have the peerstore track them.

Address provenances are single-byte values. The range [0x00, 0x80) is reserved for libp2p, and we ship with five values (from lowest to highest precedence): unknown, third party, untrusted, trusted, manual.

The range [0x80, 0xff] is at the disposal of users.
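
To make the byte ranges concrete, here is a minimal sketch of what such an unsealed enum could look like. It is not the code in this PR: the constant names, the numeric assignments, and the `RegisterProvenance` helper are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// AddressProvenance is a single-byte, open-ended ("unsealed") enum.
type AddressProvenance uint8

// Built-in values, from lowest to highest precedence. They live in the
// reserved range [0x00, 0x80). The exact numbers are an assumption.
const (
	ProvenanceUnknown AddressProvenance = iota
	ProvenanceThirdParty
	ProvenanceUntrusted
	ProvenanceTrusted
	ProvenanceManual
)

// userRangeStart is the first value at the disposal of users: [0x80, 0xff].
const userRangeStart AddressProvenance = 0x80

// names is a hypothetical registry mapping values to readable names.
var names = map[AddressProvenance]string{
	ProvenanceUnknown:    "unknown",
	ProvenanceThirdParty: "third party",
	ProvenanceUntrusted:  "untrusted",
	ProvenanceTrusted:    "trusted",
	ProvenanceManual:     "manual",
}

// RegisterProvenance (hypothetical) adds a user-defined value. It rejects
// values in the reserved range and values that are already registered.
func RegisterProvenance(v AddressProvenance, name string) error {
	if v < userRangeStart {
		return errors.New("values below 0x80 are reserved for libp2p")
	}
	if _, ok := names[v]; ok {
		return fmt.Errorf("provenance 0x%02x is already registered", uint8(v))
	}
	names[v] = name
	return nil
}

// String returns the registered name, or a hex placeholder if unregistered.
func (p AddressProvenance) String() string {
	if n, ok := names[p]; ok {
		return n
	}
	return fmt.Sprintf("provenance(0x%02x)", uint8(p))
}

func main() {
	const ProvenanceMyApp AddressProvenance = 0x80
	if err := RegisterProvenance(ProvenanceMyApp, "my-app"); err != nil {
		panic(err)
	}
	fmt.Println(ProvenanceManual, ProvenanceMyApp)
}
```

Because the values are plain bytes, they stay constant across restarts as long as nobody renumbers them, which matters for a datastore-backed peerstore.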


The proposal to incorporate this feature in the peerstore APIs consists of adding variadic ...peerstore.ReadOption and ...peerstore.WriteOption arguments to peerstore methods, to record the provenance when inserting addresses and to filter by provenance when querying addresses.

That interface change, coupled with the unknown default value, would be non-breaking for callers, because existing call sites compile unchanged. Implementations of the AddrBook interface would break; but AFAIK, we are the only implementers, so the shockwave is largely absorbed.
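
As a rough illustration of why variadic options keep call sites source-compatible, here is a self-contained sketch. Everything in it is hypothetical: peers and addresses are plain strings rather than peer.ID and multiaddr values, and `WithProvenance` and `FilterProvenance` are made-up option constructors, not the names used in this PR.

```go
package main

import "fmt"

type Provenance uint8

const (
	Unknown Provenance = iota // zero value: used when no option is passed
	ThirdParty
	Untrusted
	Trusted
	Manual
)

// WriteOption configures an insert; ReadOption configures a query.
type writeConfig struct{ provenance Provenance }
type WriteOption func(*writeConfig)

type readConfig struct{ allowed map[Provenance]bool }
type ReadOption func(*readConfig)

// WithProvenance (hypothetical) records where the addresses came from.
func WithProvenance(p Provenance) WriteOption {
	return func(c *writeConfig) { c.provenance = p }
}

// FilterProvenance (hypothetical) restricts a query to the given values.
func FilterProvenance(ps ...Provenance) ReadOption {
	return func(c *readConfig) {
		if c.allowed == nil {
			c.allowed = make(map[Provenance]bool)
		}
		for _, p := range ps {
			c.allowed[p] = true
		}
	}
}

type entry struct {
	addr       string
	provenance Provenance
}

// AddrBook is a toy in-memory stand-in for the real interface.
type AddrBook struct{ addrs map[string][]entry }

func NewAddrBook() *AddrBook { return &AddrBook{addrs: make(map[string][]entry)} }

// AddAddrs keeps working for existing callers that pass no options.
func (ab *AddrBook) AddAddrs(peer string, addrs []string, opts ...WriteOption) {
	var cfg writeConfig
	for _, o := range opts {
		o(&cfg)
	}
	for _, a := range addrs {
		ab.addrs[peer] = append(ab.addrs[peer], entry{addr: a, provenance: cfg.provenance})
	}
}

// Addrs returns every address when no filter is given.
func (ab *AddrBook) Addrs(peer string, opts ...ReadOption) []string {
	var cfg readConfig
	for _, o := range opts {
		o(&cfg)
	}
	var out []string
	for _, e := range ab.addrs[peer] {
		if cfg.allowed == nil || cfg.allowed[e.provenance] {
			out = append(out, e.addr)
		}
	}
	return out
}

func main() {
	ab := NewAddrBook()
	ab.AddAddrs("peerA", []string{"/ip4/1.2.3.4/tcp/4001"}) // old-style call
	ab.AddAddrs("peerA", []string{"/ip4/5.6.7.8/tcp/4001"}, WithProvenance(Manual))
	fmt.Println(ab.Addrs("peerA"))
	fmt.Println(ab.Addrs("peerA", FilterProvenance(Manual)))
}
```

The old-style call in `main` shows the caller-side compatibility. A type that implements the interface, on the other hand, has to change its method signatures, which is the breakage mentioned above.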

@raulk raulk requested review from Stebalien and yusefnapora March 2, 2020 19:17
@Stebalien
Member

Stebalien commented Mar 3, 2020

I'm not sure why we need a registry for this. Can't we just store strings? If it becomes an issue, we can easily create a compression table by just storing a list of used provenances in the peerstore.
edit: is the registry just for convenience? I.e., it's not a global registry?
edit edit: it totally is. I thought we were going to try to maintain a global registry. This makes a lot more sense.

I'm also not sure how this solves the problem. Could you explain the motivation for provenance? Do applications need to be able to add additional information to addresses? If so, I'm not sure if we need to address that now. The pressing issue now is that we need to decide which addresses to dial and/or keep.

From where I stand, I believe the main thing we need to be able to do is rank addresses for both dialing and garbage collection. I'd classify addresses into two categories, certified and uncertified, with three classes of uncertified addresses: user, application, and gossip.

  • Certified: Signed peer routing records.

  • Uncertified: Addresses that have not been certified by the peer.

    • User: Explicitly specified by the user.
    • Application: Explicitly specified by the application. We could consider having multiple levels of application?
    • Gossip: Random addresses we get from the network, etc. We could also have multiple levels?
  • When dialing, we'd try user, then application + certified, then maybe a few gossiped addresses.

  • When exchanging peer information, we should exchange at least the certified addresses. When exchanging uncertified addresses, other nodes should treat them as gossiped.

  • Applications may choose to exchange uncertified addresses over other protocols. When they do this, they can choose to record them as "application".
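
The dialing order described in the list above could be sketched roughly as follows. This is only an illustration: the `Class` values, the `Addr` struct, and the `maxGossip` cap are assumptions, and addresses are plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// Class is a hypothetical address classification.
type Class int

const (
	Gossip Class = iota
	Application
	Certified
	User
)

type Addr struct {
	Multiaddr string
	Class     Class
}

// tier maps a class to its dial tier: user first, then application and
// certified together, then gossip.
func tier(c Class) int {
	switch c {
	case User:
		return 0
	case Application, Certified:
		return 1
	default:
		return 2
	}
}

// DialOrder returns addrs sorted by tier, keeping at most maxGossip
// gossiped addresses. The input slice is not modified.
func DialOrder(addrs []Addr, maxGossip int) []Addr {
	sorted := make([]Addr, len(addrs))
	copy(sorted, addrs)
	// A stable sort preserves the original order within each tier.
	sort.SliceStable(sorted, func(i, j int) bool {
		return tier(sorted[i].Class) < tier(sorted[j].Class)
	})
	out := make([]Addr, 0, len(sorted))
	gossip := 0
	for _, a := range sorted {
		if tier(a.Class) == 2 {
			if gossip >= maxGossip {
				continue
			}
			gossip++
		}
		out = append(out, a)
	}
	return out
}

func main() {
	addrs := []Addr{
		{"/ip4/10.0.0.1/tcp/1", Gossip},
		{"/ip4/10.0.0.2/tcp/1", Certified},
		{"/ip4/10.0.0.3/tcp/1", User},
	}
	for _, a := range DialOrder(addrs, 1) {
		fmt.Println(a.Multiaddr)
	}
}
```

The same tiers could be walked in reverse for garbage collection, dropping gossiped addresses first.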

@raulk
Member Author

raulk commented Mar 3, 2020

@Stebalien I think I could've explained some aspects better in the description, but I had to rush through. We are very much aligned. The registry is effectively a compression table, and it uses Golang type acrobatics to avoid runtime costs in string representation. Note that enum values need to remain constant across restarts, to preserve semantics when using the datastore-backed peerstore.

The categorisation you propose is very similar to what's in the out-of-the-box table, with a few caveats:

  • In my view, certification is one way of acquiring trust. I went for modelling "trusted"/"untrusted" instead, but happy to reconsider.
  • Gossip == third party.

> When exchanging peer information, we should exchange at least the certified addresses. When exchanging uncertified addresses, other nodes should treat them as gossiped.

This layer (peerstore) is mostly concerned with attaching those labels to addresses and making them queryable, and also with storing the certified record. Each subsystem is free to decide how and which addresses it exchanges; it can filter by querying the peerstore appropriately.

> Applications may choose to exchange uncertified addresses over other protocols. When they do this, they can choose to record them as "application".

The problem with a generic "application" tag is that you may have multiple "applications" behind the same libp2p node. I believe a single value is short-sighted.


After sleeping on it, I think there's a larger opportunity to sculpt this solution into a more general "peer/address labels" mechanism.

  • Subsystems and apps can attach arbitrary labels to peers or addresses, and query the peerstore for them via ...peerstore.Options.

  • That model would fit in well with non-ref-counted pinning (Pin/Unpin peer addresses #117), where subsystems attach labels to peers, and specify "policies" for labels, e.g. label dht-routing-table => "do not collect".

@Stebalien
Member

Stebalien commented Mar 3, 2020

> The registry is effectively a compression table, and it uses Golang type acrobatics to avoid runtime costs in string representation

My suggestion is that we just create it automatically on first use, storing the table to disk iff the peerstore is persisted. Otherwise, sub-components need to coordinate.

> After sleeping on it, I think there's a larger opportunity to sculpt this solution into a more general "peer/address labels" mechanism.

What about multiple address store sources?

@Stebalien
Member

> The problem with a generic "application" tag is that you may have multiple "applications" behind the same libp2p node. I believe a single value is short-sighted.

Fair enough. My concern is that if the tags don't imply ordering/preference, it'll be hard to choose which ones to dial first. I want to make sure we've thought this all the way through to the end goal.

raulk changed the title from "WIP peerstore: model address provenance data type." to "peerstore: model address labels" on Mar 4, 2020
@raulk
Member Author

raulk commented Mar 4, 2020

> My suggestion is that we just create it automatically on first use, storing the table to disk iff the peerstore is persisted. Otherwise, sub-components need to coordinate.

Without lifecycle methods (peerstore.Start()), we don't have a good way of binding/registering labels before the peerstore starts and reconciles its table with disk. We would need to inject them at peerstore construction time, which is the Host construction time, which means that we'd need to inject user-defined labels as a libp2p Host option -- that's too far up.

In other words, there are a few things to fix in libp2p before we can make this nicer.

I'd say, let's shoot for the API we want, caveating against known pitfalls. When we fix the init/construction logic of libp2p, there'll be only one thing to fix, instead of two.

> What about multiple address store sources?

Not sure I follow. You mean addresses that have more than one source? You'd attach two/three/as many labels as you wish to them.

> Fair enough. My concern is that if the tags don't imply ordering/preference, it'll be hard to choose which ones to dial first. I want to make sure we've thought this all the way through to the end goal.

At this point I don’t want to use this as an implicit/hardcoded precedence metric. With the modular dialer, users should easily be able to configure precedence. With dialer v1, anything we do to prioritise dials is gonna be spaghetti and ad hoc. With the modular dialer, you’d be able to specify the order of dials when instantiating the pipeline.

@raulk
Member Author

raulk commented Mar 4, 2020

@Stebalien I've generified this to address (and in the future, peer) labels.

@yusefnapora
Contributor

@raulk This looks good to me so far; I might remove the comments about labels applying to either addresses or peers though, at least until we have an API to set and read labels for peers.

As it's worded now, it's not clear whether ps.AddAddrs(p, addrs, Labels(someLabel)) would add the labels to the peer or the addrs.

@Stebalien
Member

> Without lifecycle methods (peerstore.Start()), we don't have a good way of binding/registering labels before the peerstore starts and reconciles its table with disk. We would need to inject them at peerstore construction time, which is the Host construction time, which means that we'd need to inject user-defined labels as a libp2p Host option -- that's too far up.

We intern them:

  1. If we have a persistent peerstore, we restore the label set from disk on load.
  2. When the user calls AddAddrs(..., Label("mylabel")), we:
    1. Check to see if the label is a known label.
    2. If not, add it to the label set, assign the next ID, and persist the label set.
    3. Use the label's ID.

Basically, lazily register labels and automatically assign codes as we need them.

@Stebalien
Member

Discussed out of band. @raulk proposed that, to solve the "who do I dial?" issue without dialer v2, we can pass dialer constraints along with the context.

The only concerns I have here are:

  • Annoyance. We can easily ensure we only use certified records in the DHT, but ensuring this in other services will become annoying.
  • Multiple parallel dials. We already have the issue where new addresses aren't added to existing dials. However, this means that parallel dials where one is more "tolerant" than the other may behave oddly.
  • I'd like to be able to set a default policy (e.g.: user + certified, falling back on everything else if we don't have either).
