Proposal: a radically simpler Peerstore #2355
Comments
The Peerstore (or AddrStore, or whatever name this component ends up with) is still very useful for applications, so please keep it. Furthermore, the datastore-backed implementation is helpful for networks wishing to bootstrap from non-bootstrapper peers, so please keep it as well. Such applications would store peers in the component provided by go-libp2p instead of reinventing the wheel. If libp2p does not provide any means to persist addresses and relies only on the centralized set of bootstrappers, is it really p2p? These two become even more helpful together with application-defined metrics. Currently, only GossipSub does extensive scoring in the libp2p world, but GossipSub's peer quality metric is also meaningful for other protocols. Usually, if GossipSub thinks a peer is behaving poorly, the chance that it behaves poorly in other protocols is high. Potentially, multiple protocols would talk to some sort of libp2p-provided PeerScorer/Tracker to maintain a subjective image of the peer.
Thanks for your feedback @Wondertan.
Unless the application is only interested in storing data for the lifetime of the underlying connection, it's not really useful, is it? And even that use case can trivially be replicated with a map and ~10 lines of code for handling the event bus notifications. For most applications, more sophisticated cleanup logic will be required. Fundamentally, this is a question about the correct layering. Why would libp2p expose a general interface for peer-related information? Just because we define the
libp2p doesn't rely on bootstrappers, Kubo does (did). With the v0.21 release, they're persisting peers and not relying on the bootstrappers for subsequent starts, without using the database-backed peerstore. It really doesn't make a lot of sense to store information that you regularly need in a database, just so you can persist that information on shutdown. That data should live in memory, and be persisted to disk on shutdown (or at regular intervals). The new peerstore could expose a method to export a snapshot of its current state.
That's not possible with today's peerstore either. I agree that this could potentially be useful, but it will require some careful abstractions. Not every implementation will / should be familiar with the intricacies of GossipSub's scoring function. I doubt that the right approach is to extend the existing Peerstore.
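To make the snapshot idea above concrete, here is a minimal sketch of what such an export could look like. None of these names (`Snapshot`, `SnapshottableAddrBook`, `ExportSnapshot`, `RestoreSnapshot`) exist in go-libp2p; they are purely illustrative:

```go
package addrbook

import "github.com/libp2p/go-libp2p/core/peer"

// Snapshot is a hypothetical, serializable copy of the address book's
// in-memory state at a point in time.
type Snapshot struct {
	Peers []peer.AddrInfo
}

// SnapshottableAddrBook sketches the export/restore methods the new
// peerstore could expose. When and where to persist the snapshot (on
// shutdown, at regular intervals) is left entirely to the application.
type SnapshottableAddrBook interface {
	// ExportSnapshot returns a copy of the current in-memory state.
	ExportSnapshot() Snapshot
	// RestoreSnapshot seeds the address book from a previous export,
	// e.g. on startup, removing the need to bootstrap from scratch.
	RestoreSnapshot(Snapshot)
}
```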
Thanks for writing this up. I agree with all points. I'm thinking about how to ease the transition for users who are relying on this. Maybe we could expose a new package that lets user protocols still use the existing Peerstore interface.
I agree with all the points. How should we handle removing PeerMetadata?
@Wondertan what application-specific usages do you have in mind? I find the current Peerstore API very confusing. One reason that applications cannot use it correctly is that for all TTLs except permanentTTL, we'll remove peers 30 minutes after disconnect. This leads to hacks like this one: https://github.com/libp2p/go-libp2p-kad-dht/blob/master/dht.go#L549
Discussed this with @Stebalien. It's probably useful to have access to the protocols supported on a connection. It makes sense to store this information with the connection, not in the peerstore. That would allow us to (explicitly) support a different set of protocols on relayed connections.
This is the main use case that was driving us towards using a db-backed peerstore. Without something writing to disk, we need to bootstrap every time a node restarts. A db-backed peerstore would also allow us to store arbitrary data for a peer in the metadata, e.g. if we wanted to serialize the GossipSub score to persist it across reboots as well. Is the future approach intended to move all persistence into each individual application, or just to focus it in the discovery mechanism?
Why's that? There are plenty of hooks to learn about connecting / disconnecting peers. It would be easy (and way more performant!) to hook into the event bus and periodically save some of the connected peers to disk.
Indeed it is. Storage considerations will vary greatly between applications, and it's not realistic to expect that libp2p will be able to provide a one-size-fits-all datastore.
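A sketch of the event-bus hook described above. `event.EvtPeerConnectednessChanged` and the `host.Host` / event bus APIs are real go-libp2p APIs; `persistConnectedPeers` and `savePeers` are hypothetical, application-defined names:

```go
package peerpersist

import (
	"github.com/libp2p/go-libp2p/core/event"
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
)

// persistConnectedPeers tracks currently connected peers via the event bus.
// savePeers is application-defined (e.g. write a JSON file). For simplicity,
// this sketch hands a snapshot to savePeers on every connectedness change;
// a real implementation would do so periodically or on shutdown.
func persistConnectedPeers(h host.Host, savePeers func([]peer.AddrInfo) error) error {
	sub, err := h.EventBus().Subscribe(new(event.EvtPeerConnectednessChanged))
	if err != nil {
		return err
	}
	go func() {
		defer sub.Close()
		connected := make(map[peer.ID]struct{})
		for e := range sub.Out() {
			evt := e.(event.EvtPeerConnectednessChanged)
			switch evt.Connectedness {
			case network.Connected:
				connected[evt.Peer] = struct{}{}
			case network.NotConnected:
				delete(connected, evt.Peer)
			}
			// Snapshot the currently known addresses of connected peers.
			infos := make([]peer.AddrInfo, 0, len(connected))
			for p := range connected {
				infos = append(infos, peer.AddrInfo{ID: p, Addrs: h.Peerstore().Addrs(p)})
			}
			_ = savePeers(infos) // error handling omitted in this sketch
		}
	}()
	return nil
}
```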
I think at the moment it's because it works out of the box. We also haven't gone through perf testing yet to see what kinds of I/O patterns the disk-backed peerstore creates. Ideally, in a network with low churn, you wouldn't be updating the peerstore that often, and so wouldn't have that much I/O.
Related issue: #2754. We used to save signed peer records in the certified address book, but removed that code because it didn't filter the addresses properly, and the certified address book was not used anywhere else in go-libp2p. The issue, of course, is that another protocol was relying on identify's (undocumented?) behavior here. Having the existing peerstore be a core component of the host leads to these tightly coupled behaviors. I'm not sure identify needs to be so tightly coupled with the peer store. Another solution would be for identify to have a contract of events it emits and events it subscribes to. During the course of an identify exchange, it emits events about what it has learned, and other components/services/protocols can subscribe to those events. Users could have a custom PeerStore better suited to their application that subscribes to those events. Or user protocols can consume those events directly.
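A minimal sketch of that event contract from the consumer side. `event.EvtPeerIdentificationCompleted` is a real go-libp2p event; `watchIdentify` and the `onIdentified` hook are made up for illustration:

```go
package custompeerstore

import (
	"github.com/libp2p/go-libp2p/core/event"
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
)

// watchIdentify wires identify's event-bus output into an application-defined
// store: identify publishes what it learned, and the custom store applies its
// own retention policy, fully decoupled from the host's built-in peerstore.
func watchIdentify(h host.Host, onIdentified func(peer.ID)) error {
	sub, err := h.EventBus().Subscribe(new(event.EvtPeerIdentificationCompleted))
	if err != nil {
		return err
	}
	go func() {
		defer sub.Close()
		for e := range sub.Out() {
			evt := e.(event.EvtPeerIdentificationCompleted)
			// A custom peer store would read what it needs here (addresses,
			// protocols, signed peer record) and store it however it likes.
			onIdentified(evt.Peer)
		}
	}()
	return nil
}
```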
The protocols need to be stored somewhere, even if it's not in a peer store. E.g.:
- Services, when first starting up, often want to find all peers that speak a specific protocol.
- Multistream definitely needs to know a peer's protocols so it can select the correct protocol without taking a round trip to negotiate anything.
That's why I suggested associating protocols with connections.
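A hedged sketch of what associating protocols with connections could look like. Nothing like `ConnWithProtocols` exists in go-libp2p; the name and method are made up for illustration, only `network.Conn` and `protocol.ID` are real types:

```go
package connproto

import (
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// ConnWithProtocols is a hypothetical extension of network.Conn that scopes
// the protocol list to the connection it was learned on. This would let a
// relayed connection advertise a different (smaller) protocol set than a
// direct connection to the same peer.
type ConnWithProtocols interface {
	network.Conn
	// SupportedProtocols returns the protocols the remote peer advertised
	// via Identify on this particular connection.
	SupportedProtocols() []protocol.ID
}
```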
To jot down a couple of thoughts I've had in the meantime:
For 2, here's an API I'd like to propose:

```go
// PeerDataStore is a store for peer-specific data. Data will be cleared some
// time after a peer disconnects.
//
// This is intended for protocols to store peer-related data that they want
// cleared after disconnect. It is not intended for different protocols to
// share state. Protocols wanting to do so should send messages on the event
// bus. "Do not communicate by sharing memory; instead, share memory by
// communicating."
type PeerDataStore interface {
	// PutMetadata stores a key-value pair for a peer. The key must be unique
	// for the peer. The key should follow the same rules as keys for
	// `context.WithValue`. The value will be stored until ttlAfterDisconnect
	// has passed since the peer was disconnected.
	//
	// If you only need to store things while the peer is connected, pass 0 for
	// ttlAfterDisconnect. It will be cleared when the peer disconnects.
	PutMetadata(p peer.ID, key any, val any, ttlAfterDisconnect time.Duration) error

	// GetMetadata returns the value for a key for a peer.
	GetMetadata(p peer.ID, key any) (any, error)
}
```

There are just two methods. Protocols can do what they want with it, and we will ensure that the data gets GC'd appropriately. This is the interface that protocols (users) interact with. The actual implementation itself obviously has more methods. Generally protocols won't share their data with other protocols, but for something like the set of supported protocols, a component could expose a more specific interface on top.
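For illustration, a sketch of how a protocol might use this, following the `context.WithValue` key convention mentioned in the interface docs. `PeerDataStore` is the proposed interface above; `pingStatsKey`, `pingStats`, `recordPing`, and the 10-minute TTL are all made up for this sketch:

```go
package pingstats

import (
	"time"

	"github.com/libp2p/go-libp2p/core/peer"
)

// pingStatsKey is an unexported key type, so no other component can collide
// with (or read) this protocol's entries — the same convention as
// context.WithValue keys.
type pingStatsKey struct{}

type pingStats struct {
	Sent, Received int
}

// recordPing updates this protocol's private per-peer state. The store keeps
// the entry for 10 minutes after the peer disconnects (an arbitrary value for
// this sketch) and GCs it afterwards, so the protocol never has to clean up.
func recordPing(store PeerDataStore, p peer.ID, received bool) error {
	stats := &pingStats{}
	if v, err := store.GetMetadata(p, pingStatsKey{}); err == nil && v != nil {
		stats = v.(*pingStats)
	}
	stats.Sent++
	if received {
		stats.Received++
	}
	return store.PutMetadata(p, pingStatsKey{}, stats, 10*time.Minute)
}
```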
I think we should start by listing what we want to store. For example, storing supported protocols in this manner is going to be very tricky to do without race conditions.
I think my phrasing in my last post was not great. "Protocol" is a very overloaded term, but here I mean it as some piece of code that consumes a PeerDataStore interface, and optionally returns some other interface other users can use. Component is probably a better term here.
There's no race condition because there's a single owner of the data. "Generally protocols won't share their data with other protocols, but ... [they could] expose a more specific interface". To answer the example: there could be a component that owns tracking supported protocols. It's the only reader/writer of the underlying PeerDataStore. Concurrent requests to this component can be serialized via a lock (or equivalent). A component like this could also provide a more specialized interface.
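A sketch of that single-owner design, under the assumption that the `PeerDataStore` interface above exists. `ProtocolTracker`, its methods, and the one-hour TTL are hypothetical; the specialized `SupportsProtocol` method shows the kind of narrower interface such a component could expose:

```go
package prototracker

import (
	"sync"
	"time"

	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// protocolsKey is private to this component: the tracker is the sole
// reader/writer of these entries, so no other component can race on them.
type protocolsKey struct{}

// ProtocolTracker owns per-peer protocol data in a PeerDataStore. All access
// goes through its methods, serialized by a mutex, as described above.
type ProtocolTracker struct {
	mu    sync.Mutex
	store PeerDataStore // the interface proposed earlier in this thread
}

// SetProtocols records the protocols a peer supports (e.g. fed from identify
// events). Entries outlive a disconnect by an hour in this sketch.
func (t *ProtocolTracker) SetProtocols(p peer.ID, protos []protocol.ID) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.store.PutMetadata(p, protocolsKey{}, protos, time.Hour)
}

// SupportsProtocol is the specialized interface other components would use
// instead of touching the raw metadata.
func (t *ProtocolTracker) SupportsProtocol(p peer.ID, proto protocol.ID) (bool, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	v, err := t.store.GetMetadata(p, protocolsKey{})
	if err != nil || v == nil {
		return false, err
	}
	for _, pr := range v.([]protocol.ID) {
		if pr == proto {
			return true, nil
		}
	}
	return false, nil
}
```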
Agree with @sukunrt on this. It's hard to rely on the Peerstore for anything because of the aggressive GC and TTLs. Trying to keep data in sync between the Peerstore and whatever peer-related data is tracked in the application is also often quite confusing. I guess one reason for this is that it's unclear what guarantees the Peerstore offers, and what the right way to use it is. Implementing this proposal would help, at least in making it clear that applications should not rely on the Peerstore, and should keep track of the data they might need themselves. Keeping the interface slim would help applications implement wrappers around the built-in implementation, just in case they need to. E.g. I could think of a scenario where the application tracks "friend" peers and their latest addresses in some long-term storage; they might want to wrap the default address book and read addresses from the persistent storage when there are no addresses in memory.
Observation: The current `Peerstore` does a lot of things, but it doesn't do any of them well.

Problems:

- Signed peer records (the `CertifiedAddressBook`) are effectively unused.
- `SupportsProtocol` doesn't distinguish between Identify having completed and not, leading to complexity in applications (example).
This is a proposal to radically shrink the peerstore, such that it offers only what is needed to dial connections. Applications, and even transports like QUIC that want to use 0-RTT, will need to come up with their own data structures to store peer-related data. This is inevitable, since different applications will have different needs regarding persisting data when a peer disconnects, and different heuristics for limiting the size of these data structures.
In particular, we'll get rid of the following components:

- `PeerMetadata`: Only used to save the agent / protocol version sent in Identify. There's no real reason to save that information in the first place.
- `Metrics`: Only used to store RTT statistics. Arguably, that's not a per-peer, but a per-connection (or rather, per-path, if dealing with multipath transports) property. We should consider exposing the RTT on the `Connection` interface.
- `ProtoBook`: Used to store the protocols supported by this peer (as learned via Identify). Takes up huge amounts of storage, for little benefit. Protocols that need a list of peers supporting their protocol can easily subscribe to the respective `EventBus` notification.[^1]

The `KeyBook` will be radically simplified to only allow storing of the public key.

As a result, this leaves us with a peer store that only stores addresses and public keys.
It basically becomes an address book, and maybe we should honor that fact by renaming it accordingly.
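A speculative sketch of what that slimmed-down component could look like. The method signatures mirror today's `peerstore.AddrBook` and `KeyBook`; the `AddrBook` composition itself is this proposal's, not an existing go-libp2p interface:

```go
package addrbook

import (
	"time"

	ic "github.com/libp2p/go-libp2p/core/crypto"
	"github.com/libp2p/go-libp2p/core/peer"
	ma "github.com/multiformats/go-multiaddr"
)

// AddrBook is a sketch of the slimmed-down peerstore: addresses with TTLs,
// plus the public-key remainder of the KeyBook, and nothing else.
type AddrBook interface {
	// AddAddrs records addresses for a peer, valid for the given TTL.
	AddAddrs(p peer.ID, addrs []ma.Multiaddr, ttl time.Duration)
	// Addrs returns the currently valid addresses for a peer.
	Addrs(p peer.ID) []ma.Multiaddr

	// AddPubKey stores a peer's public key.
	AddPubKey(p peer.ID, k ic.PubKey) error
	// PubKey returns the stored public key, if any.
	PubKey(p peer.ID) ic.PubKey
}
```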
The following changes will make it a lot smarter and more useful:
We could consider removing the `Peerstore` interface. This is (mostly) a libp2p-internal component. In particular, it does not make sense to implement a datastore-backed version of it (it might, however, be interesting to be able to serialize the current state, so it can be restored easily).

Footnotes
[^1]: Note that this breaks an optimization for `Host.NewStream`, which currently picks the first supported protocol from a list of protocols. This is fine, since 1. worst case, this results in an extra round trip, 2. it is a rare use case, and 3. the application can track which protocols a peer speaks and then open a stream with just that protocol, speeding things up by 1 RTT compared to the status quo.
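For illustration, the application-side pattern the footnote describes, assuming the application tracked `knownProto` itself (e.g. with something like the tracker sketched earlier in this thread). `host.Host.NewStream` is the real go-libp2p API; `openStream` is a hypothetical wrapper:

```go
package streams

import (
	"context"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// openStream opens a stream with the single protocol the application already
// knows the peer supports, avoiding the extra multistream round trip that
// passing a list of candidate protocols can incur.
func openStream(ctx context.Context, h host.Host, p peer.ID, knownProto protocol.ID) (network.Stream, error) {
	return h.NewStream(ctx, p, knownProto)
}
```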