Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to allow out-of-tree router implementations #199

Closed
wants to merge 5 commits into from

Conversation

vyzo
Copy link
Collaborator

@vyzo vyzo commented Sep 23, 2019

By popular demand, this refactors the pubsub system to allow out-of-tree router implementations.
This is accomplished by introducing a PubSubControl interface, which is the pubsub view presented to routers.
It is currently sufficient to implement our in-tree routers, but we may want to extend it in the future.
It also adds a new field to the protobuf, extData, to be used by out-of-tree protocols for their own purposes.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

I believe we need to export from from pubsub.RPC.


// Eval schedules a thunk for evaluation in the pubsub process loop.
// Returns false if the evaluation was aborted because the pubsub system has been shutdown.
Eval(thunk func()) bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is funky.

  • Are we allowed to use the rest of the interface outside of a "thunk"?
  • Are we using this to protect gossipsub state? If so, what if we just used a lock?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface in general is unsafe to use outside of the event loop, this should probably be specified in the documentation; will fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I wonder if we should put that part into a separate interface and pass it to the thunk? That should make it clear that that's the only safe way to use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Eval wait for the thunk to eval?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It waits for the evaluation to start, but it doesn't wait for completion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked the unsafe outside the event loop methods as such.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like to consider moving these to a separate interface and passing it into the thunk. I really wish go had some kind of goroutine local storage so we could just make all this automatic but it doesn't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the interface gets attached in Router.Attach and can get used without explicitly re-entering with Eval -- all callbacks to the router from the pubsub system are inside the event loop and that would be awkward to use.
I think the contract should be "you call the pubsub system inside the event loop; if you have a goroutine that is outside (eg a timer, like in gossipsub), use Eval to re-enter the event loop."
We should probably document this in the Router interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed synchronously: All the router callbacks are also called on the main thread so we'd have to feed this new interface in there as well. TL;DR: not worth it.

Instead, we should:

  1. Carefully document this.
  2. Maybe call this InvokeInMainThread (or something less java-y).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a smell. Alternative design proposal in comments above.

pb/rpc.proto Outdated
@@ -12,6 +12,7 @@ message RPC {
}

optional ControlMessage control = 3;
optional bytes extData = 4; // protocol extension data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gofmt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops! emacs did this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(use-package go-mode
  :config (gofmt-before-save))

😉

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any is probably a bad idea because it needs metadata, which may not be supported by some programming languages.
Also, our protobuf is proto2 so we don't want to pollute with proto3 constructs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have a gofmt save hook, but this is a protobuf file!

Copy link
Member

@Stebalien Stebalien Sep 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:roll:

So, I'm just cheating and using spacemacs. (add-to-list 'protobuf-mode-hooks (lambda () (setq-local indent-tabs-mode t)))?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed the indentation; the hook looks good, i might adopt it :p

@vyzo
Copy link
Collaborator Author

vyzo commented Sep 23, 2019

Good point, the from field will have to be exported.

@vyzo vyzo requested a review from Stebalien September 23, 2019 20:50
var ErrQueueFull = errors.New("queue full")

func (p *PubSubControlView) PeersForTopic(topic string) (map[peer.ID]struct{}, bool) {
tmap, ok := p.topics[topic]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying the peer list into a slice shouldn't really cost us anything in the grand scheme of things. I'd do that to avoid exposing internal state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this method is on the hot path, hence the intention to avoid copying by introducing the thunk scheduling to guard state...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is on the hot path indeed.

Copy link
Member

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: I think the Eval() is a smell. Its goal is to curtail state leakage caused the preexisting design. An event-based model would be cleaner and more comfortable to develop against.

// PeersForTopic returns the set of peers in a topic; the boolean indicator will be false
// if the topic is unknown.
// This method is not safe to call outside the pubsub process loop.
PeersForTopic(topic string) (map[peer.ID]struct{}, bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, would a router want to limit their view of topic membership by only seeing peers that support their routing protocol? The reason I ask is because all other methods can be regarded as events, except for this one, which is pure state access. It seems to break the abstraction a bit.

I'd be more inclined to have the router and control layers be event-based, with no shared state. That could remove the need for the synthetic Eval() method -- which is only needed to protect state at an adjacent layer...

I see this model working as follows:

  1. On startup, routers enrol with the control layer, providing a predicate that, given a []protocol.ID, returns whether the router can interact with that peer or not.
  2. On every new peer that advertises the umbrella pubsub protocol, we call the predicate on all routers, and emit a NewPeer event to routers that signal support.
  3. The control layer tracks which routers are handling which peers. When a peer goes away, we emit an event relevant routers.
  4. For topic membership control messages (join or leave), the control layer emit events to the supporting routers.
  5. Routers send messages by pushing events on a channel that the control layer is looping over.
  6. ...

This approach would insulate the state of each layer and we'd remove the need for cross-boundary scope changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a major redesign i'd rather not tackle in this pr!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, programming this with events makes it a lot harder to reason about; I really like the simplicity of a lock-free event loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I get that, but pubsub/gossipsub is relied on by major downstream adopters, and I'm happy to invest time and effort in that refactor. I think it would make the code more decoupled, elegant, accessible for contributors, and less spaghetti.

Isn't the lock-free event loop inherently mono-threaded, anyway? It can become a major bottleneck.

If instead we let routers keep their own state locally, and be responsible for mutating in reaction to events they receive from the control layer, I think it makes the system more functional and a lot easier to reason about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 that having an Eval() function on the interface seems ridiculous (also, this being ridiculous was the reason given for #171 not being it's own router). If there's a short term need for users that don't want to fork go-libp2p-pubsub just to get their work done I'd rather be even grosser and just export the fields in PubSub so people know how unstable/temporary the state is.

A more approachable, and less intrusive, approach might be to invert a small amount of the design. For example PeersForTopic could easily be part of the PubSubRouter interface instead utilizing a PubSubControl interface. If certain structs/functions are really the same for various different routers then we could have some PubSubRouterBase struct that could be embedded in the individual routers.


// Eval schedules a thunk for evaluation in the pubsub process loop.
// Returns false if the evaluation was aborted because the pubsub system has been shutdown.
Eval(thunk func()) bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a smell. Alternative design proposal in comments above.

var ErrQueueFull = errors.New("queue full")

func (p *PubSubControlView) PeersForTopic(topic string) (map[peer.ID]struct{}, bool) {
tmap, ok := p.topics[topic]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this method is on the hot path, hence the intention to avoid copying by introducing the thunk scheduling to guard state...

@vyzo
Copy link
Collaborator Author

vyzo commented Mar 23, 2020

closing as we decided to refactor internals before allowing out of tree routers.

@vyzo vyzo closed this Mar 23, 2020
@Stebalien Stebalien deleted the feat/external-routers branch August 6, 2024 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants