Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

network/service: Let get_value return a future returning a DhtEvent #3292

Closed
wants to merge 1 commit into from

Conversation

mxinden
Copy link
Contributor

@mxinden mxinden commented Aug 2, 2019

So far to retrieve a value from the DHT, one would need to:

  1. Trigger a DHT event query via NetworkWorker.get_value.

  2. Somehow retrieve the event from NetworkWorker.poll, e.g. by having it
    return the responses as a stream.

Instead of the above, this commit suggests having NetworkService.get_value
return a future, eventually resolving to the DHT response. Internally this is
coordinated via futures::sync::oneshot, maintaining a list of subscribers for
a given DHT key, notifying each one when a new event comes in.

Who needs this?

This would reduce logic needed in #3247, keeping network related code inside the
network module.

Future vs Stream?

When two users of the DHT interface trigger a query concurrently for the same
key and later on a response comes in, there is no way to tell which query this
response is responding to. get_value eventually returning a single value might
trick a user into thinking that this is the response of their query.

Instead one could return a stream instead of a future. The downside of this
approach is added implementation complexity.

For #3247 a future is sufficient. My suggestion is to properly document the
case, and stick with returning a future for now. If there is ever the need for a
consumer to consume a stream instead of a future #3247 can be adjusted
accordingly.

@tomaka @montekki let me know if you think the idea of this draft is worth further investigating.

So far to retrieve a value from the DHT, one would need to:

1. Trigger a DHT event query via `NetworkWorker.get_value`.

2. Somehow retrieve the event from `NetworkWorker.poll`, e.g. by having
it return the responses as a stream.

Instead of the above, this commit suggests having
`NetworkService.get_value` return a future, eventually resolving to the
DHT response. Internally this is coordinated via
`futures::sync::oneshot`, maintaining a list of subscribers for a given
DHT key, notifying each one when a new event comes in.
@mxinden mxinden added A0-please_review Pull request needs code review. M4-core labels Aug 2, 2019
@mxinden mxinden requested review from montekki and tomaka August 2, 2019 09:33
@tomaka
Copy link
Contributor

tomaka commented Aug 2, 2019

I really dislike the fact that polling the Future returned by get_event doesn't do anything by itself, and that we're relying on the fact that the NetworkWorker has to get polled independently. This makes the NetworkWorker less like a state machine and more like a background thread.

We used to have this kind of design in libp2p, and not doing so is what made libp2p robust.

@mxinden
Copy link
Contributor Author

mxinden commented Aug 2, 2019

we're relying on the fact that the NetworkWorker has to get polled independently.

Today any logic interfering with NetworkWorker via NetworkService, e.g. add_reserved_peer, is relying on the fact that NetworkWorker.poll is being polled, right?


We used to have this kind of design in libp2p, and not doing so is what made libp2p robust.

Can you reference an example where architectures like this caused issues?

One thing to add is that NetworkWorker.poll and NetworkService.get_value are not interdependent, given that oneshot does not do backpressure on the single value.


@tomaka what would be an alternative interface for something trying to retrieve DHT events without access to NetworkWorker.poll?


I understand that this adds more complexity and possible failure scenarios to core/network, but at the same time I see this reducing the complexity on the caller side by the same factor.

@montekki
Copy link
Contributor

montekki commented Aug 2, 2019

tbh I still have doubts that we need a Future here at all. And the fact that a DHT event is sent into two places: to notify_dht_event_subscribers and to user_protocol seems unclean.

@mxinden
Copy link
Contributor Author

mxinden commented Aug 2, 2019

And the fact that a DHT event is sent into two places: to notify_dht_event_subscribers and to user_protocol seems unclean.

Yes, that is for sure something we would need to clean up.

tbh I still have doubts that we need a Future here at all.

@montekki do you have an alternative approach for something trying to retrieve DHT events without access to NetworkWorker.poll?

@tomaka
Copy link
Contributor

tomaka commented Aug 2, 2019

Today any logic interfering with NetworkWorker via NetworkService, e.g. add_reserved_peer, is relying on the fact that NetworkWorker.poll is being polled, right?

Indeed, which is why I'm trying to remove the NetworkService entirely.

Can you reference an example where architectures like this caused issues?

This architecture is generally "fine" as long as you know that a future will not resolve unless another future is being polled. In practice, we often had to do things like future_that_we_want.select(future_that_resolves_future_that_we_want).
But when piling up layers on top of layers, it also happened that we lost track of the fact that you had to poll future_that_resolves_future_that_we_want, and that led to subtle deadlocks.

@tomaka what would be an alternative interface for something trying to retrieve DHT events without access to NetworkWorker.poll?

The point is that you should access NetworkWorker.

@mxinden
Copy link
Contributor Author

mxinden commented Aug 2, 2019

@tomaka what would be an alternative interface for something trying to retrieve DHT events without access to NetworkWorker.poll?

The point is that you should access NetworkWorker.

How would "accessing" look like? Let's say a component x in crate y in core/y/src/x.rs needs to put and get dht values. We don't want x to own a reference to NetworkWorker, right, otherwise it is the same solution that we currently have with NetworkService? Would one create a channel between x and NetworkWorker? When NetworkWorker retrieves a DHT event from the network below, does it return that event to all components that share a channel with it, or just for specific keys?

Indeed, which is why I'm trying to remove the NetworkService entirely.

I am sorry for all the questions @tomaka. In case there is a design doc describing the architecture of the refactoring I would appreciate a pointer.


Motivator for the discussions: I expect the logic I am adding in #3247 to grow in the future, enabling substrate to be smarter around which nodes to connect to, based on knowledge we can retrieve from the layers above. It is currently integrated into build_network_future, but I would guess it will outgrow the single function. Encapsulating it into its own crate would also make it easier to disable it for e.g. proof-of-work chains.

@tomaka
Copy link
Contributor

tomaka commented Aug 2, 2019

How would "accessing" look like? Let's say a component x in crate y in core/y/src/x.rs needs to put and get dht values. We don't want x to own a reference to NetworkWorker, right, otherwise it is the same solution that we currently have with NetworkService? Would one create a channel between x and NetworkWorker? When NetworkWorker retrieves a DHT event from the network below, does it return that event to all components that share a channel with it, or just for specific keys?

The HashMap that this PR adds would basically reside in the Service.

I am sorry for all the questions @tomaka. In case there is a design doc describing the architecture of the refactoring I would appreciate a pointer.

Like a lot of things that that we do in Substrate, unfortunately no.

enabling substrate to be smarter around which nodes to connect to

That's decided by the peerset crate. The job of build_network_future is only to inform of the list of validators that exist, not to make that decision.

@mxinden
Copy link
Contributor Author

mxinden commented Aug 2, 2019

The HashMap that this PR adds would basically reside in the Service.

And then inside Service we have a subscribe function?

fn subscribe(self, hash: Mutlihash) -> impl Future<Item=DhtEvent, Error=_>

How does that differ from the approach in this pull request apart from pushing the logic one level up into the Service instead of core/network and requiring two calls to retrieve a dht value, one to trigger the query, one to subscribe to the response?

@mxinden
Copy link
Contributor Author

mxinden commented Aug 7, 2019

Keeping NetworkWorker a simple state machine, with a single way of making progress (poll) sounds like the right way to go. The idea would be to push the logic to subscribe to DHT events into substrate service, dispatching the events received from the poll stream to subscribing components.

I am closing here in favor of the approach described above.

@mxinden mxinden closed this Aug 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants