Add sudo_network_unstable_watch #91

Merged: 4 commits merged into paritytech:main, Jan 23, 2024

Conversation

@tomaka (Contributor) commented Sep 12, 2023

This is a revival of paritytech/smoldot#2245

This new function lets a JSON-RPC client subscribe to the networking state of the server. This makes it possible to visualize what is happening, which in turn should be useful for debugging and for teaching purposes.

One opinionated decision I took is that the JSON-RPC server might omit some events if the events are generated too quickly. For example, if a peer-to-peer connection opens then closes repeatedly 1000 times per second, we don't have to send 2000 events per second to the JSON-RPC client and instead we're allowed to not send anything.
This means that the JSON-RPC client might not be aware of everything that's happening. There's an event to notify it when that happens so that a warning or something can be shown to the user.

An alternative would be to report everything and generate a stop event if the queue is too large, but that's not much better, because the JSON-RPC client wouldn't be able to know what happens between the moment when the stop happens and the moment when it resubscribes. In other words, this wouldn't solve the problem and is more complicated.

There's no possible way (CAP theorem) to guarantee that the JSON-RPC client will receive events about everything that happens, unless you slow down the networking of the node to the speed of the JSON-RPC client. This isn't something that we want to happen, normally. Maybe we could imagine a privileged mode where that is the case, but I haven't included this in this PR at the moment because it seems overkill.

```
{
    "event": "missedEvents"
}
```
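The coalescing behaviour described above can be sketched as a bounded per-subscription buffer. Everything here (type names, field names, the exact coalescing policy) is illustrative and not part of the RFC:

```rust
use std::collections::VecDeque;

/// Hypothetical event type; variants are illustrative, not from the RFC.
#[derive(Debug, PartialEq)]
enum NetworkEvent {
    ConnectionOpen(u64),
    ConnectionClose(u64),
    MissedEvents,
}

/// Bounded per-subscription queue: when the client consumes events too
/// slowly, pending events are discarded and a single `MissedEvents`
/// marker is queued instead of an unbounded backlog.
struct EventQueue {
    buf: VecDeque<NetworkEvent>,
    cap: usize,
}

impl EventQueue {
    fn new(cap: usize) -> Self {
        Self { buf: VecDeque::new(), cap }
    }

    fn push(&mut self, ev: NetworkEvent) {
        if self.buf.len() >= self.cap {
            // Drop everything pending; one marker tells the client that
            // it is no longer seeing the full picture.
            self.buf.clear();
            self.buf.push_back(NetworkEvent::MissedEvents);
        } else {
            self.buf.push_back(ev);
        }
    }

    fn pop(&mut self) -> Option<NetworkEvent> {
        self.buf.pop_front()
    }
}
```

This is exactly the trade-off argued for above: the queue stays bounded regardless of how fast the network churns, at the cost of the client occasionally learning only that it missed something.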

A contributor commented:

Would it also be beneficial to have an optional event reported for cases of misbehaving peers / reputation banning?

@tomaka (Contributor, Author) commented Sep 13, 2023

The networking stack involves a huge number of concepts. Beyond reputations, you could also think of reporting ping times, handshake and request sizes, k-bucket states, GRANDPA neighbor packets, block announces, etc.

This PR focuses on just open connections and substreams because 1) these concepts are easy to understand, and 2) we know for sure that Polkadot will always have connections and substreams.

@lexnv previously approved these changes Sep 12, 2023
tomaka and others added 2 commits September 15, 2023 13:29
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: Josep M Sobrepere <jm.sobrepere@gmail.com>
```
{
    "event": "substreamState",
    "connectionId": "...",
    "substreamId": "...",
```
A contributor commented:

Dumb question to make sure I got this right: we want to report all open substreams (i.e. /ipfs/ping, /ipfs/identify, /[hash]/kad) together with notification protocols, not just notification protocols (/[hash]/tx/1 etc.)?

For notification protocols, we could easily generate a substreamID when we establish a connection, just before the handshake is done. As for getting all substreams, I believe we'd need to interface Substrate somehow with the yamux implementation directly, and we'd most probably end up either:

And since we could change the muxer in the future, for example changing the transport to QUIC, wouldn't the substreamID feel like an implementation detail? (Since we'd need to find a workaround to expose the substreamID from the transport level in that case) 🤔

Let me know if I got this right, or if we could achieve the same behavior easier 🙏

cc @tomaka @altonen @jsdw

@tomaka (Contributor, Author) commented:

I think that you just need a wrapper (a "middleware") around the StreamMuxer trait; no need to touch yamux itself. Every type of connection implements StreamMuxer and is only ever accessed through this trait by the rest of the stack.

However, what I didn't realize is that getting the protocol name is kind of tricky, because negotiation is done by sending messages within the substream.
Substreams themselves don't inherently have protocol names; it's just that in libp2p every single substream starts with a handshake that negotiates its protocol.

@tomaka (Contributor, Author) commented:

Substreams themselves don't inherently have protocol names; it's just that in libp2p every single substream starts with a handshake that negotiates its protocol.

Which also means that this RFC needs to clarify what to do with substreams whose protocol hasn't been negotiated yet.
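One way a server might model this distinction, purely as an illustration (the type names and the reporting policy here are assumptions, not part of the RFC):

```rust
/// Illustrative model of a substream whose protocol may not have been
/// negotiated yet. Names are assumptions, not taken from the RFC.
#[derive(Debug, PartialEq)]
enum SubstreamState {
    /// The protocol-negotiation handshake is still in progress;
    /// no protocol name can be reported yet.
    Negotiating,
    /// Handshake finished; the protocol name is now known.
    Open { protocol: String },
}

/// One possible policy for what to report to the JSON-RPC client:
/// only report a substream once its protocol is known.
fn reportable_protocol(state: &SubstreamState) -> Option<&str> {
    match state {
        SubstreamState::Negotiating => None,
        SubstreamState::Open { protocol } => Some(protocol),
    }
}
```

Whether substreams in the `Negotiating` state should be hidden, reported without a name, or reported with a placeholder is exactly the clarification the comment above asks for.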

A commenter wrote:

How are these meant to be implemented for libp2p protocols in the first place? Substrate may have no knowledge that a substream for Kademlia or ping has been opened, so there would have to be some tracking implemented in libp2p that would expose this information to Substrate.

I don't see the problem with protocol names, though. Once the protocol has been negotiated, the name is known and can be reported. If there were a separate "opening" state, then it would be more challenging.

@tomaka (Contributor, Author) commented Sep 21, 2023

How are these meant to be implemented for libp2p protocols in the first place? Substrate may have no knowledge that a substream for Kademlia or ping has been opened, so there would have to be some tracking implemented in libp2p that would expose this information to Substrate.

Substrate entirely wraps around all the libp2p protocols: https://github.com/paritytech/polkadot-sdk/blob/a56fd32e0affbe9cb61abb5ee3ec0a0bd15f5bf0/substrate/client/network/src/behaviour.rs#L45-L47

I think that you can do it by:

- Adding a wrapper around the ConnectionHandler trait that intercepts on_connection_event.
- Adding a wrapper around the NetworkBehaviour trait so that you can wrap around the ConnectionHandler.

In your NetworkBehaviour wrapper, you can modify the ToSwarm type to include events about substreams being opened/closed, and then intercept these events in the main network events handler.

I haven't touched these traits in years now, so I might be missing something, but I think it should work.

A contributor commented:

I think that you can do it
Adding a wrapper around the ConnectionHandler trait that intercepts on_connection_event.
Adding a wrapper around the NetworkBehaviour trait so that you can wrap around the ConnectionHandler

From looking at the code, my findings also point in this direction. Thanks!

I started to implement a BehaviorWrapper a few days ago, which looked like this:

```rust
pub struct BehaviorWrapper<B: BlockT> {
    behavior: Behavior<B>,
}

impl<B: BlockT> NetworkBehaviour for BehaviorWrapper<B> { /* ... */ }
```

This was the first step to expose the ConnectionId, since SwarmEvent in our libp2p version does not expose it at the moment.

I gave up on going down that path for two reasons:

However, I cannot think of a better way than what Pierre suggested:

```rust
pub struct BehaviorWrapper<B: BlockT> {
    behavior: Behavior<B>,
}

/// Must propagate the connection ID for proper identification of
/// connection IDs + stream IDs.
pub struct HandlerWrapper {
    connection_id: ConnectionId,
    // ..
}

impl<B: BlockT> NetworkBehaviour for BehaviorWrapper<B> {
    type ConnectionHandler = HandlerWrapper;

    fn handle_established_inbound_connection(connection_id, ..) {
        HandlerWrapper::new(connection_id, /* is_inbound = */ true, ..)
    }

    fn handle_established_outbound_connection(connection_id, ..) {
        HandlerWrapper::new(connection_id, /* is_inbound = */ false, ..)
    }

    // ...
}

impl ConnectionHandler for HandlerWrapper {
    // Might have missed something here.
    // Associated types TBD; maybe other wrappers here.

    fn on_connection_event(..) {
        // Generate a stream ID.
        let stream_id = AtomicIndex::increment();

        // Propagate up to the swarm.
        MetricsEventWrapper::CustomProtocolOpen { connection_id, stream_id, protocol_name, direction, .. }
    }
}
```

This indeed feels a bit involved, and I'd resume implementation once the substrate-p2p refactor is complete, since I expect most of this work would generate conflicts with those PRs.

@jsdw (Collaborator) commented Sep 21, 2023

The prefix of the functions in this PR is sudo_network, so it's only the sudo_network functions that aren't implemented.

Ah ok, I see that now; the other groups all have a single word like chainHead_ or chainSpec_ or archive_, but we have sudo_, sudo_sessionKeys_ and now sudo_network_ as three more groups, which confused me since it's a bit inconsistent in my mind.

@tomaka (Contributor, Author) commented:

it's a bit inconsistent in my mind.

Yeah it is, because ultimately the sudo functions should be renamed.

@altonen commented Sep 22, 2023

@lexnv @jsdw

If the PoC/prototype for this code turns out to add non-trivial complexity to sc-network, I hope you reconsider adding this feature.

The libp2p upgrades are really painful to deal with, and I would hope to reduce the surface area of libp2p in sc-network, to leave us more time for things other than dealing with libp2p and its upgrades. That looks to be at odds with the proposed wrapper implementation. My concern is that if this wrapper adds more code that has to be maintained during each libp2p upgrade, it leaves us with less time to actually improve the Polkadot networking code.

@tomaka (Contributor, Author) commented:

I didn't answer that, but I didn't intend this to be implemented in Substrate quickly.

The reason for this function comes from a very pragmatic problem: most of the time, when smoldot isn't working, it's because of a connectivity issue.

When you're running a Substrate node of some kind, you're either a developer or a node operator. You can read logs, run Prometheus, open GitHub issues, etc.

When you're running smoldot, you're an end user, and you shouldn't be reading logs. The only reasonable way to help end users figure out why smoldot isn't working is to put some kind of UI on top of it. This JSON-RPC function is what would power this UI.

@tomaka (Contributor, Author) commented Nov 16, 2023

As a heads up, I'm going to implement this in smoldot (merged or not). It's one of the milestones that I've put in the smoldot financing treasury proposal.

@jsdw (Collaborator) left a comment:

Happy to see this merged; it's unstable anyway, and probably mostly useful in smoldot and perhaps less so in Substrate, if I understand right!

@josepot (Contributor) commented Jan 23, 2024

I have reviewed this PR and I think that it's ready to be merged. Also, I can't wait to leverage this from polkadot-api and it seems that smoldot has already made a lot of progress.

As @tomaka already explained, this API is mostly meant for light clients and it lives in its own group of functions, meaning that a simple query to rpc_methods lets the consumer know whether this API is supported. Perhaps it could make sense to have a separate/complementary API for full nodes / Substrate?
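The discovery step described here can be sketched as a simple membership check over the list returned by `rpc_methods`. The helper function is hypothetical; the method name matches this PR:

```rust
/// Given the method list returned by an `rpc_methods` call, report
/// whether the server exposes the function added by this PR.
/// The helper is illustrative, not part of any client library.
fn supports_network_watch(methods: &[String]) -> bool {
    methods.iter().any(|m| m == "sudo_network_unstable_watch")
}
```

A consumer such as a UI would run this check once after connecting and only then attempt the subscription, falling back gracefully on servers that don't implement the sudo_network_ group.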

@josepot josepot merged commit 607b71b into paritytech:main Jan 23, 2024
3 checks passed
@tomaka tomaka deleted the sudo_network_unstable_watch branch January 23, 2024 11:38
@tomaka (Contributor, Author) commented Jan 25, 2024

I've encountered an issue while implementing this in smoldot due to a corner case.

Let's say that you start an instance of smoldot, add the Polkadot and Kusama chains, and add the same IP address and PeerId as a bootnode to both Polkadot and Kusama at the same time.

When I wrote this RFC, smoldot would open networking connections for Polkadot and networking connections for Kusama. That's what is reflected in this RFC. In that example scenario, smoldot would open two connections to the same IP address and PeerId, and thus would report through the JSON-RPC function only events that concern the connection of the chain corresponding to that JSON-RPC subscription.

However, smoldot now supports sharing connections. In that example scenario, smoldot would only open one connection and use the same connection for both Polkadot and Kusama.
So let's imagine that smoldot connects to that IP address, reports a JSON-RPC event about the connection being open to the active subscriptions, but then realizes that the node is only capable of serving Polkadot and not Kusama.
Previously, smoldot would close the Kusama connection and keep the Polkadot one, but now smoldot would keep the connection alive anyway and simply ignore that node when it comes to Kusama.

This leads to a tricky question: should smoldot, in that case, generate an event to the Kusama JSON-RPC subscription that the connection has been closed, even though it hasn't actually been closed? Doing so would be the logical thing to do, but would be misleading. The objective of this JSON-RPC function is to show to the user what smoldot is doing, and reporting events that don't match the reality (even when logically coherent) goes against this objective.

An alternative would be to report every single connection to every subscription, no matter if this connection is going to be used for the chain the subscription corresponds to. This is also misleading and spammy.

I don't really know right now how to solve that problem.

@josepot (Contributor) commented Jan 25, 2024

This leads to a tricky question: should smoldot, in that case, generate an event to the Kusama JSON-RPC subscription that the connection has been closed, even though it hasn't actually been closed?

I'm inclined to think that smoldot should generate an event for the Kusama JSON-RPC subscription indicating the connection is closed, focusing on the "logical connection".

From the API consumer's perspective, the connection is effectively "closed" once smoldot ceases communication with the Kusama node, regardless of the physical connection status. This approach would treat the persistence of the physical connection as an implementation detail.

Given that the API is primarily intended to be implemented by light clients, it would be beneficial to update the specification to clearly differentiate between "logical" and "physical" connections, emphasizing that the API's notifications pertain to the "logical connection".
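The logical/physical split proposed here could be sketched roughly as follows. All names and types are illustrative, not part of any specification: one physical connection is tracked along with the set of chains logically using it, and detaching a chain yields a logical close even while the physical connection stays open.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative event type for per-chain ("logical") subscriptions.
#[derive(Debug, PartialEq)]
enum LogicalEvent {
    Open { chain: &'static str, conn: u64 },
    Close { chain: &'static str, conn: u64 },
}

/// Tracks which chains logically use each physical connection.
#[derive(Default)]
struct ConnectionTracker {
    users: HashMap<u64, HashSet<&'static str>>,
}

impl ConnectionTracker {
    fn attach(&mut self, conn: u64, chain: &'static str) -> LogicalEvent {
        self.users.entry(conn).or_default().insert(chain);
        LogicalEvent::Open { chain, conn }
    }

    /// Detaching a chain yields a logical close for that chain's
    /// subscription, even if the physical connection stays open
    /// because other chains still use it.
    fn detach(&mut self, conn: u64, chain: &'static str) -> LogicalEvent {
        if let Some(set) = self.users.get_mut(&conn) {
            set.remove(chain);
            if set.is_empty() {
                self.users.remove(&conn);
            }
        }
        LogicalEvent::Close { chain, conn }
    }

    fn physically_open(&self, conn: u64) -> bool {
        self.users.contains_key(&conn)
    }
}
```

In the Polkadot/Kusama scenario above, the Kusama subscription would receive a `Close` while `physically_open` remains true, which is exactly the behavior this comment argues the specification should endorse.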
