
upgrade libp2p to v0.53.* #4935

Merged: 14 commits merged into sigp:unstable on Dec 7, 2023

Conversation

@divagant-martian (Collaborator) commented Nov 19, 2023

Issue Addressed

Upgrades libp2p to v0.53. See https://github.com/libp2p/rust-libp2p/releases/tag/libp2p-v0.53.0 and the subsequent .1 release

Proposed Changes

This adds metrics for the transports directly from libp2p, with better granularity. It also changes the way handlers are closed: we no longer have a way to force-close a handler; instead, libp2p relies on checking established substreams. The rest of the changes are not particularly relevant.

Additional Info

na

@divagant-martian (Collaborator, Author)

@mxinden I'm curious about the bandwidth metrics here. I'm getting, for example:

libp2p_bandwidth_bytes_total{protocols="/ip4/tcp",direction="Inbound"} 2967
libp2p_bandwidth_bytes_total{protocols="/ip4/udp/quic-v1/p2p",direction="Outbound"} 0
libp2p_bandwidth_bytes_total{protocols="/ip4/tcp/p2p",direction="Outbound"} 33160
libp2p_bandwidth_bytes_total{protocols="/ip4/tcp",direction="Outbound"} 2845
libp2p_bandwidth_bytes_total{protocols="/ip4/tcp/p2p",direction="Inbound"} 1661330
libp2p_bandwidth_bytes_total{protocols="/ip4/udp/quic-v1/p2p",direction="Inbound"} 0

What's the difference between

  • libp2p_bandwidth_bytes_total{protocols="/ip4/tcp/p2p",direction="Outbound"}
  • libp2p_bandwidth_bytes_total{protocols="/ip4/tcp",direction="Outbound"} ?

@mxinden commented Nov 23, 2023

What's the difference between

  • libp2p_bandwidth_bytes_total{protocols="/ip4/tcp/p2p",direction="Outbound"}

Here either a NetworkBehaviour or Swarm::dial dialed an IPv4 TCP address that includes a /p2p/SOME_PEER_ID component.

  • libp2p_bandwidth_bytes_total{protocols="/ip4/tcp",direction="Outbound"} ?

Here either a NetworkBehaviour or Swarm::dial dialed an IPv4 TCP address without a /p2p/SOME_PEER_ID component.

Not quite ideal. Once the dial succeeds we could append the peer ID to the multiaddr when it doesn't already have one, and thus always report /ip4/tcp/p2p.

https://github.com/libp2p/rust-libp2p/blob/3b6b74db7fdb252103699efcbc85cb11605ea893/misc/metrics/src/bandwidth.rs#L92-L94
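To make the two series concrete, here is a rough sketch (not the libp2p-metrics code linked above) of how such a label boils down to the dialed Multiaddr with concrete addresses, ports and peer IDs stripped; the protocol_label helper is purely illustrative:

// Rough illustration, not the libp2p-metrics code linked above: the
// `protocols` label is effectively the multiaddr reduced to its protocol names.
use libp2p::multiaddr::{Multiaddr, Protocol};

fn protocol_label(addr: &Multiaddr) -> String {
    addr.iter()
        .map(|p| match p {
            Protocol::Ip4(_) => "/ip4",
            Protocol::Ip6(_) => "/ip6",
            Protocol::Tcp(_) => "/tcp",
            Protocol::Udp(_) => "/udp",
            Protocol::QuicV1 => "/quic-v1",
            Protocol::P2p(_) => "/p2p",
            _ => "/other",
        })
        .collect()
}

fn main() {
    // A dial without a peer ID is counted under "/ip4/tcp" ...
    let addr: Multiaddr = "/ip4/127.0.0.1/tcp/9000".parse().unwrap();
    assert_eq!(protocol_label(&addr), "/ip4/tcp");
    // ... while a dial to an address ending in /p2p/<peer-id> lands in "/ip4/tcp/p2p".
}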

@jxs (Member) left a comment

Thanks Diva!

Review thread on beacon_node/lighthouse_network/Cargo.toml (outdated, resolved)
// Creates the TCP transport layer
let (tcp, tcp_bandwidth) =
A Member commented:

I think we can now incorporate Max's changes presented in #4695. If you feel like it's too much work don't worry, I can do it in a subsequent PR.

@divagant-martian (Collaborator, Author) Nov 28, 2023

I looked into this, but imo it's less clean to jam the TCP configuration into a closure when it's already well contained in a function (build_transport). Thomas notes this as well in that PR. Tbh I don't see the benefit.

@divagant-martian (Collaborator, Author) commented Nov 27, 2023

Thanks @jxs. However, this is not yet properly tested (hence the draft status). This upgrade also affects the way libp2p handles closing handlers, so it would be good if @AgeManning takes a look at the RPC handler changes. I went with a simple approach that I think should work, but I'd like his opinion on it.

@divagant-martian marked this pull request as ready for review November 28, 2023 15:20
@divagant-martian changed the title from "libp2p 53" to "upgrade libp2p to v0.53.*" Nov 28, 2023
@jimmygchen added the ready-for-review and Networking labels Nov 29, 2023
@AgeManning (Member) left a comment

With the RPC things I'm not sure this will work.

Can we no longer close connections at the handler level?

We designed Lighthouse such that it was ultimately the RPC connection handler that closed connections to peers, so that we could send a goodbye message to them before dropping the connection.

Unless libp2p has changed some things, I recall that connection_keep_alive is grouped with all the other behaviours. So even if the RPC says keep_alive is false, the connection won't close if gossipsub's keep_alive is true, or identify's, or ping's, etc.

I think we might need to explicitly inform the behaviour to drop the connection if we can't do it in the handler any more.

There is a chance I'm wrong about this, will have to double check.
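To illustrate the grouping concern (a toy model, not actual libp2p code): the combined handler keeps the connection alive while any sub-handler still asks for it, so the RPC handler flipping its keep-alive to false is not, on its own, enough to close the connection.

// Toy model of the grouping described above, not actual libp2p code.
fn combined_keep_alive(sub_handlers: &[bool]) -> bool {
    // The connection stays open while *any* sub-handler still wants it.
    sub_handlers.iter().any(|&keep_alive| keep_alive)
}

fn main() {
    let rpc = false;      // the RPC handler wants the connection gone
    let gossipsub = true; // gossipsub (or identify, ping, ...) still wants it
    // The connection is kept open despite the RPC handler's wish.
    assert!(combined_keep_alive(&[rpc, gossipsub]));
}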

Review thread on beacon_node/lighthouse_network/src/config.rs (outdated, resolved)
@jxs (Member) commented Nov 29, 2023

Can we no longer close connections at the handler level?

yeah, see libp2p/rust-libp2p#4755

@divagant-martian (Collaborator, Author):

Yeah, I mentioned this in the description:

We no longer have a way to force-close a handler; instead, libp2p relies on checking established substreams.

Which is a bit annoying and somewhat reverts to the state of things from a couple of years back. There is a way to ignore streams for keep-alive, where our problematic contender would be gossipsub. I'll look into that. Otherwise we would need to bubble up a new error kind, since right now reporting is done at the substream level and the couple of errors modified don't fit that model. I'll try the first approach before resorting to modifying the handler errors.

@divagant-martian (Collaborator, Author):

Well, that idea died rather quickly.

ignore_for_keep_alive is available at the stream level so there is no way to do it per sub-behaviour.

@mxinden would it make sense to allow the NetworkBehaviour derive to tag sub-behaviours with custom keep-alive overrides? In LH's case the override would be the constant function false for every handler except the RPC handler. Merely a suggestion, since the new state is less flexible than the previous one.

For this PR though, I'll go with sending the errors up to the behaviour for it to close the connection. This approach is more resilient to changes in libp2p internals anyway.

Comment on lines +1454 to +1455
HandlerEvent::Close(_) => {
    let _ = self.swarm.disconnect_peer_id(peer_id);
@divagant-martian (Collaborator, Author):

Sorry for the noise @mxinden

I think the new API does not fully allow the previous behavior.

When the handler emits an event, it is received on swarm.poll_next_unpin without the connection_id information; with that information, this line could be a close_connection call instead.

For LH this is good enough since we allow only one connection per peer, but it seems like a slight regression in connection management in libp2p.

Now, if there is a way to achieve this "the correct way" (closing the specific connection) and this is an oversight on my part, I'm happy to hear it and use the alternative in this PR.

@mxinden:

I am missing context here. What is HandlerEvent::Close? I can only find this type alias:

/// Events the handler emits to the behaviour.
pub type HandlerEvent<Id, T> = Result<RPCReceived<Id, T>, HandlerErr<Id>>;

I am assuming that this is std's Result which does not have a Close variant.

@divagant-martian (Collaborator, Author):

I think what you are missing is a pull, Max. This type is no longer a Result.

@mxinden commented Dec 1, 2023

@mxinden would it make sense to allow the NetworkBehaviour derive to tag sub-behaviours with custom keep-alive overrides? In LH's case the override would be the constant function false for every handler except the RPC handler. Merely a suggestion, since the new state is less flexible than the previous one.

Can you summarize the motivation behind this change? Note that connection_keep_alive is a method on ConnectionHandler and not on NetworkBehaviour. Also note that you can set SwarmConfig::with_idle_connection_timeout to a very high value to effectively disable any closes due to KeepAlive::False.
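For reference, a minimal sketch of that knob, assuming libp2p 0.53's SwarmBuilder with the tokio, tcp, noise, yamux and macros features enabled (this is not Lighthouse's actual setup):

// Minimal sketch, assuming libp2p 0.53's SwarmBuilder; not Lighthouse's setup.
use std::time::Duration;
use libp2p::swarm::dummy;

fn build_swarm() -> Result<libp2p::Swarm<dummy::Behaviour>, Box<dyn std::error::Error>> {
    let swarm = libp2p::SwarmBuilder::with_new_identity()
        .with_tokio()
        .with_tcp(
            libp2p::tcp::Config::default(),
            libp2p::noise::Config::new,
            libp2p::yamux::Config::default,
        )?
        .with_behaviour(|_key| dummy::Behaviour)?
        // A very high idle timeout effectively disables closes caused by
        // handlers no longer asking to keep the connection alive.
        .with_swarm_config(|cfg| {
            cfg.with_idle_connection_timeout(Duration::from_secs(60 * 60 * 24 * 365))
        })
        .build();
    Ok(swarm)
}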

Comment on lines +54 to +58
pub enum HandlerEvent<Id, T: EthSpec> {
    Ok(RPCReceived<Id, T>),
    Err(HandlerErr<Id>),
    Close(RPCError),
}
@divagant-martian (Collaborator, Author):

@mxinden are you missing a pull?

@divagant-martian (Collaborator, Author) commented Dec 1, 2023

@mxinden would it make sense to allow the NetworkBehaviour derive to tag sub-behaviours with custom keep-alive overrides? In LH's case the override would be the constant function false for every handler except the RPC handler. Merely a suggestion, since the new state is less flexible than the previous one.

Can you summarize the motivation behind this change? Note that connection_keep_alive is a method on ConnectionHandler and not on NetworkBehaviour. Also note that you can set SwarmConfig::with_idle_connection_timeout to a very high value to effectively disable any closes due to KeepAlive::False.

Deriving NetworkBehaviour for a struct that combines sub-behaviours also builds the associated handler. The issue here is that we don't have access to the connection_keep_alive function of the generated handler. The simplest entry path for this that I see is the NetworkBehaviour derive.

Also, the issue is not to prevent closing, but to trigger it
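As a sketch of what the derive hides (assuming the standard #[derive(NetworkBehaviour)] pattern; field names are illustrative, not Lighthouse's actual struct):

// The derive below also generates the combined ConnectionHandler for these
// sub-behaviours; that generated handler is what implements
// connection_keep_alive, and user code has no hook to override it per field.
use libp2p::{gossipsub, identify, ping, swarm::NetworkBehaviour};

#[derive(NetworkBehaviour)]
struct CombinedBehaviour {
    gossipsub: gossipsub::Behaviour,
    identify: identify::Behaviour,
    ping: ping::Behaviour,
    // In Lighthouse, the custom RPC behaviour sits alongside sub-behaviours like these.
}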

@AgeManning (Member) left a comment

Yeah, this is a nice solution!

Just a single comment

Review thread on beacon_node/lighthouse_network/src/rpc/handler.rs (outdated, resolved)
@AgeManning merged commit 6c0c41c into sigp:unstable Dec 7, 2023
24 checks passed
@jimmygchen added the backwards-incompat (Backwards-incompatible API change) label Dec 12, 2023
Labels: backwards-incompat (Backwards-incompatible API change), Networking, ready-for-review (The code is ready for review)
Projects: none yet
5 participants