Make TIER2 connections survive node restart #8318

Merged on Feb 3, 2023 (93 commits)

@saketh-are (Collaborator) commented Jan 10, 2023

This PR implements stabilization for the TIER2 network:

  • Adds ConnectionStore, which maintains an LRU cache of outbound connections and persists them to the database.
  • Adds behavior in PeerManagerActor to re-establish stored outbound connections upon starting a node.
  • Adds behavior in PeerManagerActor to re-establish an active outbound connection if it is in the connection store and the connection is broken.
  • Adds ConnectionStore data to the TIER2 network debug page.
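
The LRU behavior described in the list above can be illustrated with a minimal sketch. It is not the code from this PR: `ConnectionStoreSketch`, `touch`, `snapshot`, and the `String` peer identifier are hypothetical names chosen for the example, and database persistence is reduced to a snapshot that a real store would write out and read back on startup.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a peer identifier (assumption, not the nearcore type).
type PeerId = String;

/// Illustrative LRU of outbound connections, most recently used at the front.
#[derive(Debug)]
pub struct ConnectionStoreSketch {
    capacity: usize,
    entries: VecDeque<PeerId>,
}

impl ConnectionStoreSketch {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::new() }
    }

    /// Mark an outbound connection as most recently used,
    /// evicting the least recently used entries beyond `capacity`.
    pub fn touch(&mut self, peer: &str) {
        self.entries.retain(|p| p.as_str() != peer);
        self.entries.push_front(peer.to_string());
        self.entries.truncate(self.capacity);
    }

    /// Drop a connection, e.g. when the peer advises against reconnecting.
    pub fn remove(&mut self, peer: &str) {
        self.entries.retain(|p| p.as_str() != peer);
    }

    pub fn contains(&self, peer: &str) -> bool {
        self.entries.iter().any(|p| p.as_str() == peer)
    }

    /// Entries in most-recently-used order. A real store would persist this
    /// and use it as the list of connections to re-establish after a restart.
    pub fn snapshot(&self) -> Vec<PeerId> {
        self.entries.iter().cloned().collect()
    }
}
```

With a capacity of 2, touching `a`, `b`, `a`, `c` in that order leaves `c` and `a` in the store and evicts `b`.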

A flag remove_from_connection_store is added to Disconnect messages. The flag is set to true by the disconnecting node if it does not expect to accept reconnection from the peer, advising the peer to remove the connection from its connection store if present. If a peer sends an old-version Disconnect message without the field, remove_from_... defaults to false and the connection will be re-attempted if it is in the ConnectionStore.
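
As a rough sketch of the flag semantics described above, the following models an optional wire field that defaults to false and the receiving node's decision. Only the field name `remove_from_connection_store` comes from this PR; the `Disconnect` struct layout, `from_wire`, `AfterDisconnect`, and `on_disconnect` are hypothetical names for illustration.

```rust
/// Hypothetical decoded form of a Disconnect message (layout is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Disconnect {
    pub remove_from_connection_store: bool,
}

impl Disconnect {
    /// `None` models an old-version message that lacks the field,
    /// in which case the flag defaults to false.
    pub fn from_wire(field: Option<bool>) -> Self {
        Self { remove_from_connection_store: field.unwrap_or(false) }
    }
}

/// What the receiving node does once the connection is closed.
#[derive(Debug, PartialEq, Eq)]
pub enum AfterDisconnect {
    /// The peer does not expect to accept a reconnection: forget the connection.
    RemoveFromStore,
    /// The connection is stored and the peer did not object: try again.
    Reattempt,
    /// The connection was never in the store: nothing to do.
    Nothing,
}

pub fn on_disconnect(msg: Disconnect, in_connection_store: bool) -> AfterDisconnect {
    match (in_connection_store, msg.remove_from_connection_store) {
        (true, true) => AfterDisconnect::RemoveFromStore,
        (true, false) => AfterDisconnect::Reattempt,
        (false, _) => AfterDisconnect::Nothing,
    }
}
```

Defaulting the missing field to false keeps old-version peers on the reconnect path, which matches the backward-compatible behavior stated above.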

TODOs:

  • Add a command line flag such as --ignore-rc to skip reconnection when starting a node

@saketh-are changed the title from "TIER2 network stabilization" to "Make TIER2 connections survive node restart" on Jan 18, 2023
Review thread on core/store/src/columns.rs (outdated, resolved)
@saketh-are (Collaborator, Author) commented:

As before, ready for another look apart from renaming remove_from_recent_outbound_connections everywhere.

@pompon0 (Contributor) left a comment:

It looks very good now! Just one test needs fixing: the one with the clock rewinding (I called it clock skew in the previous comment, but skew is something different, sorry). Other than that, you can go ahead and merge. The flag can be added in a separate PR. I think we can name the flag something like
"--connect_to_reliable_peers_on_startup" with default "true".

@saketh-are saketh-are merged commit 4306fce into near:master Feb 3, 2023
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Feb 6, 2023
  • Adds ConnectionStore, which maintains an LRU cache of outbound connections and persists them to the database.
  • Adds behavior in PeerManagerActor to re-establish stored outbound connections upon starting a node.
  • Adds behavior in PeerManagerActor to re-establish an active outbound connection if it is in the connection store and the connection is broken.
  • Adds ConnectionStore data to the TIER2 network debug page.
  • Adds a boolean flag remove_from_connection_store to Disconnect messages.
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Feb 13, 2023
near-bulldozer bot pushed a commit that referenced this pull request Feb 17, 2023
…rtup (#8576)

This is a follow-up to #8318, in which reliable peers are persisted to storage in the ConnectionStore.

A new boolean flag `--connect-to-reliable-peers-on-startup` is added to neard. If set to `true`, the node will attempt to reconnect to known reliable peers from storage upon starting up. The default value is `true`.

Note that setting the flag to `false` skips reconnection, but does not purge the reliable peers in storage.
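
The distinction between skipping reconnection and purging storage can be sketched as follows. This is an illustration under assumptions, not neard's implementation: `StartupConfig` and `peers_to_redial` are hypothetical names, and the stored peers are modeled as a plain slice of strings.

```rust
/// Hypothetical startup settings; only the flag's name and default come from the text above.
pub struct StartupConfig {
    pub connect_to_reliable_peers_on_startup: bool,
}

impl Default for StartupConfig {
    fn default() -> Self {
        // The flag defaults to true.
        Self { connect_to_reliable_peers_on_startup: true }
    }
}

/// Returns the peers to dial at startup. The stored list is only read,
/// so disabling the flag skips reconnection without purging storage.
pub fn peers_to_redial(config: &StartupConfig, stored: &[String]) -> Vec<String> {
    if config.connect_to_reliable_peers_on_startup {
        stored.to_vec()
    } else {
        Vec::new()
    }
}
```

Because `stored` is borrowed immutably, a later start with the flag set back to `true` would still see the same reliable peers.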

Closes #8580