
Add functionality for advanced control over peer set #6097

Open

whyrusleeping opened this issue Mar 18, 2019 · 38 comments
Labels
kind/enhancement: A net-new feature or improvement to an existing feature
topic/connection-manager: Issues related to Swarm.ConnMgr (connection manager)

Comments

@whyrusleeping
Member

I'd really like to have the ability to specify peer IDs in my config file that my node will always try to remain connected to. Additionally, it would be nice to specify strategies for these peers, like "always", which attempts to reconnect on any disconnect (respecting backoff rules); "preferred", which never closes a connection to that peer but doesn't necessarily try to hold one open; and maybe one more that just increases the likelihood the conn manager will hold the connection open, but still allows it to be closed as needed.

This will enable me to keep connections open to friends' machines, making transfers between us much more reliable.

whyrusleeping added the kind/enhancement and topic/connection-manager labels Mar 18, 2019
@whyrusleeping
Member Author

Additionally, the infra team wants this to hold open connections between the gateways and our pinning servers.

@raulk
Member

raulk commented Mar 18, 2019

This will percolate to the connection manager (current, interim and future).

@brianmcmichael

+1

@brianmcmichael

This feature would mean that I can set up a gateway node that could 'permanently' connect to a pin storage node and speed up propagation when the machine holding the content is known.

@obo20

obo20 commented Mar 27, 2019

We would also benefit greatly from this type of feature. We're currently running an automated "ipfs swarm connect {gatewayAddr}" from our nodes every 5 minutes or so to keep these connections open.

Having an official "supported" way of keeping nodes connected would be amazing.

@whyrusleeping
Member Author

@Stebalien @raulk A pretty quick and non-invasive way of doing this would be to add a list of peers to the connection manager that lets it handle the 'preferred' case (not closing the connection). Then, in go-ipfs, we can have a little process that is fed a list of 'always' peers, dials them, and listens for disconnects.
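
A minimal sketch of that "little process", assuming go-libp2p's Host and network notification interfaces (import paths are the go-libp2p-core ones in use around the time of this thread; watchAlwaysPeers and alwaysPeers are hypothetical names):

package peering

import (
    "context"
    "time"

    "github.com/libp2p/go-libp2p-core/host"
    "github.com/libp2p/go-libp2p-core/network"
    "github.com/libp2p/go-libp2p-core/peer"
)

// watchAlwaysPeers redials peers from the "always" set whenever they
// disconnect. A real implementation would respect dial backoff rules
// rather than sleeping a fixed interval.
func watchAlwaysPeers(ctx context.Context, h host.Host, alwaysPeers map[peer.ID]struct{}) {
    h.Network().Notify(&network.NotifyBundle{
        DisconnectedF: func(_ network.Network, c network.Conn) {
            p := c.RemotePeer()
            if _, ok := alwaysPeers[p]; !ok {
                return
            }
            go func() {
                time.Sleep(5 * time.Second) // placeholder for real backoff
                // With no addresses given, Connect falls back to the peerstore.
                _ = h.Connect(ctx, peer.AddrInfo{ID: p})
            }()
        },
    })
}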

@brianmcmichael

Geth has the "static nodes" feature which may be useful as a development/design pattern for go-ipfs.

https://github.com/ethereum/go-ethereum/wiki/Connecting-to-the-network#static-nodes

@raulk
Member

raulk commented Mar 27, 2019

@whyrusleeping let's do that quickly. The connection manager v2 proposal is in the works, but there's no reason we can't implement a protected set now. I'll work on a patch.

@raulk
Member

raulk commented Mar 27, 2019

@whyrusleeping

like "always", which attempts to reconnect on any disconnect (respecting backoff rules),

Do you think libp2p should take care of reestablishing the connection? This would require changes in the host and/or the swarm, e.g. when you host.Connect() you could specify a supervision policy for that connection.

"preferred" which never closed a connection to that peer, but doesnt necessarily try and hold open a connection,

See libp2p/go-libp2p-interface-connmgr#14 and libp2p/go-libp2p-connmgr#36.

and maybe one more that just increases the likelihood the conn manager will hold open the connection, but still allows it to be closed as needed.

In the current connection manager, this affinity can be achieved by setting a higher score on that connection via a separate tag, e.g. "peer_affinity".
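
As a sketch of that approach, using go-libp2p's connection manager interface (the tag name and weight are illustrative):

package peering

import (
    "github.com/libp2p/go-libp2p-core/host"
    "github.com/libp2p/go-libp2p-core/peer"
)

// biasPeer raises a peer's score in the connection manager so its
// connections are pruned last, without fully protecting them.
func biasPeer(h host.Host, p peer.ID) {
    h.ConnManager().TagPeer(p, "peer_affinity", 100)
}

// unbiasPeer removes the affinity tag again.
func unbiasPeer(h host.Host, p peer.ID) {
    h.ConnManager().UntagPeer(p, "peer_affinity")
}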

@obo20

obo20 commented Mar 28, 2019

@Stebalien The issue occurring in #6145 may or may not be relevant to this ticket.

@lanzafame
Contributor

It would be awesome if this could be exposed via an API as well as configuration, as this would allow Cluster to dynamically protect connections between IPFS nodes as they join a cluster.

@raulk
Member

raulk commented Mar 29, 2019

I've added the Protect()/Unprotect() API to the connection manager, available in gomod version v0.0.3.

Please take it out for a spin and report back.

You should be unblocked now to make progress with this; do shout out if you think otherwise.
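
Usage is a pair of tagged calls on the host's connection manager; a minimal sketch (the function names and the "my-subsystem" tag are illustrative):

package peering

import (
    "github.com/libp2p/go-libp2p-core/host"
    "github.com/libp2p/go-libp2p-core/peer"
)

// pinConnection exempts a peer from connection-manager pruning.
// Tags are independent: the peer stays protected until every tag
// that protected it has been removed.
func pinConnection(h host.Host, p peer.ID) {
    h.ConnManager().Protect(p, "my-subsystem")
}

// unpinConnection removes our tag; the return value reports whether
// the peer is still protected under some other tag.
func unpinConnection(h host.Host, p peer.ID) bool {
    return h.ConnManager().Unprotect(p, "my-subsystem")
}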

@obo20

obo20 commented Apr 1, 2019

@raulk @whyrusleeping How does this work with regard to spam protection for high-profile nodes? For example, almost everybody would probably love to stay connected to the official ipfs.io gateway nodes if given the chance. However, it's obviously infeasible for the official ipfs.io nodes to stay connected to that many nodes all the time, which could result in an overwhelming number of disconnects and reconnect attempts.

Would the backoff rules cover this edge case? I just want to double-check that this doesn't accidentally bring your infrastructure to a standstill.

@raulk
Member

raulk commented Apr 1, 2019

@obo20 dialer backoff rules wouldn't cover that case, as presumably the dials would succeed.

While I think it's legitimate for everybody to want to stay connected to ipfs.io, that shouldn't be the case and it's not the desired architecture. In other words: IPFS is neither a hub-and-spoke nor a federated model.

Gateways are able to discover content across a decentralised network; if that proves dysfunctional, we should dig into it.

@raulk
Member

raulk commented Apr 1, 2019

@obo20 from the viewpoint of a libp2p node, it's legitimate to strive to keep a connection alive with peer A if you consider it high-value. Peer A also has resource management in place, and will eventually prune connections it considers low-value. If many peers deem peer A as high-value, they will eventually compete for its resources. If the protocol manages reputation/scoring well (e.g. bitswap), peer A will retain the highest performing peers.

@obo20

obo20 commented Apr 1, 2019

@raulk I may have miscommunicated my concern.

My worry is that if, say, the ipfs.io gateway nodes have a high water of 2000 (I'm making this number up) and 3000 other nodes on the network want a "protected" swarm connection to those nodes, how would this be handled?

@obo20

obo20 commented Apr 1, 2019

@obo20 from the viewpoint of a libp2p node, it's legitimate to strive to keep a connection alive with peer A if you consider it high-value. Peer A also has resource management in place, and will eventually prune connections it considers low-value. If many peers deem peer A as high-value, they will eventually compete for its resources. If the protocol manages reputation/scoring well (e.g. bitswap), peer A will retain the highest performing peers.

So from my interpretation of this comment, the high-profile nodes would just prune excess peers even if every single one of those peers had marked the connection as "protected"? This sounds good from the perspective of the high-profile node.

How do nodes that have deemed this connection "protected" act when they've been pruned? Do they frequently attempt to reconnect, or do they just accept that they've been pruned and move on? (This may be more of an IPFS implementation question than a libp2p question.)

@raulk
Member

raulk commented Apr 1, 2019

How do nodes that have deemed this connection "protected" act when they've been pruned?

They just see the connection die. The application (e.g. IPFS) can then attempt to reconnect, and the other party will accept the connection and keep it alive until the connection manager prunes it again.

Note that Bitswap is not proactively managing reputation/scoring, AFAIK. I'm sure a PR there would be well-received.

@Mikaela
Contributor

Mikaela commented Apr 16, 2019

I noticed that this is partially in the changelog, but are these important connections remembered anywhere yet, or is that still upcoming, as in the original issue?

My use case is having three nodes, of which one is online almost 24/7. To avoid killing routers they have small connection limits, and while they have each other as bootstrap nodes, after running for a moment they forget all about each other.

When I pin something on one, I likely also want to pin it on the others, and that is slow unless I run ipfs swarm connect myself (I guess the changelog means I will have to run that less often). As they aren't all online 24/7, I think the suggested "preferred" flag would fit my use case, since the nodes connect to each other mainly through the Yggdrasil network and have static addresses within it.

@whyrusleeping
Member Author

@raulk what @obo20 is pointing out is that if everyone decides to add the ipfs gateways to their protected connection set, the gateways will get DoSed with connections. What we need to prevent this is a 'disconnect' protocol, so the gateways can politely ask peers to disconnect and have those peers not immediately try to reconnect.

Sure, malicious peers can always ignore that, but we want normal, well-behaved peers to not accidentally DoS things.

@Mikaela
Contributor

Mikaela commented Apr 16, 2019

if everyone decides to add the ipfs gateways to their protected connection set

Would there be any point in this, or is it just fear of users not getting it (or am I the one not getting it)? I am not using the IPFS.io gateway (I use ipns.co), but if users request my content from IPFS.io a lot, won't it be fast due to caching anyway, regardless of whether my node is currently connected to the gateways?

@whyrusleeping
Member Author

is this just fear of users not getting it

Basically this, yeah. Network protections in systems like this shouldn't have to rely on clients behaving properly. Adding a disconnect protocol still relies on clients behaving properly, but it's an additional step (circumventing the disconnect protocol would be deliberate, while force-connecting to the gateway nodes is more of a configuration mistake).

@Mikaela This feature isn't complete yet; it's just ephemeral important connections currently. Persistence should be coming soon (at this point I think all it takes is adding it to the config file and wiring that through).

@jbenet
Member

jbenet commented Jun 29, 2019

proposing new commands

Single Peer -- Keep connections to specific peers

Use a list of peers to stay connected to all the time.

command name options (options to seed ideas -- I don't love any of these names :D)

# choose one
ipfs swarm bind    [list | add | rm]
ipfs swarm peer    [list | add | rm]
ipfs swarm link    [list | add | rm]
ipfs swarm bond    [list | add | rm]
ipfs swarm friend  [list | add | rm]
ipfs swarm tie     [list | add | rm]
ipfs swarm relate  [list | add | rm]
ipfs swarm couple  [list | add | rm]

subcommands

<cmd> list
<cmd> add [--policy=([always]|protect|...)] [ <peer-id> | <multiaddr> ]
<cmd> rm [ <peer-id> | <multiaddr> ]

examples

# just w/ p2p. (use a libp2p peer-routing lookup to find addresses)
ipfs swarm bind add /p2p/Qmbwqf292G3GbrNm1ydtKeqhqqgyqXDtDvsuBYuvXsPHHr

# connect specifically to this address
ipfs swarm bind add /ip4/127.0.0.1/udp/4001/quic/p2p/Qmbwqf292G3GbrNm1ydtKeqhqqgyqXDtDvsuBYuvXsPHHr

# can combine both, to try the address but also lookup addresses in case this one doesn't work.
ipfs swarm bind add /p2p/Qmbwqf292G3GbrNm1ydtKeqhqqgyqXDtDvsuBYuvXsPHHr
ipfs swarm bind add /ip4/127.0.0.1/udp/4001/quic/p2p/Qmbwqf292G3GbrNm1ydtKeqhqqgyqXDtDvsuBYuvXsPHHr

# always keep a connection open (periodically check, dial/re-dial if disconnected)
ipfs swarm bind add /p2p/Qmbwqf292G3GbrNm1ydtKeqhqqgyqXDtDvsuBYuvXsPHHr
ipfs swarm bind add --policy=always /p2p/Qmbwqf292G3GbrNm1ydtKeqhqqgyqXDtDvsuBYuvXsPHHr

# once opened, keep a connection open (try to keep it open, but don't re-dial)
ipfs swarm bind add --policy=protect /p2p/Qmbwqf292G3GbrNm1ydtKeqhqqgyqXDtDvsuBYuvXsPHHr

Peer Group -- Keep connections to a (changing) group of peers

  • Use a group key to find each other and stay connected. Connect to every peer in the group. Keep a list of groups.
  • Maybe use a pre-shared key (PSK) to join the group and find out about each other (that way we can have private groups)

command name options

# choose one
ipfs swarm group   [list | add | rm]
ipfs swarm party   [list | add | rm]
ipfs swarm clique  [list | add | rm]
ipfs swarm flock   [list | add | rm]

subcommands

<cmd> list
<cmd> add [--mode=(all|any|number|...)] [ <group-key> ]
<cmd> rm [ <group-key> ]

examples

ipfs swarm group add --mode=all <secret-key-for-ipfs-gateways>
ipfs swarm group add --mode=all <secret-key-for-pinbot-cluster>
ipfs swarm group add --mode=any <secret-key-for-dtube-gateways>
ipfs swarm group add --mode=any <secret-key-for-pinata-peers>
ipfs swarm group add --mode=all <secret-key-for-textile-peers>

@obo20

obo20 commented Jul 3, 2019

I'm definitely a fan of the functionality @jbenet is suggesting. While 'ipfs swarm connect' currently gets the job done, it would be nice to have more fine-tuned control over how to manage connections that we want to keep alive.

The swarm groups are an interesting concept. Instead of a secret key to manage access, I'd love it if there were a concept of "roles" to determine access to the groups. Essentially, the first node to set a group up gets an admin role and can then add either admins or members from there.

The benefit here is that the owner(s) of a group can add or revoke nodes if needed without having to completely re-form the entire group, since there's no secret that acts as a master password.

Another thing that would be incredibly helpful is if this type of thing could be added to a permanent config instead of being only temporary until the node restarts. Currently we (and, I believe, the IPFS infrastructure team, according to @mburns) use the default 'ipfs swarm connect' functionality to keep our nodes connected, and we have to continually reconnect our nodes with a recurring cron task so that if a node restarts we can reconnect it. Having something like this persist between reboots would be incredibly valuable.

@hsanjuan
Contributor

A workaround is to add gateways to the bootstrap list. Bootstrap nodes are re-connected to frequently (I'm not sure if they are also "protected" or tagged with higher priority).

@Stebalien
Member

A workaround is to add gateways to the bootstrap list. Bootstrap nodes are re-connected to frequently (I'm not sure if they are also "protected" or tagged with higher priority).

Only if the number of open connections drops below 4. They also aren't tagged with any high priority (as a matter of fact, we've considered tagging them with a negative priority).

obo20 mentioned this issue Aug 23, 2019
@olizilla
Member

olizilla commented Sep 2, 2019

Can we make ipfs swarm connect call Protect on the connection by default? If I explicitly ask my node to connect to another, I don't want the connection to be in the list of trimmable connections; I want it to stay connected. I can't guarantee the other side won't drop it, but I definitely don't want my side to drop it. I can ipfs swarm disconnect to signal that I'm done with it.

Adding a mechanism to "reconnect on close" requires us to solve the "don't DDoS popular nodes" problem, but exposing the existing libp2p logic to let users identify connections they don't want their node to trim seems much less risky, and would allow users who control groups of nodes to maintain connections between them all by connecting from both sides. It doesn't solve the auto-reconnect problem, but that can be scripted for now.

@obo20

obo20 commented Sep 2, 2019

@olizilla Would there be a way to swarm connect without protecting the connection?

@Stebalien
Member

@olizilla we currently add a weight of 100 (we didn't have connection protection at the time). But yeah, we should probably protect those connections and add a --protect=false flag.

@obo20

obo20 commented Dec 4, 2019

Is this functionality being considered at all for the 0.5 release?

@Stebalien
Member

No. However, there are a few improvements already in master that may help:

  • The connection manager will no longer count connections in the grace period towards the connection limit.
    • Pro: Useful connections won't be trimmed in favor of new connections.
    • Con: You may end up with more connections and may need to reduce your limit.
  • Bitswap keeps track of historically useful peers and tells the connection manager to avoid disconnecting from these peers.

@obo20

obo20 commented Dec 4, 2019

@Stebalien Does this bitswap history persist through restarts, or does it live in memory?

If not, would it be difficult to have a separate bootstrap list (or just something in the config that we can set) consisting of peers whose connections we don't ever want to prune? Upon node initialization, the node would add all peers in that list to the "historically useful peers" list you mentioned.

For context, my main goal here is to avoid having to periodically run outside scripts to manage my node's connections, as this has been somewhat unreliable.

@Stebalien
Member

No, it lives in memory. The end goal is to also have something like this issue implemented, just not right now.

Stebalien added a commit that referenced this issue May 26, 2020
MVP for #6097

This feature will repeatedly reconnect (with a randomized exponential backoff)
to peers in a set of "peered" peers.

In the future, this should be extended to:

1. Include a CLI for modifying this list at runtime.
2. Include additional options for peers we want to _protect_ but not connect to.
3. Allow configuring timeouts, backoff, etc.
4. Allow groups? Possibly through textile threads.
5. Allow for runtime-only peering rules.
6. Different reconnect policies.

But this MVP should be a significant step forward.
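
A sketch of what such a randomized exponential backoff can look like; the constants and the helper name are illustrative, not the values go-ipfs ships:

package peering

import (
    "math/rand"
    "time"
)

// nextBackoff returns the delay before reconnect attempt number
// `attempt` (0-based): exponential growth capped at a maximum,
// with roughly +/-25% jitter to avoid synchronized reconnect storms.
func nextBackoff(attempt int) time.Duration {
    const (
        base = 5 * time.Second
        max  = 10 * time.Minute
    )
    d := base << uint(attempt)
    if d <= 0 || d > max {
        d = max // guard against shift overflow and cap the delay
    }
    jitter := time.Duration(rand.Int63n(int64(d)/2)) - d/4
    return d + jitter
}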

Stebalien added four more commits that referenced this issue May 26, 2020, and ralendor pushed six commits to ralendor/go-ipfs that referenced this issue Jun 6–8, 2020, all with the same commit message as above.
@Winterhuman
Contributor

@Stebalien Is there anything keeping this issue open now that Peering has been added to go-ipfs?

@lidel
Member

lidel commented Apr 5, 2022

I think Peering + #8680 (which allows setting limits per peer) cover the technical gist of this issue.

The remaining work is to add some porcelain on top, like the commands proposed in #6097 (comment)
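
For reference, Peering is configured through the go-ipfs config file; a minimal example, where the peer ID and address are placeholders:

{
  "Peering": {
    "Peers": [
      {
        "ID": "12D3KooW...",
        "Addrs": ["/ip4/192.0.2.1/tcp/4001"]
      }
    ]
  }
}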

@sinkingsugar

sinkingsugar commented Jun 12, 2022

Peering barely works; even after explicitly doing swarm connect, peers are culled a few seconds later if they perform badly.
Why not just use https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md#explicit-peering-agreements behind the scenes? So many use cases; surprised this is still so fragile 😄

Worth mentioning that with 0.13, nothing seems to work when trying to keep a connection to a plain bitswap peer (a Substrate node), while 0.12 keeps the connection fine.
There might be some regression.
