Skip to content
This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Garbage collect incompatible peers in Host::run() #5624

Merged
merged 7 commits into from
Jun 18, 2019
Merged

Conversation

halfalicious
Copy link
Contributor

@halfalicious halfalicious commented Jun 12, 2019

Garbage collect peers in Host::run() which are deemed incompatible - an incompatible peer is one with which we will never be able to successfully peer with (e.g. on a different chain or network, not running any capabilities).

@halfalicious halfalicious changed the title Garbage collect incompatible peers in Host::run() [WIP] Garbage collect incompatible peers in Host::run() Jun 12, 2019
@halfalicious halfalicious self-assigned this Jun 12, 2019
halfalicious and others added 2 commits June 11, 2019 22:39
Garbage collect peers in Host::run() which are deemed incompatible - an incompatible peer is one with which we will never be able to connect to successfully peer with (e.g. on a different chain or network, not running any capabilities).
libp2p/Host.cpp Outdated Show resolved Hide resolved
libp2p/Peer.cpp Outdated
case IncompatibleProtocol:
case UnexpectedIdentity:
case UserReason:
return numeric_limits<unsigned>::max();
case TooManyPeers:
return 25 * (m_failedAttempts + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_failedAttempts seems to affect only the value of fallbackSeconds currently. Maybe we should use it now to retry several times and then go to "critical error, disconnect" state.
(at least for some cases of failures)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 Good idea - what do you think of this being taken care of in another PR? I'd like to limit the amount of changes I make to the peer gc logic in this PR so if something ends up breaking it will be easier to debug.

Copy link
Member

@gumb0 gumb0 Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for another PR, but it seems to be the matter of only adding condition like m_failedAttempts >= 20 to uselessPeer function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional thought is that we could make all this change more conservative if we leave fallbackSeconds() as it was before (so that we do the same reconnects as before) and have just this check for failed attempts count in uselessPeer (plus the new check for handshake failures)
This way it would reconnect with the same intervals for each case as before, but stop after limited number of attempts.

But I'm fine with it if you think it's better to immediately stop in some cases.

libp2p/Host.cpp Outdated Show resolved Hide resolved
Rather than use a magic number to indicate a useless peer (numeric_limits<unsigned>::max()), add a function to Peer (Peer::uselessPeer) which can be called to detect this. Also update Peer::fallbackSeconds() to return a number which is large enough to prevent us from realistically ever connecting to a useless peer but which is small enough to not overflow when added to Peer::m_lastAttempted.

Also, log a message with error verbosity in Host::handshakeFailed() if a peer is not found rather than asserting, since while we never expect users to hit this case, the EIP8 handshake tests can.
Update Peer::uselessPeer() to check peer type and return false if it's a required peer, since in this case the user obviously wants to try to stay connected to this peer.
@codecov-io
Copy link

codecov-io commented Jun 14, 2019

Codecov Report

Merging #5624 into master will increase coverage by <.01%.
The diff coverage is 59.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5624      +/-   ##
==========================================
+ Coverage   62.62%   62.63%   +<.01%     
==========================================
  Files         350      350              
  Lines       29690    29742      +52     
  Branches     3344     3350       +6     
==========================================
+ Hits        18593    18628      +35     
- Misses       9889     9901      +12     
- Partials     1208     1213       +5

@halfalicious halfalicious changed the title [WIP] Garbage collect incompatible peers in Host::run() Garbage collect incompatible peers in Host::run() Jun 14, 2019
Copy link
Member

@gumb0 gumb0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments and a small bug in the loop iterating over peers.

libp2p/RLPxHandshake.h Show resolved Hide resolved
libp2p/RLPxHandshake.cpp Outdated Show resolved Hide resolved
libp2p/RLPxHandshake.cpp Outdated Show resolved Hide resolved
libp2p/Host.h Outdated
std::shared_ptr<Peer> peer(NodeID const& _n) const;

/// Set a handshake failure reason for a peer
void handshakeFailed(NodeID const& _n, HandshakeFailureReason _r);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better onHandshakeFailed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I would make it public and declare close to startPeerSession

Copy link
Contributor Author

@halfalicious halfalicious Jun 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 : Why make this public, does it make sense to expose the concept of a handshake to consumers of Host?

Copy link
Member

@gumb0 gumb0 Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it looks to me like a callback similar to startPeerSession, the callback called by RLPXHandshake when handshake is finished. One is for success another one is for failure.

It works as private, because RLPXHandshake is a friend of Host, but idealluy we should get rid of this friend declarations at some point.

Exposing it to the clients of Host is of course not great, but the proper way to deal with it could be to create a separate interface with these callbacks only, don't expose it to Host clients, but pass it only to RLPXHandshake. That's a bit complicated change, at least it's not for this PR.

(In other words, we won't make it much worse, because startPeerSession is already public)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well it looks to me like a callback similar to startPeerSession, the callback called by RLPXHandshake when handshake is finished. One is for success another one is for failure.

It works as private, because RLPXHandshake is a friend of Host, but idealluy we should get rid of this friend declarations at some point.

Exposing it to the clients of Host is of course not great, but the proper way to deal with it could be to create a separate interface with these callbacks only, don't expose it to Host clients, but pass it only to RLPXHandshake. That's a bit complicated change, at least it's not for this PR.

(In other words, we won't make it much worse, because startPeerSession is already public)

Ah that makes sense, thank you for clarifying! 😄 I'll make the change before merging.

libp2p/Host.cpp Outdated Show resolved Hide resolved
libp2p/Peer.h Outdated Show resolved Hide resolved
libp2p/Peer.cpp Outdated
case UselessPeer:
case IncompatibleProtocol:
case UnexpectedIdentity:
case UserReason:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about UserReason - this could in theory be any kind of reason specific to subprotocol, including some temporary reasons, maybe it makes sense to try to reconnect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to treat UserReason as terminal because we only use it in 2 cases - if status validation fails:

std::string disconnectReason;
if (peerSessionInfo->clientVersion.find("/v0.7.0/") != string::npos)
disconnectReason = "Blacklisted client version.";
else
disconnectReason = _peer.validate(
host().chain().genesisHash(), host().protocolVersion(), host().networkId());
if (!disconnectReason.empty())
{
LOG(m_logger) << "Peer " << _peer.id() << " not suitable for sync: " << disconnectReason;
m_host.capabilityHost().disconnect(_peer.id(), p2p::UserReason);
return;
}

Here's where we perform the actual validation in EthereumPeer::validate:
if (m_networkId != _hostNetworkId)
error << "Network identifier mismatch. Host network id: " << _hostNetworkId
<< ", peer network id: " << m_networkId;
else if (m_protocolVersion != _hostProtocolVersion)
error << "Protocol version mismatch. Host protocol version: " << _hostProtocolVersion
<< ", peer protocol version: " << m_protocolVersion;
else if (m_genesisHash != _hostGenesisHash)
error << "Genesis hash mismatch. Host genesis hash: " << _hostGenesisHash.abridged()
<< ", peer genesis hash: " << m_genesisHash.abridged();
else if (m_asking != Asking::State && m_asking != Asking::Nothing)
error << "Peer banned for unexpected status message.";

  • If there's a bug in our network code and Session read validation fails:

    aleth/libp2p/Session.cpp

    Lines 409 to 418 in 505aead

    else if (_length != _expected)
    {
    // with static m_data-sized buffer this shouldn't happen unless there's a regression
    // sec recommends checking anyways (instead of assert)
    LOG(m_netLoggerError)
    << "Error reading - TCP read buffer length differs from expected frame size ("
    << _length << " != " << _expected << ")";
    disconnect(UserReason);
    return false;
    }

You bring up a good point though, technically it can be any reason specific to a subprotocol and other clients can also choose to send it to us for reasons specific to their implementation. I think that rather than treating it as immediately critical we should use it in some sort of reconnection count threshold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 I've decided to change UserReason -> UselessPeer when status message validation fails for a peer (since if that happens the peer is effectively useless to us i.e. we can't sync with it). I've also added in logic in Peer::isUseless to take the number of failed connection attempts into account when the last disconnect isn't critical.

libp2p/Host.cpp Show resolved Hide resolved
libp2p/Host.cpp Outdated Show resolved Hide resolved
libp2p/Host.cpp Outdated Show resolved Hide resolved
libp2p/Peer.h Outdated Show resolved Hide resolved
Status validation failing means we have no chance of syncing with the peer (e.g. it's running on a different network) so a UselessPeer disconnect reason makes more sense than UserReason. This also has the benefit of status failure validation being treated as a critical disconnect and these nodes being gc'd in Host::run.
Copy link
Member

@gumb0 gumb0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, feel free to make onHandshakeFailed public, if you agree with my reasoning, but it's not critical

Various minor changes in RLPxHandshake, Host, and Peer classes:
* Initialize RLPxHandshake::m_failureReason in ctor
* Reduce redundant code in RLPxHandshake which sets the last failure reason when TCP errors occur
* Rename Host::handshakeFailed
* Fix m_peers iteration bug in Host
* Make Host::onHandshakeFailed public (we want to avoid using friend classes where possible)
* Rename Peers::uselessPeer
* Update deprecated comment in Peer
* Take # of failed connection attempts into account in Peer function which determines if instance is useless or not
Move fallback seconds computation for default case to anonymous namespace function
@halfalicious halfalicious merged commit b9c49ba into master Jun 18, 2019
@halfalicious halfalicious deleted the gc-peers branch June 18, 2019 03:13
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants