Skip to content
This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Avoid attempting to sync with disconnected peers #5644

Merged
merged 3 commits into from
Jul 3, 2019

Conversation

halfalicious
Copy link
Contributor

@halfalicious halfalicious commented Jul 1, 2019

Fix #5643

Fix a bug where Aleth can try to sync with a node after it has failed status validation (e.g. its on a different chain), which results in wasted resources and confusing logs e.g: "starting full sync" being displayed but no blocks being imported.

The root cause is that Aleth can initiate syncing with peers via Host::forEachPeer (see BlockChainSync::continueSync), and Host::forEachPeer iterates through its map of Session weak_ptrs without checking if the session is still connected i.e. the socket isn't closed. This means that the following case can occur:

  1. Status packet validation fails (see BlockChainSync::onPeerStatus) so Aleth sends a disconnect packet to the remote node (Host::capabilityHost()::disconnect()). This results inSession::disconnect()being called which results in a write to the socket being queued and the write handler being placed in the boost asio handler queue (see Session::write). The socket is then closed. Note that the Session instance is still active until the write is completed and the handler is executed (Session::shared_from_this() is captured in the handler).

  2. Something happens which triggers a call to BlockChainSync::continueSync (e.g. a peer session with another node is closed - see BlockChainSync::onPeerAborting) - BlockChainSync::continueSync makes sync calls for each peer via host().capabilityHost().forEachPeer(). Meanwhile the write handler is still sitting in the boost ASIO handler queue so the session is still active.

  3. Eventually the write is performed and the write handler is removed from the queue and executed and the Session instance is destructed. This means that the attempted syncing with the disconnected node is wasted cycles and also makes the logs confusing.

There are a few places where we retrieve a session (stored in Host::m_sessions) but we don't check if the session is connected before using the session ptr. We need to do this because sessions are kept alive by shared_ptrs (shared_from_this) passed to boost ASIO handlers, and there are conditions which can occur where a session shared_ptr is still valid but the session has been disconnected. For example, take the case of if we've requested a Session disconnect (Session::disconnect) but the handler which executes after the socket write occurs (see Session::write) is sitting in the boost ASIO handler queue.
@codecov-io
Copy link

codecov-io commented Jul 1, 2019

Codecov Report

Merging #5644 into master will decrease coverage by 0.01%.
The diff coverage is 42.85%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5644      +/-   ##
==========================================
- Coverage   62.74%   62.73%   -0.02%     
==========================================
  Files         348      348              
  Lines       29682    29689       +7     
  Branches     3343     3345       +2     
==========================================
  Hits        18625    18625              
- Misses       9845     9848       +3     
- Partials     1212     1216       +4

@gumb0
Copy link
Member

gumb0 commented Jul 1, 2019

Is this ready?

@halfalicious halfalicious changed the title [WIP] Check if session is connected after obtaining shared_ptr Check if session is connected after obtaining shared_ptr Jul 2, 2019
@halfalicious halfalicious requested review from chfast and gumb0 and removed request for chfast July 2, 2019 02:55
@halfalicious
Copy link
Contributor Author

Is this ready?

@gumb0 Yup! 😄

@halfalicious halfalicious changed the title Check if session is connected after obtaining shared_ptr Avoid attempting to sync with disconnected peers Jul 2, 2019
libp2p/Host.cpp Outdated Show resolved Hide resolved
libp2p/Host.cpp Outdated
if (s && s->isConnected())
return s;
}
return{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, still need to get git clang-format up and running again after a reinstall, will reformat manually for the time-being.

libp2p/Host.cpp Outdated
@@ -1155,13 +1174,16 @@ void Host::forEachPeer(
RecursiveGuard l(x_sessions);
vector<shared_ptr<SessionFace>> sessions;
for (auto const& i : m_sessions)
if (shared_ptr<SessionFace> s = i.second.lock())
{
shared_ptr<SessionFace> s = i.second.lock();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use auto as in other cases?

Remove redundant search, fix formatting, use auto consistently
@halfalicious halfalicious merged commit c630593 into master Jul 3, 2019
@halfalicious halfalicious deleted the session-isconnected branch July 3, 2019 02:56
@halfalicious
Copy link
Contributor Author

@gumb0 / @chfast Thanks for the review!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Aleth tries to sync with peer after status verification fails
4 participants