Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P2P reconnection issue #2105

Closed
3 of 17 tasks
abitmore opened this issue Feb 11, 2020 · 3 comments · Fixed by #2107
Closed
3 of 17 tasks

P2P reconnection issue #2105

abitmore opened this issue Feb 11, 2020 · 3 comments · Fixed by #2107
Labels
6 P2P Impact flag identifying the peer-to-peer (P2P) layer

Comments

@abitmore
Copy link
Member

abitmore commented Feb 11, 2020

Bug Description

I have a node which has been running normally for months, but now it has been stuck for about 6 hours. The node is behind a firewall, its P2P port is not open to the public. The network of the node is not very stable recently.

From p2p.log, it seems the node is endlessly trying to connect to peers one by one, around 7200 times a hour, but none of them is reachable so far.

  • Many of the peers it tried to connect have the same IP address but different ports, which is a bit suspicious. I guess the reason is that the peers have been restarted for some times, each time with a new port. And probably there were attacks.
  • It seems there are too many dead peers in the peer database of my node after the long run. Total amount of peers in p2p.log that my node has tried to connect is 4933. Around 10 of them have been tried for 33 times, others are 32 times or fewer.
  • It seems the priorities of the default seed nodes (which are working) have became too low, thus my node has never tried to connect to any of them during the past 6 hours, or the default seed nodes are no longer in the peer database. In the logs I found that the node had tried to connect to some default seed nodes that have been removed from the develop branch, but had never tried to connect to the default seed nodes that are still working.
    • Side note: some default seed nodes are domain names, but the domain names are only get resolved to IP addresses when starting the node. It means if the IP address of a domain name changed, the node won't try to connect to the new IP address. This is not the case in this issue though.

Impacts
Describe which portion(s) of BitShares Core may be impacted by this bug. Please tick at least one box.

  • API (the application programming interface)
  • Build (the build process or something prior to compiled code)
  • CLI (the command line wallet)
  • Deployment (the deployment process after building such as Docker, Travis, etc.)
  • DEX (the Decentralized EXchange, market engine, etc.)
  • P2P (the peer-to-peer network for transaction/block propagation)
  • Performance (system or user efficiency, etc.)
  • Protocol (the blockchain logic, consensus, validation, etc.)
  • Security (the security of system or user data, etc.)
  • UX (the User Experience)
  • Other (please add below)

CORE TEAM TASK LIST

  • Evaluate / Prioritize Bug Report
  • Refine User Stories / Requirements
  • Define Test Cases
  • Design / Develop Solution
  • Perform QA/Testing
  • Update Documentation
@abitmore abitmore added the 6 P2P Impact flag identifying the peer-to-peer (P2P) layer label Feb 11, 2020
@abitmore abitmore added this to the Future Feature Release milestone Feb 11, 2020
@abitmore
Copy link
Member Author

The same node (after restarted 5 days ago) got stuck again today. When checking the logs I found some data.

Total amount of peers in p2p.log that my node has tried to connect is 4933

Among the 4933 peers there are around 50 unique IP addresses only.

Around 10 of them (the peers) have been tried for 33 times

The top 10 unique IP addresses got tried more than 2700 times each. The No. 1 is 6260 times.

@abitmore
Copy link
Member Author

Apparently the peer database is polluted when the node wrongly assumes the peers "are not firewalled".

Comment in the code:

// whether we're planning on accepting them as a peer or not, they seem to be a valid node,
// so add them to our database if they're not firewalled
// in the hello message, the peer sent us the IP address and port it thought it was connecting from.
// If they match the IP and port we see, we assume that they're actually on the internet and they're not
// firewalled.

// whether we're planning on accepting them as a peer or not, they seem to be a valid node,
// so add them to our database if they're not firewalled

// in the hello message, the peer sent us the IP address and port it thought it was connecting from.
// If they match the IP and port we see, we assume that they're actually on the internet and they're not
// firewalled.

That assumption makes sense only if the peer is running a witness_node (our software) or compatible software which will tell us what the real inbound port is, and the port doesn't change frequently. Even that, the inbound port could be blocked by a firewall. Scenarios that our peer database can get polluted: an incompatible peer could tell us that the outbound port is same as the inbound port which is unreachable; a malicious peer could tell us a random unreachable port.

The polluted peer database is then propagated to other peers and pollutes their peer databases.

When the node is disconnected, it may get stuck in the p2p_network_connect_loop unable to find an available peer:

for (peer_database::iterator iter = _potential_peer_db.begin();
iter != _potential_peer_db.end() && is_wanting_new_connections();
++iter)
{
fc::microseconds delay_until_retry = fc::seconds((iter->number_of_failed_connection_attempts + 1) * _peer_connection_retry_timeout);
if (!is_connection_to_endpoint_in_progress(iter->endpoint) &&
((iter->last_connection_disposition != last_connection_failed &&
iter->last_connection_disposition != last_connection_rejected &&
iter->last_connection_disposition != last_connection_handshaking_failed) ||
(fc::time_point::now() - iter->last_connection_attempt_time) > delay_until_retry))
{
connect_to_endpoint(iter->endpoint);
initiated_connection_this_pass = true;
}
}

@abitmore
Copy link
Member Author

Fixed by #2107.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6 P2P Impact flag identifying the peer-to-peer (P2P) layer
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant