Skip to content
This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Fix unreliable network/p2p tests #5544

Closed
halfalicious opened this issue Apr 3, 2019 · 6 comments
Closed

Fix unreliable network/p2p tests #5544

halfalicious opened this issue Apr 3, 2019 · 6 comments

Comments

@halfalicious
Copy link
Contributor

halfalicious commented Apr 3, 2019

I've seen the following tests fail recently in PRs which didn't touch network or p2p code (e.g. #5541)

  • network/net/evictionWithOldNodeAnswering
  • network/net/packetsWithChangedEndpointSuite/neighbours
  • libp2p/p2pPeer/requirePeer

Logs:

evictionWithOldNodeAnswering

495/563 Test #501: network/net/evictionWithOldNodeAnswering ..............................................................................***Failed    0.41 sec
Running tests using path: "/home/builder/project/test/jsontests"
Running 1 test case...
Test Case "evictionWithOldNodeAnswering": 
/home/builder/project/test/unittests/libp2p/net.cpp(854): fatal error: in "network/net/evictionWithOldNodeAnswering": critical check populated has failed
*** 1 failure is detected (5 failures are expected) in the test module "Master Test Suite"

packetsWithChangedEndpointSuite/neighbours

515/563 Test #510: network/net/packetsWithChangedEndpointSuite/neighbours ................................................................***Failed   10.37 sec
Running tests using path: "/home/builder/project/test/jsontests"
Running 1 test case...
Test Case "neighbours": 
/home/builder/project/libdevcore/concurrent_queue.h(47): fatal error: in "_T dev::concurrent_queue<_T, _QueueT>::pop(const milliseconds&) [with _T = std::vector<unsigned char>; _QueueT = std::queue<std::vector<unsigned char>, std::deque<std::vector<unsigned char>, std::allocator<std::vector<unsigned char> > > >; std::chrono::milliseconds = std::chrono::duration<long int, std::ratio<1l, 1000l> >]": boost::exception_detail::clone_impl<dev::WaitTimeout>: /home/builder/project/libdevcore/concurrent_queue.h(47): Throw in function _T dev::concurrent_queue<_T, _QueueT>::pop(const milliseconds&) [with _T = std::vector<unsigned char>; _QueueT = std::queue<std::vector<unsigned char>, std::deque<std::vector<unsigned char>, std::allocator<std::vector<unsigned char> > > >; std::chrono::milliseconds = std::chrono::duration<long int, std::ratio<1l, 1000l> >]
Dynamic exception type: boost::exception_detail::clone_impl<dev::WaitTimeout>
/home/builder/project/test/unittests/libp2p/net.cpp(1253): last checkpoint: "neighbours" test entry
*** 1 failure is detected (5 failures are expected) in the test module "Master Test Suite"

libp2p/p2pPeer/requirePeer

513/563 Test #518: libp2p/p2pPeer/requirePeer ............................................................................................***Failed    0.35 sec
Running tests using path: "/Users/distiller/project/test/jsontests"
Running 1 test case...
Test Case "requirePeer": 
/Users/distiller/project/test/unittests/libp2p/peer.cpp:283: fatal error: in "libp2p/p2pPeer/requirePeer": critical check host1peerCount == 1 has failed [0 != 1]

*** 1 failure is detected (5 failures are expected) in the test module "Master Test Suite"
@halfalicious
Copy link
Contributor Author

cc @gumb0

@halfalicious halfalicious changed the title Fix unreliable discovery tests Fix unreliable network/p2p tests Apr 5, 2019
@gumb0
Copy link
Member

gumb0 commented Apr 25, 2019

I did some debugging, and failure in packetsWithChangedEndpointSuite/neighbours seems to actually uncover a bug in NodeTable::nearestNodeEntries method implementation. It is supposed to return 16 nearest nodes to the given one, but always when distance(m_hostNodeID, _target) == 254 it returns empty list even if there's one node in the node table (and then FindNode is not sent there)

It's not easy to reproduce, because NodeIDs are randomly generated, so you need to be lucky to get this distance.

It's difficult to make sense of the code written there, but I think I figured out what it was supposed to do and I'll try to rewrite it.

(I noticed also net/pingNotSentAfterPongForKnownNode sometimes failing with similar symptoms)

@gumb0
Copy link
Member

gumb0 commented Apr 25, 2019

evictionWithOldNodeAnswering failure probably can be solved by increasing the number here

TestNodeTableHost nodeTableHost(1000);

(that failure means there are not enough random nodes to fill the required bucket)

@gumb0
Copy link
Member

gumb0 commented May 2, 2019

evictionWithOldNodeAnswering and packetsWithChangedEndpointSuite/neighbours should be fixed in #5581

@halfalicious
Copy link
Contributor Author

Haven't seen failures in these tests for a while so I'm assuming all issues have been fixed, will reopen if this isn't the case.

@gumb0
Copy link
Member

gumb0 commented Jul 22, 2019

There are others failing quite often though, I'll create another issue

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants