Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DHT tests failing because of repeated addresses #2855

Merged
merged 1 commit into from
Jun 15, 2016

Conversation

Kubuxu
Copy link
Member

@Kubuxu Kubuxu commented Jun 14, 2016

Due to SO_REUSEPORT it is possible for a localhost:0
address to repeat. This causes failure in DHT tests
where we spun up a lot of nodes inside test.

As for a birthday paradox it is enough to use 140
ports to get 20% chance for collision which was causing
failure in our case.

The fix is to disable REUSE_PORT routine for the
tests and leave it running for sharness tests where
not that many addresses are used at the same time.

Resolves #2745 #2809 #188

Due to SO_REUSE_PORT it is possible for a localhost:0
address to repeat. This causes failure in DHT tests
where we spun up a lot of nodes inside test.

As for a birthday paradox it is enough to use 140
ports to get 20% chance for collision which was causing
failure in our case.

The fix is to disable REUSE_PORT routine for the
tests and leave it running for sharness tests where
not that many addresses are used at the same time.

License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protonmail.ch>
@Kubuxu
Copy link
Member Author

Kubuxu commented Jun 14, 2016

tests_green.jpg

@whyrusleeping
Copy link
Member

Awesome job catching this, yet another SO_REUSEPORT issue... yay

@whyrusleeping whyrusleeping merged commit 323441d into master Jun 15, 2016
@whyrusleeping whyrusleeping deleted the feature/test-repeated-ip branch June 15, 2016 03:16
@whyrusleeping whyrusleeping added this to the Ipfs 0.4.3 milestone Jun 21, 2016
@jbenet
Copy link
Member

jbenet commented Aug 27, 2016

  • Great find!
  • We should continue to have SO_REUSEPORT being stress tested because we may find issues with it (maybe it behaves differently or poorly). If not in these tests, maybe in others? Or would there be another way of signaling ports in the tests here?


test_sharness_expensive:
cd test/sharness/ && TEST_EXPENSIVE=1 make
TEST_EXPENSIVE=1 make -C test/sharness/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chriscool was there a reason to do the make after a cd? pls confirm this change is perfectly fine with you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's fine. One could also use make -C test/sharness/ TEST_EXPENSIVE=1. It's a matter of preference whether one wants the shell or make to do the cd and to set the environment variable.

@jbenet
Copy link
Member

jbenet commented Aug 27, 2016

Hold up, this disabled SO_REUSEPORT in ALL the go tests. this is pretty bad and flying blind. if there's any more trouble with SO_REUSEPORT we'll find it with stress testing all the socket use, which the go-ipfs suite does pretty well. Obviously the "accidentally reuse port" problem really sucks... is there another way of going about this? Maybe in a more granular way? (in the failing tests, or in tests which allocate tons of ports). Of course, if we can do a different way of allocating ports even better.

@Kubuxu
Copy link
Member Author

Kubuxu commented Aug 27, 2016

Some smarter way of allocating ports would be the best solution, unfortunately it is hard to do without race of conditions (I don't know if there is a way at all to do it 100% correctly).

The go-reuseport is very deep in the stack so disabling it only locally would require some refactors, that is why I decided to disable it globally.

Also any test that would stress go-reuseport would have quite a high change of triggering the collision.

I am all for re-enabling the go-reuseport in tests but we need to work on smart port selection first (0.0.0.0/0 is just not working with SO_REUSEPORT set).

@jbenet jbenet mentioned this pull request Aug 28, 2016
58 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestBootstrap fails occasionally
4 participants