TESTS: Set SO_LINGER = 0 for MockNioTransport #32560

Merged
original-brownbear merged 7 commits into elastic:master on Sep 19, 2018

Conversation

original-brownbear
Member

* Prevents lingering sockets in TIME_WAIT piling up during test runs and leading to port collisions that manifest as timeouts
* Fixes elastic#32552
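
For readers unfamiliar with the option, here is a minimal sketch of what setting SO_LINGER to 0 on an NIO client channel looks like. It is not the code from this PR: the class and method names (`LingerZeroNioSketch`, `openWithZeroLinger`) are hypothetical, and the channel is left in blocking mode for simplicity, whereas a real NIO transport would use non-blocking channels.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical illustration only; not the MockNioTransport implementation.
final class LingerZeroNioSketch {

    /** Opens a blocking client channel to the given address with SO_LINGER set to 0. */
    static SocketChannel openWithZeroLinger(InetSocketAddress address) throws IOException {
        SocketChannel channel = SocketChannel.open();
        // Linger enabled with a 0 second timeout: close() aborts the connection
        // (typically with an RST) instead of the normal FIN handshake, so the
        // closing side does not leave a socket behind in TIME_WAIT.
        channel.setOption(StandardSocketOptions.SO_LINGER, 0);
        channel.connect(address);
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick a free ephemeral port on loopback.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();
            try (SocketChannel client = openWithZeroLinger(address);
                 SocketChannel accepted = server.accept()) {
                System.out.println("SO_LINGER = " + client.getOption(StandardSocketOptions.SO_LINGER));
            }
        }
    }
}
```

The trade-off is that an abortive close discards any unsent data, which is why this is only suggested for the test transports and not for production ones.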
@original-brownbear original-brownbear added :Distributed/Network Http and internode communication implementations >test-failure Triaged test failures from CI v7.0.0 labels Aug 1, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@original-brownbear
Member Author

original-brownbear commented Aug 1, 2018

@tbrooks8 maybe take a look here when you have a sec. I think this is just a case of lingering connections piling up. If I run those tests without this change, I end up with 300+ lingering connections; with this change it's 4.
=> I used your suggestion from here #30876 (comment) (set SO_LINGER to 0 outright). I think that's fine and not really controversial for the test implementation, right? As far as I can see, these mock client connections only connect to mock server channels => if those RSTs ever cause trouble (which they probably won't), we can fix the server side to work with that kind of close, I guess? (=> we're much more flexible here than with the prod implementations)

@original-brownbear
Member Author

original-brownbear commented Sep 17, 2018

@jasontedor @tbrooks8 can one of you take a look here? (This is the one we discussed during the last test triage meeting.)

@Tim-Brooks
Contributor

Can we do this for the MockTcpTransport too? I'm not sure why we would do it for only the test nio transport and not the blocking test transport.

Additionally, are we still having the random timeout issue? A while back I merged #32620, which reduced the localhost connections used from 26 to 6 for the mock nio transport.

@original-brownbear
Member Author

@tbrooks8 sure, the blocking implementation could be changed as well, will do in a bit :)

> Additionally, are we still having the random timeout issue? A while back I merged #32620, which reduced the localhost connections used from 26 to 6 for the mock nio transport.

Fair point, I guess the issue is now less likely by roughly a factor of 6/26. This change basically makes the factor 0, though, which is really nice for ruling this issue out once and for all when seeing flaky timeouts in tests (+ it's really nice for tracking down other instabilities by running tests in a hot loop, which currently still fails after n loops because the TIME_WAIT sockets keep piling up).

@original-brownbear
Member Author

@tbrooks8 done, turned off linger for the blocking mock transport as well.
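
As a rough illustration of the same idea for a blocking `java.net.Socket`, here is a minimal sketch. It is not the code from this PR: `LingerZeroBlockingSketch` and `connectWithZeroLinger` are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration only; not the MockTcpTransport implementation.
final class LingerZeroBlockingSketch {

    /** Connects a blocking socket to host:port with linger enabled and a 0 second timeout. */
    static Socket connectWithZeroLinger(InetAddress host, int port) throws IOException {
        Socket socket = new Socket();
        // setSoLinger(true, 0) makes close() abort the connection (typically with
        // an RST), so the closing side does not leave a socket in TIME_WAIT.
        socket.setSoLinger(true, 0);
        socket.connect(new InetSocketAddress(host, port));
        return socket;
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free ephemeral port.
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket client = connectWithZeroLinger(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            System.out.println("SO_LINGER = " + client.getSoLinger());
        }
    }
}
```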

Contributor

@Tim-Brooks Tim-Brooks left a comment

LGTM

@original-brownbear
Member Author

@tbrooks8 thanks!

@original-brownbear original-brownbear merged commit 0cf0d73 into elastic:master Sep 19, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 19, 2018
* master: (46 commits)
  Fixing assertions in integration test (elastic#33833)
  [CCR] Rename idle_shard_retry_delay to poll_timout in auto follow patterns (elastic#33821)
  HLRC: Delete ML calendar (elastic#33775)
  Move DocsStats into Engine (elastic#33835)
  [Docs] Clarify accessing Date methods in painless (elastic#33560)
  add elasticsearch-shard tool (elastic#32281)
  Cut over to unwrap segment reader (elastic#33843)
  SQL: Fix issue with options for QUERY() and MATCH(). (elastic#33828)
  Emphasize that filesystem-level backups don't work (elastic#33102)
  Use the global doc id to generate a random score (elastic#33599)
  Add minimal sanity checks to custom/scripted similarities. (elastic#33564)
  Profiler: Don’t profile NEXTDOC for ConstantScoreQuery. (elastic#33196)
  [CCR] Change FollowIndexAction.Request class to be more user friendly (elastic#33810)
  SQL: day and month name functions tests locale providers enforcement (elastic#33653)
  TESTS: Set SO_LINGER = 0 for MockNioTransport (elastic#32560)
  Test: Relax jarhell gradle test (elastic#33787)
  [CCR] Fail with a descriptive error if leader index does not exist (elastic#33797)
  Add ES version 6.4.2 (elastic#33831)
  MINOR: Remove Some Dead Code in Scripting (elastic#33800)
  Ensure realtime `_get` and `_termvectors` don't run on the network thread (elastic#33814)
  ...
@colings86 colings86 removed the v7.0.0 label Feb 7, 2019
Labels
:Distributed/Network Http and internode communication implementations >test-failure Triaged test failures from CI v7.0.0-beta1
4 participants