[CI] Failure in DiscoveryDisruptionIT.testClusterFormingWithASlowNode #33251

Closed
cbuescher opened this issue Aug 29, 2018 · 2 comments
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test-failure Triaged test failures from CI

Comments

@cbuescher
Member

Build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+matrix-java-periodic/ES_BUILD_JAVA=java10,ES_RUNTIME_JAVA=java8fips,nodes=virtual&&linux/268/console

Unfortunately, this doesn't reproduce locally:

./gradlew :server:integTest \
  -Dtests.seed=C95E04440D83BEC0 \
  -Dtests.class=org.elasticsearch.discovery.DiscoveryDisruptionIT \
  -Dtests.method="testClusterFormingWithASlowNode" \
  -Dtests.security.manager=true \
  -Dtests.locale=sr-Latn \
  -Dtests.timezone=America/Danmarkshavn \
  -Dcompiler.java=10 \
  -Druntime.java=8FIPS \
  -Djavax.net.ssl.keyStorePassword=password \
  -Djavax.net.ssl.trustStorePassword=password

Errors:

There are lots of NodeNotConnectedExceptions in the log that look like this:

18:20:38   1> [2018-08-29T16:19:38,289][DEBUG][o.e.t.t.MockTransportService] [node_t0] Exception while sending request, handler likely already notified due to timeout
18:20:38   1> org.elasticsearch.transport.NodeNotConnectedException: [node_t1][127.0.0.1:30101] connection already closed
18:20:38   1> 	at org.elasticsearch.transport.TcpTransport$NodeChannels.sendRequest(TcpTransport.java:410) ~[main/:?]
18:20:38   1> 	at org.elasticsearch.test.transport.MockTransportService.lambda$addFailToSendNoConnectRule$3(MockTransportService.java:223) ~[framework-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
18:20:38   1> 	at org.elasticsearch.test.transport.StubbableTransport$WrappedConnection.sendRequest(StubbableTransport.java:209) ~[framework-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]
18:20:38   1> 	at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:658) ~[main/:?]
18:20:38   1> 	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:573) [main/:?]
18:20:38   1> 	at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:561) [main/:?]
18:20:38   1> 	at org.elasticsearch.discovery.zen.MasterFaultDetection$MasterPinger.run(MasterFaultDetection.java:225) [main/:?]
18:20:38   1> 	at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:445) [main/:?]
18:20:38   1> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
18:20:38   1> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
18:20:38   1> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_181]
18:20:38   1> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_181]
18:20:38   1> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
18:20:38   1> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
18:20:38   1> 	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

After that, masters leave and apparently cannot be re-elected:

18:20:41   1> [2018-08-29T16:20:00,602][WARN ][o.e.t.d.TestZenDiscovery ] [node_t3] not enough master nodes discovered during pinging (found [[Candidate{node={node_t3}{cEU2AwQvT0GLx_oft7MEXQ}{7Ys6d08GQVimIOAHpOfGwQ}{127.0.0.1}

Finally:

ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
	at __randomizedtesting.SeedInfo.seed([C95E04440D83BEC0:256621FFA6166148]:0)
	at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:166)
	at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.checkGlobalBlock(TransportIndicesStatsAction.java:71)
	at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.checkGlobalBlock(TransportIndicesStatsAction.java:48)
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$AsyncAction.<init>(TransportBroadcastByNodeAction.java:248)
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecute(TransportBroadcastByNodeAction.java:226)
	at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction.doExecute(TransportBroadcastByNodeAction.java:78)
	at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:143)
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:119)
	at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:62)
	at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83)
	at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:388)
	at org.elasticsearch.client.FilterClient.doExecute(FilterClient.java:65)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:388)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:377)
	at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1230)
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:45)
	at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:52)
	at org.elasticsearch.test.ESIntegTestCase.lambda$assertSeqNos$7(ESIntegTestCase.java:2331)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:836)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:822)
	at org.elasticsearch.test.ESIntegTestCase.assertSeqNos(ESIntegTestCase.java:2330)
	at org.elasticsearch.discovery.AbstractDisruptionTestCase.beforeIndexDeletion(AbstractDisruptionTestCase.java:112)
	at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:588)
	at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2186)
	at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:965)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
	at java.lang.Thread.run(Thread.java:748)
cbuescher added the >test-failure and :Distributed Coordination/Cluster Coordination labels on Aug 29, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@jkakavas
Member

Can't reproduce locally either.

The java.io.IOException: Invalid keystore format exceptions are unrelated; see #32737 (comment).

I'm also not sure why

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':modules:transport-netty4:thirdPartyAudit'.
> Invalid exclusions, nothing is wrong with these classes:   * org.bouncycastle.asn1.x500.X500Name

didn't fail the build at that time.

dnhatn added a commit to dnhatn/elasticsearch that referenced this issue Aug 30, 2018
Some AbstractDisruptionTestCase tests have been failing since we enabled
assertSeqNos (in elastic#33130). They fail because the assertSeqNos assertion
queries cluster stats while the cluster is disrupted or not formed yet.

This commit switches to use the cluster state and shard stats directly
from the test cluster.

Closes elastic#33251
dnhatn added a commit that referenced this issue Sep 5, 2018
Some AbstractDisruptionTestCase tests have been failing since we enabled
assertSeqNos (in #33130). They fail because the assertSeqNos assertion
queries cluster stats while the cluster is disrupted or not formed yet.

This commit switches to use the cluster state and shard stats directly
from the test cluster.

Closes #33251
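
For readers skimming the thread, here is a toy model of why the request path fails while a direct read does not. It is only a sketch of the idea described in the commit message: every type and method below (BlockedException, Node, TestCluster, maxSeqNosViaRequest, maxSeqNosDirect, copiesAgree) is a hypothetical stand-in and not an Elasticsearch class or the actual change.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Elasticsearch classes.
class DirectShardStatsSketch {

    /** Hypothetical stand-in for a cluster-level block such as "state not recovered". */
    static class BlockedException extends RuntimeException {
        BlockedException(String message) {
            super(message);
        }
    }

    /** Hypothetical node holding the max sequence number of each shard copy it hosts. */
    static class Node {
        final String name;
        final Map<String, Long> maxSeqNoByShard = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Hypothetical test cluster that may or may not have an elected master. */
    static class TestCluster {
        final List<Node> nodes = new ArrayList<>();
        boolean stateRecovered = false; // false models a disrupted or unformed cluster

        /** Request path: rejected by the global block while the cluster is not formed. */
        Map<String, Long> maxSeqNosViaRequest(String shard) {
            if (!stateRecovered) {
                throw new BlockedException("state not recovered / initialized");
            }
            return maxSeqNosDirect(shard);
        }

        /** Direct path: reads every node's local copy and never checks the block. */
        Map<String, Long> maxSeqNosDirect(String shard) {
            Map<String, Long> byNode = new LinkedHashMap<>();
            for (Node node : nodes) {
                Long maxSeqNo = node.maxSeqNoByShard.get(shard);
                if (maxSeqNo != null) {
                    byNode.put(node.name, maxSeqNo);
                }
            }
            return byNode;
        }
    }

    /** True when all shard copies report the same max sequence number. */
    static boolean copiesAgree(Map<String, Long> maxSeqNoByNode) {
        return maxSeqNoByNode.values().stream().distinct().count() <= 1;
    }

    /** Builds a two-node cluster with no elected master and one in-sync shard. */
    static TestCluster unformedCluster() {
        TestCluster cluster = new TestCluster();
        for (String name : new String[] {"node_t0", "node_t1"}) {
            Node node = new Node(name);
            node.maxSeqNoByShard.put("test[0]", 41L);
            cluster.nodes.add(node);
        }
        return cluster;
    }

    public static void main(String[] args) {
        TestCluster cluster = unformedCluster();
        try {
            cluster.maxSeqNosViaRequest("test[0]");
        } catch (BlockedException e) {
            System.out.println("request path blocked: " + e.getMessage());
        }
        System.out.println("direct path agrees: " + copiesAgree(cluster.maxSeqNosDirect("test[0]")));
    }
}
```

In this model the sequence-number comparison itself is unchanged; only the way the per-copy numbers are obtained differs, so the check no longer depends on the cluster having an elected master at cleanup time.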