[CI] SamlServiceProviderMetadataIT classMethod failing #109452

Closed
mark-vieira opened this issue Jun 6, 2024 · 3 comments · Fixed by #109558
Labels
low-risk An open issue or test failure that is a low risk to future releases :Security/Security Security issues without another label Team:Security Meta label for security team >test-failure Triaged test failures from CI

Comments

@mark-vieira
Contributor

This is only failing on Java 23. The failure is due to a timeout while attempting to start the cluster, so something is hanging on Java 23 here.

Build scan:
https://gradle-enterprise.elastic.co/s/w4kjz3jbtvl7s/tests/:x-pack:plugin:security:qa:saml-rest-tests:javaRestTest/org.elasticsearch.xpack.security.authc.saml.SamlServiceProviderMetadataIT

Reproduction line:

null

Applicable branches:
main

Reproduces locally?:
Yes

Failure history:
Failure dashboard for org.elasticsearch.xpack.security.authc.saml.SamlServiceProviderMetadataIT#classMethod

Failure excerpt:

java.lang.RuntimeException: An error occurred orchestrating test cluster.

  at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.execute(DefaultLocalClusterHandle.java:269)
  at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.writeUnicastHostsFile(DefaultLocalClusterHandle.java:250)
  at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.waitUntilReady(DefaultLocalClusterHandle.java:193)
  at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.start(DefaultLocalClusterHandle.java:79)
  at org.elasticsearch.test.cluster.local.DefaultLocalElasticsearchCluster$1.evaluate(DefaultLocalElasticsearchCluster.java:45)
  at org.elasticsearch.test.junit.RunnableTestRuleAdapter$1.evaluate(RunnableTestRuleAdapter.java:43)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1575)

  Caused by: java.lang.RuntimeException: Timed out after PT3M waiting for ports files for: { cluster: 'test-cluster', node: 'test-cluster-0' }

    at org.elasticsearch.test.cluster.local.AbstractLocalClusterFactory$Node.waitUntilReady(AbstractLocalClusterFactory.java:285)
    at org.elasticsearch.test.cluster.local.AbstractLocalClusterFactory$Node.getTransportEndpoint(AbstractLocalClusterFactory.java:204)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:215)
    at java.util.AbstractList$RandomAccessSpliterator.forEachRemaining(AbstractList.java:722)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
    at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:960)
    at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:934)
    at java.util.stream.AbstractTask.compute(AbstractTask.java:327)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:759)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:676)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:927)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:264)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:727)
    at org.elasticsearch.test.cluster.local.DefaultLocalClusterHandle.lambda$writeUnicastHostsFile$13(DefaultLocalClusterHandle.java:250)
    at java.util.concurrent.ForkJoinTask$AdaptedInterruptibleCallable.compute(ForkJoinTask.java:1689)
    at java.util.concurrent.ForkJoinTask$InterruptibleTask.exec(ForkJoinTask.java:1641)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
    at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1489)
    at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:2071)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2033)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:189)

@mark-vieira mark-vieira added :Core/Infra/Core Core issues without another label :Security/Security Security issues without another label >test-failure Triaged test failures from CI labels Jun 6, 2024
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team Team:Security Meta label for security team needs:risk Requires assignment of a risk label (low, medium, blocker) labels Jun 6, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@tvernum
Contributor

tvernum commented Jun 11, 2024

There are 2 issues here:

  1. We don't configure any socket timeouts in the HTTP Resolver, and it blocks the main startup thread. The blocking is somewhat intentional (though not necessarily a good idea) because we provide the option to fail the node if the metadata can't be resolved.
  2. For some reason on JDK23 the MockHttpServer is responding to the first request, but the 2nd request times out. @jakelandis ran into some other behaviour changes with the mock http server in JDK23 ([CI] SSLConfigurationReloaderTests testReloadingKeyStore failing #108774 (comment)), so I guess we need to be on the lookout for more of them.

I haven't fixed it yet, but it's clear the thing that broke is the mock HTTP server, so I'm setting the risk to low.
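
To illustrate the first point, here is a minimal sketch of what bounding a blocking metadata fetch with timeouts can look like. It is not the resolver code from this repository: it uses the JDK's `java.net.http.HttpClient` as a stand-in, and the class name, method names, and `/metadata` path are all hypothetical. The local server deliberately never answers, so the fetch gives up after the timeout instead of blocking the calling thread indefinitely.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

// Hypothetical sketch: names here are illustrative, not taken from Elasticsearch.
public class MetadataFetchTimeoutSketch {

    /** Starts a local server whose handler never sends a response. */
    static HttpServer startSilentServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/metadata", exchange -> {
            // Intentionally empty: the client is left waiting for a reply.
        });
        server.start();
        return server;
    }

    /** Fetches the URI with bounded connect and response timeouts; returns null on timeout. */
    static String fetchWithTimeout(URI uri, Duration timeout) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(timeout)   // bound the TCP connect
            .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
            .timeout(timeout)          // bound the wait for the response
            .GET()
            .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (HttpTimeoutException e) {
            return null;               // caller decides whether this is fatal
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startSilentServer();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/metadata");
            String body = fetchWithTimeout(uri, Duration.ofSeconds(1));
            System.out.println(body == null ? "timed out" : body);
        } finally {
            server.stop(0);
        }
    }
}
```

Returning `null` on timeout is just one choice for the sketch; a caller that wants the "fail the node if the metadata can't be resolved" behaviour could rethrow instead.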

@tvernum tvernum added low-risk An open issue or test failure that is a low risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team labels Jun 11, 2024
tvernum added a commit to tvernum/elasticsearch that referenced this issue Jun 11, 2024
In JDK23 the `HttpServer` requires that http response be explicitly
closed (even if there is no response body)

Resolves: elastic#109452
tvernum added a commit that referenced this issue Jun 12, 2024
In JDK23 the `HttpServer` requires that http response be explicitly
closed (even if there is no response body)

Resolves: #109452
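
For readers unfamiliar with the JDK's built-in `com.sun.net.httpserver.HttpServer`, the commit message above can be illustrated with a small sketch. This is not the code changed by the fix; the class name, the `/ping` path, and the helper methods are hypothetical. The handler sends a response with no body and then closes the exchange explicitly, which is the pattern the commit message describes as required on JDK 23.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical sketch: names here are illustrative, not taken from Elasticsearch.
public class ExplicitCloseSketch {

    /** Starts a local server whose handler replies 204 and always closes the exchange. */
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/ping", exchange -> {
            try {
                // A response length of -1 means "no response body".
                exchange.sendResponseHeaders(204, -1);
            } finally {
                // Close explicitly even though there is no body to write.
                exchange.close();
            }
        });
        server.start();
        return server;
    }

    /** Sends a GET with a bounded timeout and returns the HTTP status code. */
    static int statusOf(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
            .timeout(Duration.ofSeconds(5))
            .GET()
            .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer();
        try {
            HttpClient client = HttpClient.newHttpClient();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/ping");
            // Two requests in a row, mirroring the "first succeeds, second times out" symptom.
            System.out.println("first:  " + statusOf(client, uri));
            System.out.println("second: " + statusOf(client, uri));
        } finally {
            server.stop(0);
        }
    }
}
```

The same client is reused for both requests so that the second one exercises whatever state the server left behind after the first.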