[CI] FileSettingsServiceTests class failing #115725

Closed
elasticsearchmachine opened this issue Oct 27, 2024 · 7 comments
Assignees
Labels
:Core/Infra/Settings Settings infrastructure and APIs low-risk An open issue or test failure that is a low risk to future releases Team:Core/Infra Meta label for core/infra team >test-failure Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

Build Scans:

Reproduction Line:

undefined

Applicable branches:
8.16

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

undefined

Issue Reasons:

  • [8.16] 2 consecutive failures in class org.elasticsearch.reservedstate.service.FileSettingsServiceTests
  • [8.16] 2 consecutive failures in step rhel-8_platform-support-unix
  • [8.16] 3 consecutive failures in step windows-2016_checkpart1_platform-support-windows
  • [8.16] 2 consecutive failures in step ubuntu-1804_platform-support-unix
  • [8.16] 3 consecutive failures in step oraclelinux-7_platform-support-unix
  • [8.16] 2 consecutive failures in step sles-15_platform-support-unix
  • [8.16] 67 failures in class org.elasticsearch.reservedstate.service.FileSettingsServiceTests (10.3% fail rate in 650 executions)
  • [8.16] 2 failures in step encryption-at-rest (11.8% fail rate in 17 executions)
  • [8.16] 3 failures in step rhel-8_platform-support-unix (17.6% fail rate in 17 executions)
  • [8.16] 2 failures in step almalinux-8_platform-support-unix (12.5% fail rate in 16 executions)
  • [8.16] 2 failures in step ubuntu-2204_platform-support-unix (12.5% fail rate in 16 executions)
  • [8.16] 4 failures in step oraclelinux-8_platform-support-unix (25.0% fail rate in 16 executions)
  • [8.16] 2 failures in step amazonlinux-2_platform-support-aws (12.5% fail rate in 16 executions)
  • [8.16] 5 failures in step windows-2022_checkpart1_platform-support-windows (31.3% fail rate in 16 executions)
  • [8.16] 7 failures in step windows-2019_checkpart1_platform-support-windows (43.8% fail rate in 16 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

@elasticsearchmachine elasticsearchmachine added :Core/Infra/Settings Settings infrastructure and APIs >test-failure Triaged test failures from CI Team:Core/Infra Meta label for core/infra team needs:risk Requires assignment of a risk label (low, medium, blocker) labels Oct 27, 2024
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-core-infra (Team:Core/Infra)

@rjernst
Member

rjernst commented Oct 28, 2024

@n1v0lg This seems to be additional fallout from #114295. Can you please take a look?

@rjernst rjernst added medium-risk An open issue or test failure that is a medium risk to future releases and removed needs:risk Requires assignment of a risk label (low, medium, blocker) labels Oct 28, 2024
@n1v0lg
Contributor

n1v0lg commented Oct 28, 2024

@rjernst yup! I have fixes in progress in #115770. I'll see whether this failure falls under the same umbrella or not.

@n1v0lg
Contributor

n1v0lg commented Oct 28, 2024

Failures:

FileSettingsServiceTests > classMethod FAILED
    java.io.IOException: Could not remove the following files (in the order of attempts):
       /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014/config/operator/settings.json: java.io.IOException: access denied: /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014/config/operator/settings.json
       /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014/config/operator: java.nio.file.DirectoryNotEmptyException: /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014/config/operator
       /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014/config: java.nio.file.DirectoryNotEmptyException: /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014/config
       /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014: java.nio.file.DirectoryNotEmptyException: /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014
       /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001: java.nio.file.DirectoryNotEmptyException: /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001
        at __randomizedtesting.SeedInfo.seed([E448585AA9812990]:0)
        at org.apache.lucene.util.IOUtils.rm(IOUtils.java:341)
        at org.apache.lucene.tests.util.TestRuleTemporaryFilesCleanup.afterAlways(TestRuleTemporaryFilesCleanup.java:209)
        at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.afterAlways(TestRuleAdapter.java:31)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
        at java.base/java.lang.Thread.run(Thread.java:1575)
./gradlew ":server:test" --tests "org.elasticsearch.reservedstate.service.FileSettingsServiceTests.testProcessFileChanges" -Dtests.seed=C1483F985AC3BF71 -Dtests.locale=de-IT -Dtests.timezone=Europe/Ljubljana -Druntime.java=23
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([C1483F985AC3BF71:E9C0E85733A8B791]:0)
        at org.junit.Assert.fail(Assert.java:87)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.junit.Assert.assertTrue(Assert.java:53)
        at org.elasticsearch.reservedstate.service.FileSettingsServiceTests.testProcessFileChanges(FileSettingsServiceTests.java:256)
REPRODUCE WITH: gradlew ":server:test" --tests "org.elasticsearch.reservedstate.service.FileSettingsServiceTests.testProcessFileChanges" -Dtests.seed=97BF119B7210EEE9 -Dtests.locale=dz -Dtests.timezone=Asia/Kamchatka -Druntime.java=23

FileSettingsServiceTests > testProcessFileChanges FAILED
    Argument(s) are different! Wanted:
    reservedClusterStateService.process(
        <any>,
        <any org.elasticsearch.xcontent.XContentParser>,
        HIGHER_OR_SAME_VERSION,
        <any>
    );
    -> at org.elasticsearch.reservedstate.service.FileSettingsServiceTests.testProcessFileChanges(FileSettingsServiceTests.java:259)
    Actual invocations have different arguments at position [2]:
    reservedClusterStateService.process(
        "file_settings",
        org.elasticsearch.xcontent.provider.json.JsonXContentParser@6911265,
        HIGHER_VERSION_ONLY,
        org.elasticsearch.reservedstate.service.FileSettingsService$$Lambda/0x0000000020775420@719235a5
    );
    -> at org.elasticsearch.reservedstate.service.FileSettingsService.processFileChanges(FileSettingsService.java:141)
        at __randomizedtesting.SeedInfo.seed([97BF119B7210EEE9:BF37C6541B7BE609]:0)
        at app//org.elasticsearch.reservedstate.service.FileSettingsServiceTests.testProcessFileChanges(FileSettingsServiceTests.java:259)

These are all still related to how files are set up during tests, so labeling this low-risk.

The PR I have open should address them (waiting on CI and still thinking through the details here).

@n1v0lg n1v0lg added low-risk An open issue or test failure that is a low risk to future releases and removed medium-risk An open issue or test failure that is a medium risk to future releases labels Oct 28, 2024
@n1v0lg
Contributor

n1v0lg commented Oct 28, 2024

(Unless we want to keep it at medium just to signify that the entire suite is not giving us good coverage right now; previously it was only a single method.)

@n1v0lg
Contributor

n1v0lg commented Oct 28, 2024

Right so the:

/mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014/config/operator/settings.json: java.io.IOException: access denied: /mnt/secret/elasticsearch-periodic/server/build/testrun/test/temp/org.elasticsearch.reservedstate.service.FileSettingsServiceTests_E448585AA9812990-001/tempDir-014/config/operator/settings.json

bit, in particular the java.io.IOException: access denied:, is thrown in Lucene's mock WindowsFS when there's still a file handle open to the file we're trying to delete. I assume that must be the FileSettingsService. What's confusing is that we stop and close it before we proceed with further teardown. Still figuring out what's going on with the test setup here.
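
To make the failure mode concrete, here is a minimal sketch of the behavior described above: a delete is refused while a handle to the file is still open. This is a toy model written for illustration, not Lucene's WindowsFS code; the class name `OpenHandleModel` and its methods are hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Windows-like file system; NOT Lucene's WindowsFS.
class OpenHandleModel {
    // Number of currently open handles per path.
    private final Map<String, Integer> openHandles = new HashMap<>();

    // Registers an open handle; closing the returned object releases it.
    Closeable open(String path) {
        openHandles.merge(path, 1, Integer::sum);
        return new Closeable() {
            private boolean closed;

            @Override
            public void close() {
                if (closed) {
                    return; // closing twice must not release someone else's handle
                }
                closed = true;
                // Drop the entry entirely once the last handle is released.
                openHandles.computeIfPresent(path, (p, n) -> n == 1 ? null : n - 1);
            }
        };
    }

    // Mimics the message seen in the log: deletion fails while a handle is open.
    void delete(String path) throws IOException {
        if (openHandles.containsKey(path)) {
            throw new IOException("access denied: " + path);
        }
    }

    public static void main(String[] args) throws IOException {
        OpenHandleModel fs = new OpenHandleModel();
        String settings = "config/operator/settings.json";
        Closeable leakedHandle = fs.open(settings);
        try {
            fs.delete(settings); // teardown runs while the handle is still open
        } catch (IOException e) {
            System.out.println("cleanup failed: " + e.getMessage());
        }
        leakedHandle.close();
        fs.delete(settings); // succeeds once the handle has been released
    }
}
```

If this model matches what happens in the suite, the thing to look for is a reader or watcher on settings.json that outlives the service's stop/close call.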

elasticsearchmachine pushed a commit that referenced this issue Oct 29, 2024
This PR addresses some of the failure causes tracked under
#115280 and
#115725: the latch-await
setup was rather convoluted and the move command was not always
invoked in the right order. This PR cleans up latching by separating
awaiting the first processing call (on start) from waiting on the
subsequent call. Also, it makes writing the file more robust w.r.t.
OSes where `atomic_move` may not be available.

This should address failures around the timeout await, and the assertion
failures around invoked methods tracked here:

#115725 (comment)

But will likely require another round of changes to address the failures
to delete files.

Relates: #115280 
Relates: #115725
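
For readers following along, the two ideas in the commit message can be sketched roughly as follows. This is not the code from the PR: `FileChangeTestSketch`, `writeThenMove`, and `runScenario` are hypothetical names, the "watcher" thread is a stand-in for the service under test, and the `changeSignalled` latch stands in for the file watcher noticing the move. Only JDK calls are used (`CountDownLatch`, `Files.move` with `StandardCopyOption.ATOMIC_MOVE`, `AtomicMoveNotSupportedException`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FileChangeTestSketch {
    // Writes to a temp file in the same directory, then moves it into place.
    // Falls back to a plain replace if the file system rejects an atomic move.
    static void writeThenMove(Path target, String content) throws IOException {
        Path tmp = Files.createTempFile(target.getParent(), "settings", ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        try {
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Returns how many times the stand-in service "processed" the file.
    static int runScenario(Path dir) throws Exception {
        Path settings = dir.resolve("settings.json");
        CountDownLatch startProcessed = new CountDownLatch(1);  // first call, on start
        CountDownLatch changeSignalled = new CountDownLatch(1); // stand-in for the watcher event
        CountDownLatch changeProcessed = new CountDownLatch(1); // subsequent call
        AtomicInteger processed = new AtomicInteger();

        Thread watcher = new Thread(() -> {
            processed.incrementAndGet(); // initial processing on start
            startProcessed.countDown();
            try {
                changeSignalled.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            processed.incrementAndGet(); // processing triggered by the file change
            changeProcessed.countDown();
        });
        watcher.start();

        // Step 1: wait only for the on-start call before touching the file.
        if (!startProcessed.await(20, TimeUnit.SECONDS)) {
            throw new IllegalStateException("timed out waiting for initial processing");
        }
        // Step 2: write the file, then wait separately for the follow-up call.
        writeThenMove(settings, "{}");
        changeSignalled.countDown();
        if (!changeProcessed.await(20, TimeUnit.SECONDS)) {
            throw new IllegalStateException("timed out waiting for change processing");
        }
        watcher.join();
        return processed.get();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("file-settings-sketch");
        System.out.println("processing calls: " + runScenario(dir));
    }
}
```

Keeping one latch per expected call means a timeout points at the specific call that never happened, and the file is only written after the on-start processing has been observed.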
jfreden pushed a commit to jfreden/elasticsearch that referenced this issue Nov 4, 2024
@n1v0lg
Contributor

n1v0lg commented Nov 18, 2024

Closing with: #116309

@n1v0lg n1v0lg closed this as completed Nov 18, 2024