[CI] RemoveCorruptedShardDataCommandTests.testCorruptedBothIndexAndTranslog fails after lucene upgrade #52490

Closed
henningandersen opened this issue Feb 18, 2020 · 3 comments
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >test-failure Triaged test failures from CI

Comments

@henningandersen
Contributor

The 7.x test fails with:

No text input configured for prompt [Confirm [y/N] ]
at __randomizedtesting.SeedInfo.seed([430F35E2B82EB29D:FEE5643694F8711E]:0)
at org.elasticsearch.cli.MockTerminal.readText(MockTerminal.java:60)
at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.confirm(RemoveCorruptedShardDataCommand.java:235)
at org.elasticsearch.index.shard.RemoveCorruptedShardDataCommand.dropCorruptMarkerFiles(RemoveCorruptedShardDataCommand.java:208)

This error occurs because Lucene no longer detects the corruption after the latest snapshot upgrade. The failure reproduces locally (on the 7.x branch) using:

./gradlew ':server:test' --tests "org.elasticsearch.index.shard.RemoveCorruptedShardDataCommandTests.testCorruptedBothIndexAndTranslog"   -Dtests.seed=430F35E2B82EB29D   -Dtests.security.manager=true   -Dtests.locale=mt   -Dtests.timezone=Africa/Lubumbashi   -Dcompiler.java=13

If I do:

git revert --no-commit 80e3c972100468325870319a952a46ad3ad3ed30

the problem disappears since Lucene detects the corruption.

Elasticsearch validates the last commit using Store.checkIntegrity, which calls CodecUtil.checksumEntireFile(input). I am not sure whether the problem is a bug in Lucene or Elasticsearch being overly ambitious here. Given that we do corrupt the file, I tend to think Lucene is at fault, but the comment here could also indicate that this is simply a discrepancy between the checks?
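
For readers unfamiliar with whole-file checksumming, here is a minimal sketch of the idea in plain Java. It is not Lucene's code: the class name ChecksumSketch, the helper names, and the 8-byte CRC32 footer layout are assumptions made for illustration and do not reproduce Lucene's actual footer format. It only shows why re-reading every byte and comparing against a stored checksum should catch a flipped bit anywhere in the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumSketch {

    // Appends a CRC32 of the payload as an 8-byte footer.
    // Hypothetical layout for illustration; not Lucene's real footer format.
    public static byte[] writeWithFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    // Re-reads every payload byte, recomputes the CRC32, and compares it
    // with the value stored in the footer.
    public static boolean verifyEntireFile(byte[] file) {
        if (file.length < Long.BYTES) {
            return false; // too short to even contain a footer
        }
        int payloadLength = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, payloadLength);
        long stored = ByteBuffer.wrap(file, payloadLength, Long.BYTES).getLong();
        return crc.getValue() == stored;
    }

    public static void main(String[] args) {
        byte[] file = writeWithFooter("stored fields".getBytes(StandardCharsets.UTF_8));
        System.out.println(verifyEntireFile(file)); // prints "true"

        file[3] ^= 0x01; // flip a single bit in the payload, as a corruption test would
        System.out.println(verifyEntireFile(file)); // prints "false"
    }
}
```

A check of this kind only protects the files it is actually run against, which is why a corrupted file that goes undetected suggests some file is being skipped.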

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Engine)

@jpountz
Contributor

jpountz commented Feb 24, 2020

This is a Lucene bug. The move of the stored fields index off-heap removed the integrity checks of the index from open-time but didn't add them back to StoredFieldsProducer#checkIntegrity. I opened https://issues.apache.org/jira/browse/LUCENE-9247.
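
The shape of this bug can be sketched with a toy reader that owns two files, a data file and an index file. Everything below is hypothetical (IntegritySketch, Reader, checkDataOnly, checkAllFiles are illustrative names, not Lucene classes or methods), and it assumes each file simply remembers the CRC32 it was written with. The point is only that once the index is no longer verified when the reader is opened, an integrity check that covers just the data file misses corruption in the index.

```java
import java.util.zip.CRC32;

public class IntegritySketch {

    static long crcOf(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    // A file that remembers the checksum it was written with (hypothetical).
    public static final class FileWithChecksum {
        public final byte[] bytes;
        private final long expectedCrc;

        FileWithChecksum(byte[] bytes) {
            this.bytes = bytes;
            this.expectedCrc = crcOf(bytes);
        }

        boolean isIntact() {
            return crcOf(bytes) == expectedCrc;
        }
    }

    // Toy reader with a data file and a separate index file (hypothetical).
    public static final class Reader {
        public final FileWithChecksum data;
        public final FileWithChecksum index;

        public Reader(byte[] data, byte[] index) {
            this.data = new FileWithChecksum(data);
            this.index = new FileWithChecksum(index);
        }

        // Incomplete check: assumes the index was already verified elsewhere.
        public boolean checkDataOnly() {
            return data.isIntact();
        }

        // Complete check: verifies every file the reader depends on.
        public boolean checkAllFiles() {
            return data.isIntact() && index.isIntact();
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader(new byte[] {1, 2, 3}, new byte[] {4, 5, 6});
        reader.index.bytes[0] ^= 0x01; // corrupt only the index file

        System.out.println(reader.checkDataOnly()); // prints "true": corruption missed
        System.out.println(reader.checkAllFiles()); // prints "false": corruption detected
    }
}
```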

@jpountz
Contributor

jpountz commented Mar 5, 2020

Closed via #53150.

@jpountz jpountz closed this as completed Mar 5, 2020