[Refactor] LuceneChangesSnapshot to use accurate ops history #2452

Merged
nknize merged 4 commits into opensearch-project:main from the accurateTranslogHistoryCount branch
Mar 15, 2022

nknize
Collaborator

@nknize nknize commented Mar 14, 2022

Improves LuceneChangesSnapshot so that it reports an accurate count of
recovery operations by using the sort-by-sequence-number optimization.
This is needed for Lucene 9, where searchAfter pagination is not
thread-safe (leading to inaccurate recovery count exceptions).
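
To illustrate the idea only, here is a minimal plain-Java sketch; it is not the code in this PR and it does not use Lucene. A sorted long[] stands in for an index sorted by sequence number, and the class and method names (SeqNoSnapshotSketch, totalOperations, nextBatch) are hypothetical. It assumes sequence numbers are unique. The total is computed once from the range bounds, so it does not depend on how the later "search after" style pagination proceeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Conceptual model only: a sorted array stands in for an index sorted by sequence number. */
final class SeqNoSnapshotSketch {
    private final long[] sortedSeqNos;   // ascending; assumed unique
    private final long toSeqNo;          // inclusive upper bound of the snapshot
    private final int batchSize;
    private final int totalOperations;   // computed once, before any pagination
    private long lastSeenSeqNo;          // cursor, analogous to a "search after" marker

    SeqNoSnapshotSketch(long[] seqNos, long fromSeqNo, long toSeqNo, int batchSize) {
        if (fromSeqNo < 0 || toSeqNo < fromSeqNo || toSeqNo == Long.MAX_VALUE) {
            throw new IllegalArgumentException("invalid range [" + fromSeqNo + ", " + toSeqNo + "]");
        }
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.sortedSeqNos = seqNos.clone();
        Arrays.sort(this.sortedSeqNos);
        this.toSeqNo = toSeqNo;
        this.batchSize = batchSize;
        this.lastSeenSeqNo = fromSeqNo - 1;
        // Because the data is sorted, the exact count is the distance between two bounds.
        this.totalOperations = lowerBound(toSeqNo + 1) - lowerBound(fromSeqNo);
    }

    /** Index of the first element that is >= key (binary search). */
    private int lowerBound(long key) {
        int lo = 0;
        int hi = sortedSeqNos.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedSeqNos[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Exact number of operations in [fromSeqNo, toSeqNo], independent of batching. */
    int totalOperations() {
        return totalOperations;
    }

    /** Next batch of sequence numbers after the cursor; empty once the range is exhausted. */
    List<Long> nextBatch() {
        List<Long> batch = new ArrayList<>();
        for (int i = lowerBound(lastSeenSeqNo + 1); i < sortedSeqNos.length && batch.size() < batchSize; i++) {
            if (sortedSeqNos[i] > toSeqNo) {
                break;
            }
            batch.add(sortedSeqNos[i]);
            lastSeenSeqNo = sortedSeqNos[i];
        }
        return batch;
    }
}
```

In this model a caller can compare the number of operations it actually read across all batches with totalOperations(); a mismatch is the kind of inconsistency the recovery count check is meant to catch.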

@nknize nknize added enhancement Enhancement or improvement to existing feature or request Severity-Blocker v2.0.0 Version 2.0.0 Storage:Durability Issues and PRs related to the durability framework labels Mar 14, 2022
@nknize nknize requested a review from a team as a code owner March 14, 2022 04:17
@opensearch-ci-bot
Collaborator

Can one of the admins verify this patch?

@nknize nknize force-pushed the accurateTranslogHistoryCount branch from 9d73527 to 1c67d0b Compare March 14, 2022 04:19
@opensearch-ci-bot
Collaborator

✅   Gradle Check success 9d735277665f78051f0b8e6162760b08badc82ab
Log 3326

Reports 3326

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 1c67d0b
Log 3327

Reports 3327

@nknize
Collaborator Author

nknize commented Mar 14, 2022

Another #2440 failure; refiring

@nknize
Collaborator Author

nknize commented Mar 14, 2022

start gradle check

@opensearch-ci-bot
Collaborator

✅   Gradle Check success 1c67d0b
Log 3339

Reports 3339

@nknize nknize requested a review from mch2 March 15, 2022 04:28
@opensearch-ci-bot
Collaborator

✅   Gradle Check success 8f3b2a0d9e498ecb01c1d256dd7d36c8451fe414
Log 3384

Reports 3384

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c3cff3232efce3fd49d6c1b106b58024ab4a2250
Log 3385

Reports 3385

@nknize
Collaborator Author

nknize commented Mar 15, 2022

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c3cff3232efce3fd49d6c1b106b58024ab4a2250
Log 3389

Reports 3389

@tlfeng
Collaborator

tlfeng commented Mar 15, 2022

In log 3385:

> Task :qa:mixed-cluster:v1.2.5#mixedClusterTest

REPRODUCE WITH: ./gradlew ':qa:mixed-cluster:v1.2.5#mixedClusterTest' --tests "org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT" -Dtests.method="test {p0=indices.get_field_mapping/20_missing_field/Return empty object if field doesn't exist, but index does}" -Dtests.seed=E59ECEB65195B1EF -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=zh-Hans-CN -Dtests.timezone=GMT0 -Druntime.java=17

org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT > test {p0=indices.get_field_mapping/20_missing_field/Return empty object if field doesn't exist, but index does} FAILED
    java.lang.AssertionError: Failure at [indices.get_field_mapping/20_missing_field:20]: field [test_index.mappings] is null
        at __randomizedtesting.SeedInfo.seed([E59ECEB65195B1EF:6DCAF16CFF69DC17]:0)
        at org.opensearch.test.rest.yaml.OpenSearchClientYamlSuiteTestCase.executeSection(OpenSearchClientYamlSuiteTestCase.java:442)
        at org.opensearch.test.rest.yaml.OpenSearchClientYamlSuiteTestCase.test(OpenSearchClientYamlSuiteTestCase.java:415)

It's tracked in issue #2440

In log 3389:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.gateway.RecoveryFromGatewayIT.testReuseInFileBasedPeerRecovery" -Dtests.seed=7CC9C2C2538A3D21 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ar-AE -Dtests.timezone=SystemV/CST6 -Druntime.java=17

org.opensearch.gateway.RecoveryFromGatewayIT > testReuseInFileBasedPeerRecovery FAILED
    java.lang.AssertionError: shard [test][0] on node [node_t1] has pending operations:
     --> RetentionLeaseBackgroundSyncAction.Request{retentionLeases=RetentionLeases{primaryTerm=1, version=1834, leases={peer_recovery/QRg_-RLGQDWfgQPGUpDVEg=RetentionLease{id='peer_recovery/QRg_-RLGQDWfgQPGUpDVEg', retainingSequenceNumber=950, timestamp=1647325580911, source='peer recovery'}, peer_recovery/3AKiBGC7RiW8BbQammw7fw=RetentionLease{id='peer_recovery/3AKiBGC7RiW8BbQammw7fw', retainingSequenceNumber=950, timestamp=1647325580911, source='peer recovery'}}}, shardId=[test][0], timeout=1m, index='test', waitForActiveShards=0}
    	at org.opensearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:248)
    	at org.opensearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:3146)
    	at org.opensearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:1116)
    	at org.opensearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:433)

It's tracked in issue #1746
Re-run: start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c3cff3232efce3fd49d6c1b106b58024ab4a2250
Log 3392

Reports 3392

@tlfeng
Collaborator

tlfeng commented Mar 15, 2022

Start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c3cff3232efce3fd49d6c1b106b58024ab4a2250
Log 3393

Reports 3393

@tlfeng
Collaborator

tlfeng commented Mar 15, 2022

Start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c3cff3232efce3fd49d6c1b106b58024ab4a2250
Log 3394

Reports 3394

@nknize
Collaborator Author

nknize commented Mar 15, 2022

Another #2440

Tests with failures:
 - org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT.test {p0=indices.get_field_mapping/20_missing_field/Return empty object if field doesn't exist, but index does}

We should mute this until we have the fix.

@nknize
Collaborator Author

nknize commented Mar 15, 2022

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c3cff3232efce3fd49d6c1b106b58024ab4a2250
Log 3395

Reports 3395

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c3cff3232efce3fd49d6c1b106b58024ab4a2250
Log 3396

Reports 3396

@nknize
Collaborator Author

nknize commented Mar 15, 2022

Not sure why there were two gradle checks running.

For now I'm muting indices.get_field_mapping/20_missing_field.yml w/ AwaitsFix until this can get resolved. It looks to me like it's related to the partial types removal state we're currently in and will resolve when types are completely removed.

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure d0193102c9918ec4a1cf6cb2574b984960d517ed
Log 3398

Reports 3398

@nknize
Collaborator Author

nknize commented Mar 15, 2022

unrelated S3 failure! Refiring again!

> Task :test:fixtures:s3-fixture:composeDown FAILED

@nknize
Collaborator Author

nknize commented Mar 15, 2022

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure d0193102c9918ec4a1cf6cb2574b984960d517ed
Log 3402

Reports 3402

@nknize
Collaborator Author

nknize commented Mar 15, 2022

Another unrelated flake, already reported in #1561:

REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards" -Dtests.seed=AE38C3D90A3FA8F2 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=he-IL -Dtests.timezone=Etc/GMT+1 -Druntime.java=17

Refiring

@nknize
Collaborator Author

nknize commented Mar 15, 2022

start gradle check

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure d0193102c9918ec4a1cf6cb2574b984960d517ed
Log 3403

Reports 3403

@nknize nknize force-pushed the accurateTranslogHistoryCount branch from d019310 to 505e1ea Compare March 15, 2022 15:11
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 505e1ea3281bf6c8f0cedfb8c406820f6acc4114
Log 3406

Reports 3406

@nknize nknize force-pushed the accurateTranslogHistoryCount branch from 505e1ea to 321ab12 Compare March 15, 2022 15:49
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 321ab1298d700ad135afc257a547d11311dd41d5
Log 3408

Reports 3408

Improves the LuceneChangesSnapshot to get an accurate count of recovery
operations using sort by sequence number optimization.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
@nknize nknize force-pushed the accurateTranslogHistoryCount branch from 321ab12 to 6a15157 Compare March 15, 2022 16:40
@opensearch-ci-bot
Collaborator

✅   Gradle Check success 6a15157
Log 3412

Reports 3412

@nknize nknize merged commit 757abdb into opensearch-project:main Mar 15, 2022
@nknize nknize mentioned this pull request Mar 16, 2022
9 tasks
4 participants