Make index and global metadata upload timeout dynamic cluster settings #10814

Merged

rahulkarajgikar
Contributor

@rahulkarajgikar rahulkarajgikar commented Oct 21, 2023

Description

  • Makes the remote store index metadata and global metadata upload timeouts dynamic
  • Adds two new dynamically configurable cluster settings, each with a default value of 20s:
    • cluster.remote_store.state.index_metadata.upload_timeout
    • cluster.remote_store.state.global_metadata.upload_timeout
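
Because both settings are dynamic, they should be adjustable on a running cluster through the cluster settings API (`PUT /_cluster/settings`) without a restart. The following is a minimal sketch that only builds the request body; the setting keys come from the description above, while the `30s` value, the helper name `build_settings_body`, and the choice of persistent scope are illustrative assumptions rather than anything taken from this PR.

```python
import json

# Setting keys as listed in the PR description.
INDEX_METADATA_TIMEOUT = "cluster.remote_store.state.index_metadata.upload_timeout"
GLOBAL_METADATA_TIMEOUT = "cluster.remote_store.state.global_metadata.upload_timeout"


def build_settings_body(index_timeout: str, global_timeout: str, persistent: bool = True) -> str:
    """Return a JSON body for PUT /_cluster/settings (hypothetical helper).

    Timeouts are time-value strings such as "20s" or "1m".
    """
    # "persistent" settings survive a full cluster restart; "transient" ones do not.
    scope = "persistent" if persistent else "transient"
    payload = {
        scope: {
            INDEX_METADATA_TIMEOUT: index_timeout,
            GLOBAL_METADATA_TIMEOUT: global_timeout,
        }
    }
    return json.dumps(payload, indent=2)


# Illustrative values only: raise both timeouts from the 20s default to 30s.
body = build_settings_body("30s", "30s")
print(body)
```

The resulting body can be sent with any HTTP client to the cluster's `_cluster/settings` endpoint; the host, port, and authentication depend on the deployment and are not shown here.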

Related Issues

Resolves #10688

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Rahul Karajgikar added 2 commits October 21, 2023 17:16
Signed-off-by: Rahul Karajgikar <karajgik@amazon.com>
Signed-off-by: Rahul Karajgikar <karajgik@amazon.com>
@github-actions
Contributor

github-actions bot commented Oct 21, 2023

Compatibility status:

Checks if related components are compatible with change f08c371

Incompatible components

Incompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@rahulkarajgikar
Contributor Author

rahulkarajgikar commented Oct 22, 2023

Retrying the failing test locally:

./gradlew ':server:test' --tests "org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore" -Dtests.iters=10

The test fails every time with the same error:

java.lang.AssertionError: timed out waiting for yellow state
        at __randomizedtesting.SeedInfo.seed([8C145360427AE77D:1341F63BA44A0E7D]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.opensearch.test.OpenSearchIntegTestCase.ensureColor(OpenSearchIntegTestCase.java:1012)
        at org.opensearch.test.OpenSearchIntegTestCase.ensureYellowAndNoInitializingShards(OpenSearchIntegTestCase.java:965)
        at org.opensearch.remotestore.BaseRemoteStoreRestoreIT.verifyRestoredData(BaseRemoteStoreRestoreIT.java:65)
        at org.opensearch.remotestore.BaseRemoteStoreRestoreIT.verifyRestoredData(BaseRemoteStoreRestoreIT.java:89)
        at org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore(RemoteStoreClusterStateRestoreIT.java:162)

Failed all 10 times with these seeds:

 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:82616C2453BEFFA8]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:3637D0D8677C3484]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:B8DE460436B87C4F]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:8930EDE15A462766]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:C5F1684CFB4EE7DD]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:540BBB1386F2AADD]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:6AD5DF9594C2BADB]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:F66645EFB7D62E75]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:C4CAA081C01D392F]}
 - org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore {seed=[8C145360427AE77D:1341F63BA44A0E7D]}

Signed-off-by: Rahul Karajgikar <karajgik@amazon.com>
@rahulkarajgikar
Contributor Author

rahulkarajgikar commented Oct 23, 2023

There is an ongoing PR in review that fixes this failing test on main:

https://github.com/opensearch-project/OpenSearch/pull/10838/files

org.opensearch.remotestore.RemoteStoreClusterStateRestoreIT.testFullClusterStateRestore

@rahulkarajgikar
Contributor Author

The fix for the failing test is in #10838.

@rahulkarajgikar rahulkarajgikar force-pushed the test-remote-dynamic-2 branch 2 times, most recently from 63fe86c to cb68ea8 October 23, 2023 06:00
Signed-off-by: Rahul Karajgikar <karajgik@amazon.com>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled
      1 org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards

@rahulkarajgikar
Contributor Author

rahulkarajgikar commented Oct 23, 2023

Both test failures are from known flaky tests:

      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled
      1 org.opensearch.cluster.allocation.ClusterRerouteIT.testDelayWithALargeAmountOfShards

Flaky test tracking issues: #10755 #10558

@rahulkarajgikar rahulkarajgikar changed the title from "Make index and global metadata upload timeout dynamic" to "Make index and global metadata upload timeout dynamic cluster settings" Oct 23, 2023
Signed-off-by: Rahul Karajgikar <karajgik@amazon.com>
@shwetathareja
Member

Documentation is needed for the newly added settings.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT.test {yaml=pit/10_basic/Delete all}

@shwetathareja shwetathareja merged commit 8f13dee into opensearch-project:main Oct 23, 2023
14 checks passed
@shwetathareja shwetathareja added the backport 2.x Backport to 2.x branch label Oct 23, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-10814-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 8f13dee77a7e78833cf90b20607cb4d714032bd8
# Push it to GitHub
git push --set-upstream origin backport/backport-10814-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-10814-to-2.x.

rahulkarajgikar added a commit to rahulkarajgikar/OpenSearch that referenced this pull request Oct 23, 2023
opensearch-project#10814)

* Make index and global metadata upload wait time dynamic

Signed-off-by: Rahul Karajgikar <karajgik@amazon.com>
(cherry picked from commit 8f13dee)
shwetathareja pushed a commit that referenced this pull request Oct 23, 2023
#10814) (#10852)

* Make index and global metadata upload wait time dynamic

Signed-off-by: Rahul Karajgikar <karajgik@amazon.com>
(cherry picked from commit 8f13dee)
@rahulkarajgikar rahulkarajgikar deleted the test-remote-dynamic-2 branch April 18, 2024 06:06
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
opensearch-project#10814)

* Make index and global metadata upload wait time dynamic

Signed-off-by: Rahul Karajgikar <karajgik@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
  • backport 2.x: Backport to 2.x branch
  • backport-failed
  • enhancement: Enhancement or improvement to existing feature or request
Development

Successfully merging this pull request may close these issues.

[Remote cluster state] Make timeout as dynamic settings.
4 participants