Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote reindex: Add support for configurable retry mechanism #12561

Merged
merged 6 commits into from
Mar 13, 2024

Conversation

ankitkala
Copy link
Member

@ankitkala ankitkala commented Mar 8, 2024

Description

Remote Reindex mostly involves repeating 2 major steps:

  1. fetch data using scroll call.
  2. Ingest the data fetched using a transport level bulk request.

I've added following minor enhancements to the remote reindex process.

  • Current retry mechanism retries the entire reindex process from the start. I've changed the behavior of retry logic to scroll requests to remote cluster (Bulk already has a retry mechanism).
  • Current retry mechanism would only kick in for 429s. I've also added retried for 5xx & ConnectException.
  • Exposed the retry params behind cluster settings.
  • Also, changed the logger to prefix based. This is useful to segregate logs from multiple ongoing reindexes for easier debugging.
  • Added debug logs for few places for ease of tracking the reindex operation.

Related Issues

Resolves #12560

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing labels Mar 8, 2024
@ankitkala
Copy link
Member Author

ankitkala commented Mar 8, 2024

I'm still working on adding tests for this change. Wanted to get the changes out for initial feedback while i work on tests.
WIll follow up with a documentation issue as well.

Copy link
Contributor

github-actions bot commented Mar 8, 2024

❌ Gradle check result for 990457c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Mar 8, 2024

Compatibility status:

Checks if related components are compatible with change e954ff3

Incompatible components

Incompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/alerting.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/performance-analyzer.git]

Copy link
Member

@shwetathareja shwetathareja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ankitkala . Few minor comments.

Signed-off-by: Ankit Kala <ankikala@amazon.com>
@ankitkala ankitkala added the backport 2.x Backport to 2.x branch label Mar 11, 2024
Copy link
Contributor

❌ Gradle check result for 2161a4f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for fc71528: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Ankit Kala <ankikala@amazon.com>
Copy link
Contributor

❌ Gradle check result for 8bc6afb: ABORTED

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for fdf7502: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ankitkala
Copy link
Member Author

❌ Gradle check result for fdf7502: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Gradle check failure due to flaky test: #12610

Signed-off-by: Ankit Kala <ankikala@amazon.com>
Copy link
Contributor

❌ Gradle check result for 13a2a47: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Ankit Kala <ankikala@amazon.com>
Copy link
Contributor

❕ Gradle check result for 9e2fc06: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Mar 12, 2024

Codecov Report

Attention: Patch coverage is 94.73684% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 71.54%. Comparing base (b15cb0c) to head (e954ff3).
Report is 16 commits behind head on main.

Files Patch % Lines
...ndex/reindex/remote/RemoteScrollableHitSource.java 92.30% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12561      +/-   ##
============================================
+ Coverage     71.42%   71.54%   +0.11%     
- Complexity    59978    60064      +86     
============================================
  Files          4985     4985              
  Lines        282275   282306      +31     
  Branches      40946    40949       +3     
============================================
+ Hits         201603   201963     +360     
+ Misses        63999    63621     -378     
- Partials      16673    16722      +49     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ankitkala
Copy link
Member Author

❕ Gradle check result for 9e2fc06: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Flaky test: #10755

@ankitkala ankitkala removed the backport 2.x Backport to 2.x branch label Mar 13, 2024
Signed-off-by: Ankit Kala <ankikala@amazon.com>
Copy link
Contributor

❕ Gradle check result for e954ff3: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@ankitkala
Copy link
Member Author

MinimumClusterManagerNodesIT

Flaky test: #10006

@shwetathareja shwetathareja merged commit b07c8fb into opensearch-project:main Mar 13, 2024
30 checks passed
@shwetathareja shwetathareja added the backport 2.x Backport to 2.x branch label Mar 13, 2024
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-12561-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b07c8fb32f79ba69bc6b9ab2e321267b288b3310
# Push it to GitHub
git push --set-upstream origin backport/backport-12561-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-12561-to-2.x.

@ankitkala
Copy link
Member Author

Issue for documentation update: opensearch-project/documentation-website#6674

rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…rch-project#12561)

* Remote reindex: Add support for configurable retry mechanism

Signed-off-by: Ankit Kala <ankikala@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…rch-project#12561)

* Remote reindex: Add support for configurable retry mechanism

Signed-off-by: Ankit Kala <ankikala@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport-failed enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Remote Reindex] Add support for configurable retries in remote reindex process
2 participants