[Segment Replication] Update force segment replication round to be synchronous #5898

Merged: 8 commits, Jan 27, 2023

@dreamer-89 dreamer-89 commented Jan 17, 2023

Description

This change fixes the primary relocation path with segment replication (segrep). It:

  • Makes segment replication happen synchronously. This has the side effects of writes happening on the older primary and of segment replication timing out, because writes on the older primary run in parallel with the round of segment replication from the older primary to the target.
  • Removes the StepListener after the above change. This required corresponding changes to function definitions and unit tests.
  • Updates the existing integration test to reduce flakiness.
  • Verifies that the newly added tests are not flaky.
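
For readers unfamiliar with the pattern, here is a minimal sketch of what "running a listener-based round synchronously" can look like. It is not the code in this PR: `ForceSyncSketch`, `Listener`, and `runSynchronously` are hypothetical names for illustration, and the sketch only assumes a callback-style operation that reports success or failure exactly once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class ForceSyncSketch {

    /** Hypothetical callback interface standing in for an asynchronous listener. */
    interface Listener {
        void onResponse();

        void onFailure(Exception e);
    }

    /**
     * Starts an asynchronous round and blocks the calling thread until the round
     * succeeds, fails, or the timeout elapses. Failures are rethrown to the caller.
     */
    static void runSynchronously(Consumer<Listener> asyncRound, long timeoutMillis) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<Exception> failure = new AtomicReference<>();

        asyncRound.accept(new Listener() {
            @Override
            public void onResponse() {
                latch.countDown();
            }

            @Override
            public void onFailure(Exception e) {
                failure.set(e);
                latch.countDown();
            }
        });

        // Block here so that the caller cannot proceed until the round is done.
        if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("round did not finish within " + timeoutMillis + " ms");
        }
        if (failure.get() != null) {
            throw failure.get();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The round completes on another thread; the caller still waits for it.
            runSynchronously(listener -> pool.execute(listener::onResponse), 1000);
            System.out.println("round completed before the caller continued");
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of blocking is ordering: the caller cannot move on to the next relocation step while the round is still in flight. The timeout value here is arbitrary.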

Issues Resolved

#5848

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dreamer-89

Converted to draft, as it needs more unit test changes and a stabilized integration test that verifies the delay operations behavior.

@dreamer-89

dreamer-89 commented Jan 19, 2023

Tested the change on a cluster with 3 data nodes and 1 master node for the use cases below, where I didn't find any issues.

  1. Large primary shard relocation. With no activity during the relocation, the round of segment replication completes in a few ms. Steps gist link.
  2. Concurrent ingestion during relocation. Noticed that not all documents show up after relocation, even though all documents were acknowledged. Steps gist link. This does not happen with indices that have document replication enabled. Opened a separate issue, as I found this happening in other scenarios as well: [Segment Replication] [BUG] Missing documents post ingestion during primary failover and relocation #5946

@dreamer-89 dreamer-89 marked this pull request as ready for review January 19, 2023 02:19
@dreamer-89

@ashking94 @Bukhtawar @mch2 : Gentle reminder on review.

@ashking94 @Bukhtawar : Ping for review ^

@mch2 mch2 left a comment

Looks good. Thanks for fixing this!

@dreamer-89 dreamer-89 merged commit ebb5813 into opensearch-project:main Jan 27, 2023
@dreamer-89 dreamer-89 added the backport 2.x Backport to 2.x branch label Jan 27, 2023
@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-5898-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ebb5813273313d8ebbc4df8576c713651212d3ed
# Push it to GitHub
git push --set-upstream origin backport/backport-5898-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-5898-to-2.x.

@dreamer-89 dreamer-89 mentioned this pull request Jan 27, 2023
dreamer-89 added a commit to dreamer-89/OpenSearch that referenced this pull request Jan 27, 2023
…nchronous (opensearch-project#5898)

* Update force segment replication sync to be synchronous

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Add logs and fix spotlessApply

Signed-off-by: Suraj Singh <surajrider@gmail.com>

    pick da8cb72ab4f Update unit test post rebase

* Update unit test post rebase

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Update integration tests

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Mute testPrimaryRelocationWithSegRepFailure

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Remove extra closing bracket after main merge

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* PR feedback

Signed-off-by: Suraj Singh <surajrider@gmail.com>

* Spotless fix

Signed-off-by: Suraj Singh <surajrider@gmail.com>

---------

Signed-off-by: Suraj Singh <surajrider@gmail.com>
mch2 pushed a commit to mch2/OpenSearch that referenced this pull request Mar 4, 2023
…nchronous (opensearch-project#5898)
Labels
backport 2.x Backport to 2.x branch skip-changelog