Add cluster primary balance constraint for rebalancing with buffer #12656

Merged: 22 commits, Apr 2, 2024

Contributor

@Arpit-Bandejiya Arpit-Bandejiya commented Mar 14, 2024

Description

Currently, the cluster.routing.allocation.balance.prefer_primary setting is used to balance primaries during allocation. This change introduces primary balancing during the rebalancing phase through a new setting, cluster.routing.allocation.rebalance.prefer_primary. It also introduces a buffer that relaxes the constraint, so that the degree of primary balance enforced during rebalancing can be tuned.
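
To make the buffer idea concrete, here is a minimal sketch in Python. It is not the code in this PR: the function names, the rounding with ceil, and the strict "greater than" comparison are all assumptions chosen for illustration. It only shows how a buffer could relax a per-node primary limit derived from the cluster average.

```python
import math


def allowed_primaries_per_node(total_primaries, node_count, buffer):
    """Hypothetical limit: the cluster-wide average, relaxed by `buffer`."""
    average = total_primaries / node_count
    # Rounding up is an assumption of this sketch, not taken from the PR.
    return math.ceil(average * (1 + buffer))


def breaches_primary_constraint(node_primaries, total_primaries, node_count, buffer):
    """True when a node holds more primaries than the buffered limit allows."""
    limit = allowed_primaries_per_node(total_primaries, node_count, buffer)
    return node_primaries > limit


# 100 primaries spread over 10 nodes: the average is 10 per node.
print(allowed_primaries_per_node(100, 10, 0.0))         # strict limit
print(allowed_primaries_per_node(100, 10, 0.05))        # limit with a 5% buffer
print(breaches_primary_constraint(11, 100, 10, 0.05))   # one primary over the average
```

In this sketch a 5% buffer raises the limit from 10 to 11 primaries per node, so a node holding 11 primaries would not trigger a relocation. A larger buffer tolerates more skew and should need fewer relocations; a smaller one enforces a tighter balance.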

We also introduced the cluster.routing.allocation.balance.prefer_random_allocation setting to pick a node at random, instead of in round-robin fashion, when multiple nodes have MIN_WEIGHT. In extended testing (results attached below), random allocation showed no gains, so we are not going ahead with that change.
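
The two tie-breaking strategies can be sketched as follows. This is a simplified Python illustration, not the allocator code from this PR: the node names, the weight map, and the helper names are hypothetical, and the real allocator tracks far more state.

```python
import random


def min_weight_nodes(weights):
    """Return the nodes that share the lowest weight, in a stable order."""
    lowest = min(weights.values())
    return sorted(node for node, weight in weights.items() if weight == lowest)


class RoundRobinPicker:
    """Cycles through the tied candidates on successive calls."""

    def __init__(self):
        self._next = 0

    def pick(self, candidates):
        choice = candidates[self._next % len(candidates)]
        self._next += 1
        return choice


def pick_random(candidates, rng):
    """Chooses one of the tied candidates uniformly at random."""
    return rng.choice(candidates)


weights = {"node-a": 1.0, "node-b": 1.0, "node-c": 2.0}  # hypothetical weights
tied = min_weight_nodes(weights)

picker = RoundRobinPicker()
print([picker.pick(tied) for _ in range(3)])
print(pick_random(tied, random.Random(42)))
```

Both strategies only ever choose among the tied minimum-weight nodes; they differ in the order in which those nodes are selected.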

Related Issues

Resolves #12250

Benchmarking of the changes:

We used AllocationBenchmark to benchmark the change, with the test cases altered to the following:

Benchmark                                           (indicesShardsReplicasSourceTargetRecoveries)  Mode  Cnt        Score        Error  Units
AllocationBenchmark.measureShardRelocationComplete             100|   10|    1|   40|   20|    8|  avgt   10       24.583 ±     2.950  ms/op
AllocationBenchmark.measureShardRelocationComplete             100|    3|    1|   50|   70|    3|  avgt   10       57.546 ±     9.317  ms/op
AllocationBenchmark.measureShardRelocationComplete             100|    3|    2|   50|   30|    6|  avgt   10       38.094 ±     1.137  ms/op
AllocationBenchmark.measureShardRelocationComplete             100|   10|    2|   33|   55|    6|  avgt   10       54.622 ±     3.190  ms/op
AllocationBenchmark.measureShardRelocationComplete              50|   60|    1|  100|  100|    6|  avgt   10      576.792 ±   100.885  ms/op
AllocationBenchmark.measureShardRelocationComplete              50|   60|    1|  100|   40|    6|  avgt   10      132.869 ±    22.630  ms/op
AllocationBenchmark.measureShardRelocationComplete             500|   60|    1|  100|  100|   12|  avgt   10     1573.607 ±   143.696  ms/op
AllocationBenchmark.measureShardRelocationComplete             500|   60|    1|  100|   40|   12|  avgt   10      622.524 ±    99.199  ms/op
AllocationBenchmark.measureShardRelocationComplete            1000|   50|    1| 1000| 1000|   12|  avgt   10  1161775.841 ± 24716.809  ms/op
AllocationBenchmark.measureShardRelocationComplete            1000|   50|    1|  700| 1000|   12|  avgt   10   567490.762 ± 18112.951  ms/op

Results

For original algorithm:

Result "org.opensearch.benchmark.routing.allocation.AllocationBenchmark.measureShardRelocationComplete":
  567490.762 ±(99.9%) 18112.951 ms/op [Average]
  (min, avg, max) = (546532.222, 567490.762, 583717.762), stdev = 11980.595
  CI (99.9%): [549377.811, 585603.712] (assumes normal distribution)

# Run complete. Total time: 4 days, 20:00:22

We first compared it against rebalancing of primary shards with a 5% buffer allowed:

Result "org.opensearch.benchmark.routing.allocation.AllocationBenchmark.measureShardRelocationComplete":
  638574.884 ±(99.9%) 19032.310 ms/op [Average]
  (min, avg, max) = (621396.606, 638574.884, 657225.254), stdev = 12588.695
  CI (99.9%): [619542.574, 657607.193] (assumes normal distribution)

# Run complete. Total time: 4 days, 22:07:35

We also ran the benchmark with random allocation among MIN_WEIGHT nodes to see whether it brings any gains. It did not help much: the avg and max scores were higher than with the normal allocation. Therefore, we decided not to go ahead with it.

Result "org.opensearch.benchmark.routing.allocation.AllocationBenchmark.measureShardRelocationComplete":
  641953.191 ±(99.9%) 22721.251 ms/op [Average]
  (min, avg, max) = (617694.209, 641953.191, 663163.623), stdev = 15028.700
  CI (99.9%): [619231.941, 664674.442] (assumes normal distribution)

# Run complete. Total time: 4 days, 21:05:20

We then also ran the benchmark with different buffer percentages.

For 10% buffer:

Result "org.opensearch.benchmark.routing.allocation.AllocationBenchmark.measureShardRelocationComplete":
  635666.691 ±(99.9%) 29678.806 ms/op [Average]
  (min, avg, max) = (603124.558, 635666.691, 669456.406), stdev = 19630.692
  CI (99.9%): [605987.886, 665345.497] (assumes normal distribution)

# Run complete. Total time: 4 days, 20:17:26

For 1% buffer:

Result "org.opensearch.benchmark.routing.allocation.AllocationBenchmark.measureShardRelocationComplete":
  651670.118 ±(99.9%) 25397.328 ms/op [Average]
  (min, avg, max) = (627210.507, 651670.118, 667302.240), stdev = 16798.760
  CI (99.9%): [626272.790, 677067.446] (assumes normal distribution)

# Run complete. Total time: 4 days, 21:40:19
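
To put the runs above side by side, the short Python sketch below computes each variant's average score relative to the original algorithm and checks whether the reported 99.9% confidence intervals overlap. The numbers are copied from the results above; the variable and function names are hypothetical, and the comparison assumes that all five runs used the same benchmark configuration.

```python
# (average ms/op, CI low, CI high), copied from the JMH results above.
baseline = (567490.762, 549377.811, 585603.712)  # original algorithm
variants = {
    "5% buffer": (638574.884, 619542.574, 657607.193),
    "random allocation": (641953.191, 619231.941, 664674.442),
    "10% buffer": (635666.691, 605987.886, 665345.497),
    "1% buffer": (651670.118, 626272.790, 677067.446),
}


def overhead_pct(score, reference):
    """Relative increase of `score` over `reference`, in percent."""
    return (score - reference) / reference * 100


def intervals_overlap(a, b):
    """True when two (avg, low, high) results have overlapping intervals."""
    return a[1] <= b[2] and b[1] <= a[2]


for name, result in variants.items():
    pct = overhead_pct(result[0], baseline[0])
    overlap = intervals_overlap(result, baseline)
    print(f"{name}: +{pct:.1f}% vs original, CI overlaps original: {overlap}")
```

On these figures every variant averages roughly 12 to 15 percent above the original algorithm, and none of their intervals overlaps the original's. The intervals of the buffer variants do overlap one another, so the differences between 1%, 5%, and 10% buffers are within the reported error.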

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Contributor

❌ Gradle check result for 8aed71b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Mar 14, 2024

Compatibility status:

Checks if related components are compatible with change 9f94ba5

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

Contributor

❌ Gradle check result for a2eaddd: FAILURE

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Contributor

❌ Gradle check result for 761adc3: FAILURE

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Contributor

❌ Gradle check result for f842b41: FAILURE

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Contributor

❌ Gradle check result for 938ac4d:

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request ShardManagement:Placement labels Mar 18, 2024
Contributor

❌ Gradle check result for ece3b26: FAILURE

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Contributor

❌ Gradle check result for 80dbcb9: FAILURE

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Contributor

❌ Gradle check result for 3d4d865: FAILURE

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Contributor

❌ Gradle check result for d7657af: FAILURE

Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Contributor

❌ Gradle check result for 566c7ef: FAILURE

@Arpit-Bandejiya
Contributor Author

Failed test:

Tests with failures:
 - org.opensearch.upgrades.RefreshVersionInClusterStateIT.testRefresh

@Arpit-Bandejiya
Contributor Author

Arpit-Bandejiya commented Mar 21, 2024

#11933 --> The above test is flaky

@Arpit-Bandejiya Arpit-Bandejiya marked this pull request as ready for review March 21, 2024 13:41
Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
Member

@ashking94 ashking94 left a comment

have left some minor comments, lgtm otherwise.

Contributor

✅ Gradle check result for b28f5e8: SUCCESS

Collaborator

@Bukhtawar Bukhtawar left a comment

Do we need AllocationBenchmark class changes for JMH benchmarks to be checked in?

@Arpit-Bandejiya
Contributor Author

Do we need AllocationBenchmark class changes for JMH benchmarks to be checked in?

The current benchmark contains iterations for a 200-node setup. I have added iterations to benchmark larger setups (up to 1000 nodes). I think we can open an issue to decide whether we really want to add iterations for larger node counts to all of the benchmarks. Let me know your thoughts on it.

Contributor

github-actions bot commented Apr 1, 2024

✅ Gradle check result for e823b85: SUCCESS

Signed-off-by: Arpit-Bandejiya <abandeji@amazon.com>
Contributor

github-actions bot commented Apr 2, 2024

✅ Gradle check result for 9f94ba5: SUCCESS

@gbbafna gbbafna merged commit 3491bcb into opensearch-project:main Apr 2, 2024
31 checks passed
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Apr 2, 2024
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-12656-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 3491bcb23d6b398117cfd11c5d273b2e83798d0b
# Push it to GitHub
git push --set-upstream origin backport/backport-12656-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-12656-to-2.x.

Arpit-Bandejiya added a commit to Arpit-Bandejiya/OpenSearch that referenced this pull request Apr 2, 2024
…ensearch-project#12656)

Signed-off-by: Arpit-Bandejiya <abandeji@amazon.com>

(cherry picked from commit 3491bcb)
Arpit-Bandejiya added a commit to Arpit-Bandejiya/OpenSearch that referenced this pull request Apr 2, 2024
…ensearch-project#12656)

Signed-off-by: Arpit-Bandejiya <abandeji@amazon.com>

(cherry picked from commit 3491bcb)
Signed-off-by: Arpit Bandejiya <abandeji@amazon.com>
gbbafna pushed a commit that referenced this pull request Apr 2, 2024
…2656) (#13014)

(cherry picked from commit 3491bcb)

Signed-off-by: Arpit-Bandejiya <abandeji@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…ensearch-project#12656)

Signed-off-by: Arpit-Bandejiya <abandeji@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
harshavamsi pushed a commit to harshavamsi/OpenSearch that referenced this pull request Apr 29, 2024
Labels
backport 2.x Backport to 2.x branch backport-failed enhancement Enhancement or improvement to existing feature or request ShardManagement:Placement
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

[Feature Request] [Segment Replication] Balanced primary count across all nodes during rebalancing
5 participants