Added a new index level setting to limit the total primary shards per node per index #17295

Open
wants to merge 4 commits into
base: main

pandeydivyansh1803

Description

For remote-store-backed clusters, segment replication is used as the replication strategy. With segment replication, segments are created only on the primary shard and are then copied to the replica shards. Because segment creation is CPU intensive, we have observed CPU skew between nodes of the same cluster when primary shards are not balanced.

The earlier attempts to rebalance primary shards across nodes (#6422, #12250) help reduce the skew, but they work on a best-effort basis and do not enforce any constraint.

Implement a new setting in OpenSearch:
index.routing.allocation.total_primary_shards_per_node: an index-level setting that limits the number of primary shards per node for a specific index. The limit (indexTotalPrimaryShardsPerNodeLimit) is stored in index metadata, similar to indexTotalShardsPerNodeLimit.
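
To make the setting's semantics concrete, here is a minimal, self-contained sketch of how such a limit could be read from an index's settings. It is not the OpenSearch Setting API: the class name, the parse method, and the plain Map used as a settings holder are hypothetical. It also assumes that -1 means "no limit", mirroring the default of index.routing.allocation.total_shards_per_node; this PR description does not state the default.

```java
import java.util.Map;

/** Hypothetical sketch; a plain Map stands in for index settings. */
final class PrimaryShardLimitSetting {
    static final String KEY = "index.routing.allocation.total_primary_shards_per_node";

    // Assumption: -1 means "no limit", as with total_shards_per_node.
    static final int UNLIMITED = -1;

    /** Returns the configured limit, or UNLIMITED when the key is absent. */
    static int parse(Map<String, String> indexSettings) {
        String raw = indexSettings.get(KEY);
        if (raw == null) {
            return UNLIMITED;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < UNLIMITED) {
            throw new IllegalArgumentException(KEY + " must be >= " + UNLIMITED + ", got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse(Map.of()));          // key absent: no limit
        System.out.println(parse(Map.of(KEY, "2")));  // at most 2 primaries of this index per node
    }
}
```

Treating the absent key as "no limit" keeps existing indices unaffected unless the setting is set explicitly.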

This setting gives operators explicit control over primary shard distribution, improving cluster balance and performance management.
The existing ShardsLimitAllocationDecider class already has the infrastructure to evaluate shard allocation constraints: it has access to the current cluster state and routing information, and it provides methods to check shard counts per node. We therefore propose implementing the new primary shard limit within this class. Doing so reuses the current decision-making framework, keeps the new check consistent with existing allocation rules, and minimizes code duplication.
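
As a rough illustration of the kind of check described above, the following self-contained sketch counts the primaries of one index already on a node and compares that count against the limit. It does not use the real ShardsLimitAllocationDecider, RoutingNode, or Decision classes: the Shard record, the Decision enum, and canAllocate are hypothetical stand-ins. It assumes a limit of zero or less disables the check and that replicas are never restricted by this setting. It needs Java 16 or later for records.

```java
import java.util.List;

/** Hypothetical sketch of a per-index primary shard limit check. */
final class PrimaryShardLimitSketch {
    enum Decision { YES, NO }

    /** Minimal stand-in for a shard routing entry. */
    record Shard(String index, boolean primary) {}

    static Decision canAllocate(Shard candidate, List<Shard> shardsOnNode, int primaryLimit) {
        // Assumption: limit <= 0 disables the check; replicas are not constrained.
        if (primaryLimit <= 0 || !candidate.primary()) {
            return Decision.YES;
        }
        // Count primaries of the candidate's index that are already on this node.
        long primariesOfIndex = shardsOnNode.stream()
            .filter(s -> s.primary() && s.index().equals(candidate.index()))
            .count();
        // Allocating one more primary must not exceed the limit.
        return primariesOfIndex >= primaryLimit ? Decision.NO : Decision.YES;
    }

    public static void main(String[] args) {
        List<Shard> node = List.of(new Shard("logs", true), new Shard("logs", false));
        System.out.println(canAllocate(new Shard("logs", true), node, 1));   // limit already reached
        System.out.println(canAllocate(new Shard("logs", false), node, 1));  // replica, not constrained
        System.out.println(canAllocate(new Shard("logs", true), node, 2));   // one slot left
    }
}
```

Counting only primaries of the same index keeps the check independent of any total-shards limit, so the two constraints can be evaluated side by side.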

Related Issues

Resolves #17293

Check List

  • [✔️] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [✔️] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Feb 7, 2025

❌ Gradle check result for 721865e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Feb 7, 2025

❌ Gradle check result for 920f71a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

untitled/.gitignore (outdated, resolved review thread)
Contributor

github-actions bot commented Feb 9, 2025

✅ Gradle check result for ebb6a2b: SUCCESS


codecov bot commented Feb 9, 2025

Codecov Report

Attention: Patch coverage is 96.29630% with 1 line in your changes missing coverage. Please review.

Project coverage is 72.46%. Comparing base (77e4112) to head (ebb6a2b).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
...location/decider/ShardsLimitAllocationDecider.java 95.83% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17295      +/-   ##
============================================
+ Coverage     72.40%   72.46%   +0.06%     
- Complexity    65554    65594      +40     
============================================
  Files          5292     5292              
  Lines        304493   304548      +55     
  Branches      44218    44231      +13     
============================================
+ Hits         220463   220696     +233     
+ Misses        65975    65789     -186     
- Partials      18055    18063       +8     


Contributor

❌ Gradle check result for ed6fb58: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Divyansh Pandey added 3 commits February 11, 2025 12:19
… index per node. Added relevant files for unit test and integration test.

Signed-off-by: Divyansh Pandey <dpaandey@amazon.com>
Signed-off-by: Divyansh Pandey <dpaandey@amazon.com>
Signed-off-by: Divyansh Pandey <dpaandey@amazon.com>
Contributor

❌ Gradle check result for e76ebb6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

… to RoutingNode.java

Signed-off-by: Divyansh Pandey <dpaandey@amazon.com>
Labels
enhancement: Enhancement or improvement to existing feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Primary Shard Count Constraint
3 participants