Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to one shard #30539

Merged
merged 17 commits into from
May 14, 2018
Merged

Conversation

jasontedor
Copy link
Member

@jasontedor jasontedor commented May 11, 2018

This commit changes the default out-of-the-box configuration for the number of shards from five to one. We think this will help address a common problem of oversharding. For users with time-based indices that need a different default, this can be managed with index templates. For users with non-time-based indices that find they need to re-shard with the split API in place they no longer need to resort only to reindexing.

Since this has the impact of changing the default number of shards used in REST tests, we want to ensure that we still have coverage for issues that could arise from multiple shards. As such, we randomize (rarely) the default number of shards in REST tests to two. This is managed via a global index template. However, some tests check the templates that are in the cluster state during the test. Since this template is randomly there, we need a way for tests to skip adding the template used to set the number of shards to two. For this we add the default_shards feature skip. To avoid having to write our docs in a complicated way because sometimes they might be behind one shard, and sometimes they might be behind two shards we apply the default_shards feature skip to all docs tests. That is, these tests will always run with the default number of shards (one).

This commit changes the default out-of-the-box configuration for the
number of shards from five to one. We think this will help address a
common problem of oversharding. For users with time-based indices that
need a different default, this can be managed with index templates. For
users with non-time-based indices that find they need to re-shard with
the shrink API in place they no longer need to resort only to
reindexing.

Since this has the impact of changing the default number of shards used
in REST tests, we want to ensure that we still have coverage for issues
that could arise from multiple shards. As such, we randomize (rarely)
the default number of shards in REST tests to two. This is managed via a
global index template. However, some tests check the templates that are
in the cluster state during the test. Since this template is randomly
there, we need a way for tests to skip adding the template used to set
the number of shards to two. For this we add the default_shards feature
skip. To avoid having to write our docs in a complicated way because
sometimes they might be behind one shard, and sometimes they might be
behind two shards we apply the default_shards feature skip to all docs
tests. That is, these tests will always run with the default number of
shards (one).
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@jasontedor
Copy link
Member Author

FYI @elastic/es-clients for the feature skip default_shards. Note that you should not need to be concerned with this at all as the randomization to add a global template to set index.number_of_shards to 2 is done via our test runner so you will always pick up the new product default of 1. Thus, you do not need to skip tests that have this feature skip.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this was much simpler (fewer changes needed) than I would have anticipated!

This commit fixes a few issues in the REST client tests that arose from
moving to one shard. Moving to one shard causes a reordering of
calculations which impacts floating-point arithmetic. It can also impact
scoring and thus the ranking of docs.
This commit fixes a SQL test with security that was expecting five
shards.
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one comment LGTM otherwise

@@ -367,7 +367,7 @@ public ClusterState execute(ClusterState currentState) throws Exception {
// now, put the request settings, so they override templates
indexSettingsBuilder.put(request.settings());
if (indexSettingsBuilder.get(SETTING_NUMBER_OF_SHARDS) == null) {
indexSettingsBuilder.put(SETTING_NUMBER_OF_SHARDS, settings.getAsInt(SETTING_NUMBER_OF_SHARDS, 5));
indexSettingsBuilder.put(SETTING_NUMBER_OF_SHARDS, settings.getAsInt(SETTING_NUMBER_OF_SHARDS, 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be 5 if the index.version.created is < 7.0 I mean we can still have a mixed cluster here no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@s1monw I pushed 2e89d98.

@s1monw
Copy link
Contributor

s1monw commented May 12, 2018

Ship it

danielmitterdorfer added a commit to danielmitterdorfer/rally-tracks that referenced this pull request May 13, 2018
With this commit we set the number of shards explicitly to the default
of five shards in order to keep results consistent.

Relates elastic/elasticsearch#30539
* master:
  Deprecate not copy settings and explicitly disallow (elastic#30404)
  [ML] Improve state persistence log message
  Build: Add mavenPlugin cluster configuration method (elastic#30541)
  Re-enable FlushIT tests
  Bump Gradle heap to 2 GB (elastic#30535)
  SQL: Use request flavored methods in tests (elastic#30345)
* master:
  Adjust copy settings versions
  Mute ShrinkIndexIT suite
  SQL: SYS TABLES ordered according to *DBC specs (elastic#30530)
@clintongormley
Copy link

Let's make sure we blog about this to explain the reasoning.

@jasontedor jasontedor merged commit 4a4e3d7 into elastic:master May 14, 2018
@jasontedor jasontedor deleted the one-shard-to-rule-them-all branch May 14, 2018 16:22
dnhatn added a commit that referenced this pull request May 14, 2018
* master:
  Default to one shard (#30539)
  Unmute IndexUpgradeIT tests
  Forbid expensive query parts in ranking evaluation (#30151)
  Docs: Update HighLevelRestClient migration docs (#30544)
  Clients: Switch to new performRequest (#30543)
  [TEST] Fix typo in MovAvgIT test
  Add missing dependencies on testClasses (#30527)
  [TEST] Mute ML test that needs updating to following ml-cpp changes
  Document woes between auto-expand-replicas and allocation filtering (#30531)
  Moved tokenizers to analysis common module (#30538)
  Adjust copy settings versions
  Mute ShrinkIndexIT suite
  SQL: SYS TABLES ordered according to *DBC specs (#30530)
  Deprecate not copy settings and explicitly disallow (#30404)
  [ML] Improve state persistence log message
  Build: Add mavenPlugin cluster configuration method (#30541)
  Re-enable FlushIT tests
  Bump Gradle heap to 2 GB (#30535)
  SQL: Use request flavored methods in tests (#30345)
  Suppress hdfsFixture if there are spaces in the path (#30302)
  Delete temporary blobs before creating index file (#30528)
  Watcher: Remove TriggerEngine.getJobCount() (#30395)
  [ML] Fix wire BWC for JobUpdate (#30512)
  Use simpler write-once semantics for FS repository (#30435)
  Derive max composite buffers from max content len
  Use simpler write-once semantics for HDFS repository (#30439)
  SQL: Improve correctness of SYS COLUMNS & TYPES (#30418)
  Mute two tests in FlushIT with @AwaitsFix.
  Fix incorrect template name in test case
  Build: Remove legacy bwc files from xpack (#30485)
  Mute UnicastZenPingTests#testSimplePings with @AwaitsFix.
  Security: cleanup code in file stores (#30348)
  Security: fix TokenMetaData equals and hashcode (#30347)
  Mute two tests from SmokeTestWatcherWithSecurityClientYamlTestSuiteIT.
  Mute SharedClusterSnapshotRestoreIT#testSnapshotSucceedsAfterSnapshotFailure with @AwaitsFix.
  SQL: Improve compatibility with MS query (#30516)
  SQL: Fix parsing of dates with milliseconds (#30419)
martijnvg added a commit that referenced this pull request May 15, 2018
* es/ccr: (37 commits)
  Default to one shard (#30539)
  Unmute IndexUpgradeIT tests
  Forbid expensive query parts in ranking evaluation (#30151)
  Docs: Update HighLevelRestClient migration docs (#30544)
  Clients: Switch to new performRequest (#30543)
  [TEST] Fix typo in MovAvgIT test
  Add missing dependencies on testClasses (#30527)
  [TEST] Mute ML test that needs updating to following ml-cpp changes
  Document woes between auto-expand-replicas and allocation filtering (#30531)
  Moved tokenizers to analysis common module (#30538)
  Adjust copy settings versions
  Mute ShrinkIndexIT suite
  SQL: SYS TABLES ordered according to *DBC specs (#30530)
  Deprecate not copy settings and explicitly disallow (#30404)
  [ML] Improve state persistence log message
  Build: Add mavenPlugin cluster configuration method (#30541)
  Re-enable FlushIT tests
  Bump Gradle heap to 2 GB (#30535)
  SQL: Use request flavored methods in tests (#30345)
  Suppress hdfsFixture if there are spaces in the path (#30302)
  ...
martijnvg added a commit that referenced this pull request May 15, 2018
* es/ccr: (37 commits)
  Default to one shard (#30539)
  Unmute IndexUpgradeIT tests
  Forbid expensive query parts in ranking evaluation (#30151)
  Docs: Update HighLevelRestClient migration docs (#30544)
  Clients: Switch to new performRequest (#30543)
  [TEST] Fix typo in MovAvgIT test
  Add missing dependencies on testClasses (#30527)
  [TEST] Mute ML test that needs updating to following ml-cpp changes
  Document woes between auto-expand-replicas and allocation filtering (#30531)
  Moved tokenizers to analysis common module (#30538)
  Adjust copy settings versions
  Mute ShrinkIndexIT suite
  SQL: SYS TABLES ordered according to *DBC specs (#30530)
  Deprecate not copy settings and explicitly disallow (#30404)
  [ML] Improve state persistence log message
  Build: Add mavenPlugin cluster configuration method (#30541)
  Re-enable FlushIT tests
  Bump Gradle heap to 2 GB (#30535)
  SQL: Use request flavored methods in tests (#30345)
  Suppress hdfsFixture if there are spaces in the path (#30302)
  ...
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 15, 2018
* es/ccr: (37 commits)
  Default to one shard (elastic#30539)
  Unmute IndexUpgradeIT tests
  Forbid expensive query parts in ranking evaluation (elastic#30151)
  Docs: Update HighLevelRestClient migration docs (elastic#30544)
  Clients: Switch to new performRequest (elastic#30543)
  [TEST] Fix typo in MovAvgIT test
  Add missing dependencies on testClasses (elastic#30527)
  [TEST] Mute ML test that needs updating to following ml-cpp changes
  Document woes between auto-expand-replicas and allocation filtering (elastic#30531)
  Moved tokenizers to analysis common module (elastic#30538)
  Adjust copy settings versions
  Mute ShrinkIndexIT suite
  SQL: SYS TABLES ordered according to *DBC specs (elastic#30530)
  Deprecate not copy settings and explicitly disallow (elastic#30404)
  [ML] Improve state persistence log message
  Build: Add mavenPlugin cluster configuration method (elastic#30541)
  Re-enable FlushIT tests
  Bump Gradle heap to 2 GB (elastic#30535)
  SQL: Use request flavored methods in tests (elastic#30345)
  Suppress hdfsFixture if there are spaces in the path (elastic#30302)
  ...
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 15, 2018
* es/ccr: (37 commits)
  Default to one shard (elastic#30539)
  Unmute IndexUpgradeIT tests
  Forbid expensive query parts in ranking evaluation (elastic#30151)
  Docs: Update HighLevelRestClient migration docs (elastic#30544)
  Clients: Switch to new performRequest (elastic#30543)
  [TEST] Fix typo in MovAvgIT test
  Add missing dependencies on testClasses (elastic#30527)
  [TEST] Mute ML test that needs updating to following ml-cpp changes
  Document woes between auto-expand-replicas and allocation filtering (elastic#30531)
  Moved tokenizers to analysis common module (elastic#30538)
  Adjust copy settings versions
  Mute ShrinkIndexIT suite
  SQL: SYS TABLES ordered according to *DBC specs (elastic#30530)
  Deprecate not copy settings and explicitly disallow (elastic#30404)
  [ML] Improve state persistence log message
  Build: Add mavenPlugin cluster configuration method (elastic#30541)
  Re-enable FlushIT tests
  Bump Gradle heap to 2 GB (elastic#30535)
  SQL: Use request flavored methods in tests (elastic#30345)
  Suppress hdfsFixture if there are spaces in the path (elastic#30302)
  ...
atc0005 added a commit to atc0005/elasticsearch that referenced this pull request May 20, 2018
Update the default number of primary shards to match doc update
work done in 4852f34 for
PR elastic#30539.
jasontedor pushed a commit that referenced this pull request May 20, 2018
Update the default number of primary shards to match doc update work
done in #30539.
danielmitterdorfer added a commit to elastic/rally-tracks that referenced this pull request May 22, 2018
With this commit we set the number of shards explicitly to the default
of five shards in order to keep results consistent.

Relates elastic/elasticsearch#30539
Relates #44
danielmitterdorfer added a commit to elastic/rally-tracks that referenced this pull request May 22, 2018
With this commit we set the number of shards explicitly to the default
of five shards in order to keep results consistent.

Relates elastic/elasticsearch#30539
Relates #44
danielmitterdorfer added a commit to elastic/rally-tracks that referenced this pull request May 22, 2018
With this commit we set the number of shards explicitly to the default
of five shards in order to keep results consistent.

Relates elastic/elasticsearch#30539
Relates #44
@robcowart
Copy link

Overall this is a welcome change. However I was curious about why the decision for only a single shard instead of two?

We are largely unaffected here as we default to two shards in all of our index templates, and then adjust based on the customer's environment and load. We default to two because even on a single node, with a single mount point, all of our benchmarking tests show a 5-10% boost on indexing performance with two shards over only one.

Admittedly all of our use-cases are time-series oriented and heavily ingest-biased, so we are always looking to maximize ingest/index performance. Certainly most query-centric use-cases will add replicas, not shards.

Really I am just curious. I would have picked two as default, but either is much better than five.

dliappis added a commit to dliappis/rally that referenced this pull request Jun 13, 2018
Align with the new Elasticsearch defaults[1] for number of shards for
the indices used in the metrics store.

[1] elastic/elasticsearch#30539
@jasontedor
Copy link
Member Author

@robcowart I am so sorry for the slow reply here. Thanks so much for such a thoughtful question. Briefly, we think that the benefits of keeping shard counts as low as possible will benefit more users than shipping with a default configuration that will be beneficial to high-throughout use-cases. We aim to scalable in this regard, but we have a lot of users for which a single shard will suffice to absorb their write traffic. For users that need to scale, we have many knobs they can tune which include increasing their default shard count as well as splitting indices that were created with a single shard. We encourage such users to do performance testing to ensure that the tradeoff or doubling or more their shard count is worth the performance gain.

jasontedor added a commit that referenced this pull request Dec 7, 2018
This commit adds a migration note regarding the default number of shards
changing from five to one.

Relates #30539
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Jun 20, 2019
Mpdreamz pushed a commit to elastic/elasticsearch-net that referenced this pull request Jun 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants