Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't break allocation if resize source index is missing #29311

Merged
merged 2 commits into from
Apr 3, 2018

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Mar 30, 2018

DiskThresholdDecider currently assumes that the source index of a resize operation (e.g. shrink) is available, and throws an IndexNotFoundException otherwise, thereby breaking any kind of shard allocation. This can be quite harmful if the source index is deleted during a shrink, or if the source index is unavailable during state recovery.
While this behavior has been partly fixed in 6.1 and above (due to #26931), it relies on the order in which AllocationDeciders are executed (i.e. that ResizeAllocationDecider returns NO, ensuring that DiskThresholdDecider does not run, something that for example does not hold for the allocation explain API).
This change adds a more complete fix, and also solves the situation for 5.6.

@ywelsch ywelsch added >bug v7.0.0 v6.3.0 :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v5.6.9 labels Mar 30, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@ywelsch ywelsch requested a review from s1monw March 30, 2018 08:37
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ywelsch ywelsch merged commit 2dc546c into elastic:master Apr 3, 2018
ywelsch added a commit that referenced this pull request Apr 3, 2018
DiskThresholdDecider currently assumes that the source index of a resize operation (e.g. shrink)
is available, and throws an IndexNotFoundException otherwise, thereby breaking any kind of shard
allocation. This can be quite harmful if the source index is deleted during a shrink, or if the source
index is unavailable during state recovery.

While this behavior has been partly fixed in 6.1 and above (due to #26931), it relies on the order in
which AllocationDeciders are executed (i.e. that ResizeAllocationDecider returns NO, ensuring that
DiskThresholdDecider does not run, something that for example does not hold for the allocation
explain API).

This change adds a more complete fix, and also solves the situation for 5.6.
ywelsch added a commit that referenced this pull request Apr 3, 2018
DiskThresholdDecider currently assumes that the source index of a resize operation (e.g. shrink)
is available, and throws an IndexNotFoundException otherwise, thereby breaking any kind of shard
allocation. This can be quite harmful if the source index is deleted during a shrink, or if the source
index is unavailable during state recovery.

While this behavior has been partly fixed in 6.1 and above (due to #26931), it relies on the order in
which AllocationDeciders are executed (i.e. that ResizeAllocationDecider returns NO, ensuring that
DiskThresholdDecider does not run, something that for example does not hold for the allocation
explain API).

This change adds a more complete fix, and also solves the situation for 5.6.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 3, 2018
* master:
  Reindex: Fix error in delete-by-query rest spec (elastic#29318)
  Improve similarity integration. (elastic#29187)
  Fix some query extraction bugs. (elastic#29283)
  [Docs] Correct experimental note formatting
  Move Nullable into core (elastic#29341)
  [Docs] Update getting-started.asciidoc (elastic#29294)
  Elasticsearch 6.3.0 is now on Lucene 7.3.
  [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677)
  Fix missing comma in ingest-node.asciidoc (elastic#29343)
  Improve exception handling on TransportMasterNodeAction (elastic#29314)
  Don't break allocation if resize source index is missing (elastic#29311)
  Use fixture to test repository-s3 plugin (elastic#29296)
  Fix NDCG for empty search results (elastic#29267)
  Pass through script params in scripted metric agg (elastic#29154)
  Fix Eclipse build.
  Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298)
  Painless: Remove extraneous INLINE constant. (elastic#29340)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 3, 2018
* master:
  Build: Fix Java9 MR build (elastic#29312)
  Reindex: Fix error in delete-by-query rest spec (elastic#29318)
  Improve similarity integration. (elastic#29187)
  Fix some query extraction bugs. (elastic#29283)
  [Docs] Correct experimental note formatting
  Move Nullable into core (elastic#29341)
  [Docs] Update getting-started.asciidoc (elastic#29294)
  Elasticsearch 6.3.0 is now on Lucene 7.3.
  [DOCS] Refer back to index API for full-document updates in _update API section (elastic#28677)
  Fix missing comma in ingest-node.asciidoc (elastic#29343)
  Improve exception handling on TransportMasterNodeAction (elastic#29314)
  Don't break allocation if resize source index is missing (elastic#29311)
  Use fixture to test repository-s3 plugin (elastic#29296)
  Fix NDCG for empty search results (elastic#29267)
  Pass through script params in scripted metric agg (elastic#29154)
  Fix Eclipse build.
  Upgrade to lucene-7.3.0-snapshot-98a6b3d. (elastic#29298)
  Painless: Remove extraneous INLINE constant. (elastic#29340)
  Remove HTTP max content length leniency (elastic#29337)
  Begin moving XContent to a separate lib/artifact (elastic#29300)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v5.6.9 v6.3.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants