
A replica can be promoted and started in one cluster state update #32042

Merged
merged 6 commits into master from shard_recovery_promotion
Jul 18, 2018

Conversation

bleskes
Contributor

@bleskes bleskes commented Jul 13, 2018

When a replica is fully recovered (i.e., in POST_RECOVERY state) we send a request to the master to start the shard. The master changes the state of the replica and publishes a cluster state to that effect. In certain cases, that cluster state can be processed on the node hosting the replica together with a cluster state that promotes that (now started) replica to a primary. This can happen due to batched cluster state processing, or if the master died after having committed the cluster state that starts the shard but before publishing it to the node with the replica. If the master also held the primary shard, the new master node will remove the primary (as it failed) and will also immediately promote the replica (thinking it is started).

Sadly our code in IndexShard didn't allow for this, which caused assertions to be tripped in some of our test runs.
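To make the race concrete, here is a minimal, self-contained sketch (not the actual IndexShard code; all class and method names below are simplified stand-ins) of why a routing update must be prepared to handle the start and the promotion together:

```java
public class PromotionSketch {

    enum IndexShardState { POST_RECOVERY, STARTED }

    static final class ShardRouting {
        final boolean primary;
        final boolean started;
        ShardRouting(boolean primary, boolean started) {
            this.primary = primary;
            this.started = started;
        }
    }

    static final class ReplicationTracker {
        private boolean primaryMode;
        boolean isPrimaryMode() { return primaryMode; }
        void activatePrimaryMode(long localCheckpoint) { primaryMode = true; }
    }

    private IndexShardState state = IndexShardState.POST_RECOVERY;
    private ShardRouting currentRouting = new ShardRouting(false, false);
    private final ReplicationTracker replicationTracker = new ReplicationTracker();

    void updateRoutingEntry(ShardRouting newRouting) {
        // the "start" transition: the replica moves from POST_RECOVERY to STARTED ...
        if (newRouting.started && state == IndexShardState.POST_RECOVERY) {
            state = IndexShardState.STARTED;
        }
        // ... and the "promote" transition may arrive in the very same cluster
        // state update, so primary mode must be activated here as well
        if (newRouting.primary && currentRouting.primary == false) {
            replicationTracker.activatePrimaryMode(0L /* local checkpoint placeholder */);
        }
        currentRouting = newRouting;
    }

    public static void main(String[] args) {
        PromotionSketch shard = new PromotionSketch();
        // a single cluster state update both starts the replica and promotes it
        shard.updateRoutingEntry(new ShardRouting(true, true));
        System.out.println("state=" + shard.state + ", primaryMode="
            + shard.replicationTracker.isPrimaryMode());
    }
}
```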

@bleskes bleskes added >bug :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v6.4.0 labels Jul 13, 2018
@bleskes bleskes requested a review from ywelsch July 13, 2018 15:04
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@bleskes
Contributor Author

bleskes commented Jul 13, 2018

@ywelsch I'm on the fence as to whether this should go into 6.3. Opinions?

Contributor

@ywelsch ywelsch left a comment

I've left some smaller comments, but the main change LGTM. With the improved testing, I think this can go into 6.3.

if (newRouting.primary() && currentRouting.isRelocationTarget() == false) {
    replicationTracker.activatePrimaryMode(getLocalCheckpoint());
}
assert currentRouting.isRelocationTarget() == false || currentRouting.primary() == false ||
    replicationTracker.isPrimaryMode();
Contributor

this assertion confuses me: what is it meant to express?

Contributor Author

Agreed, it's confusing. I tried to clarify by adding an assertion message. If it helps, I can flip the boolean around:

(currentRouting.isRelocationTarget() && currentRouting.primary() && replicationTracker.isPrimaryMode() == false) == false 

but I think that might be just as confusing, if not more so. Let me know.
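For the record, the two forms are logically equivalent by De Morgan's laws. A quick standalone check; the third disjunct of the original assertion (cut off in the excerpt above) is reconstructed from the flipped form:

```java
// Exhaustively verify that the original disjunctive form and the flipped
// conjunctive form agree for all eight input combinations.
public class AssertionFormsEquivalent {
    public static void main(String[] args) {
        boolean[] values = { false, true };
        for (boolean isRelocationTarget : values) {
            for (boolean isPrimary : values) {
                for (boolean isPrimaryMode : values) {
                    boolean original = isRelocationTarget == false || isPrimary == false || isPrimaryMode;
                    boolean flipped = (isRelocationTarget && isPrimary && isPrimaryMode == false) == false;
                    if (original != flipped) {
                        throw new AssertionError("the two forms differ");
                    }
                }
            }
        }
        System.out.println("both forms are equivalent for all 8 input combinations");
    }
}
```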

Contributor

let's keep it as is; with the assertion message I think it's OK. I wonder if we should have an assertion at the end of this method to say something like "if we have an active primary shard that's not relocating, then the replication tracker is in primary mode".

Contributor Author

sounds good. will add.

Contributor Author

added in 3c79d57
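For reference, a minimal sketch of what such an end-of-method invariant could look like; the exact form is the one that landed in 3c79d57, and the accessor names below are assumptions, not the actual code:

```java
// Sketch only (accessor names are assumptions): "an active primary shard that
// is not relocating must have its replication tracker in primary mode",
// expressed as an implication over the new routing entry.
assert newRouting.active() == false || newRouting.primary() == false || newRouting.relocating()
        || replicationTracker.isPrimaryMode()
        : "an active, non-relocating primary must be in primary mode " + newRouting;
```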

/**
 * Re-enables the default behavior of failing tests when shards created by this class fail.
 */
protected void failOnShardFailures() {
Contributor

this method is not used -> remove

Contributor Author

can do.
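For context, this is the kind of toggle pair such a helper would belong to. A hypothetical sketch only (the method was dropped from this PR as unused; all names below are illustrative, not the test framework's actual API):

```java
// Sketch of a test base class that lets individual tests opt in to shard
// failures, with a helper to restore the default strict behavior.
public abstract class ShardTestCaseSketch {

    // when true, any shard failure reported during the test fails the test
    private boolean failOnShardFailure = true;

    /** Allows shard failures without failing the test (for tests that provoke them on purpose). */
    protected void allowShardFailures() {
        failOnShardFailure = false;
    }

    /** Re-enables the default behavior of failing the test when a shard fails. */
    protected void failOnShardFailures() {
        failOnShardFailure = true;
    }

    /** Called by the harness when a shard fails; trips the test unless failures are allowed. */
    protected void onShardFailure(String shardId, Throwable cause) {
        if (failOnShardFailure) {
            throw new AssertionError("unexpected shard failure on " + shardId, cause);
        }
    }
}
```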

 * @param listeners an optional set of listeners to add to the shard
 */
protected IndexShard newShard(ShardRouting routing, ShardPath shardPath, IndexMetaData indexMetaData,
                              @Nullable IndexSearcherWrapper indexSearcherWrapper,
                              @Nullable EngineFactory engineFactory,
                              Runnable globalCheckpointSyncer,
                              IndexEventListener indexEventListener, boolean ignoreShardFailures,
                              IndexingOperationListener... listeners) throws IOException {
Contributor

is this parameter used anywhere?

Contributor Author

Leftover from a failed attempt, will remove. Good catch.

@bleskes bleskes merged commit 5856c39 into elastic:master Jul 18, 2018
@bleskes bleskes deleted the shard_recovery_promotion branch July 18, 2018 09:30
bleskes added a commit that referenced this pull request Jul 19, 2018
…2042)

When a replica is fully recovered (i.e., in `POST_RECOVERY` state) we send a request to the master
to start the shard. The master changes the state of the replica and publishes a cluster state to that
effect. In certain cases, that cluster state can be processed on the node hosting the replica
*together* with a cluster state that promotes that (now started) replica to a primary. This can
happen due to batched cluster state processing, or if the master died after having committed the
cluster state that starts the shard but before publishing it to the node with the replica. If the master
also held the primary shard, the new master node will remove the primary (as it failed) and will also
immediately promote the replica (thinking it is started).

Sadly our code in IndexShard didn't allow for this, which caused [assertions](https://github.com/elastic/elasticsearch/blob/13917162ad5c59a96ccb4d6a81a5044546c45c22/server/src/main/java/org/elasticsearch/index/seqno/ReplicationTracker.java#L482) to be tripped in some of our test runs.
bleskes added a commit that referenced this pull request Jul 19, 2018
…2042)

dnhatn added a commit that referenced this pull request Jul 20, 2018
* master:
  Painless: Simplify Naming in Lookup Package (#32177)
  Handle missing values in painless (#32207)
  add support for write index resolution when creating/updating documents (#31520)
  ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864)
  Remove indication of future multi-homing support (#32187)
  Rest test - allow for snapshots to take 0 milliseconds
  Make x-pack-core generate a pom file
  Rest HL client: Add put watch action (#32026)
  Build: Remove pom generation for plugin zip files (#32180)
  Fix comments causing errors with Java 11
  Fix rollup on date fields that don't support epoch_millis (#31890)
  Detect and prevent configuration that triggers a Gradle bug (#31912)
  [test] port linux package packaging tests (#31943)
  Revert "Introduce a Hashing Processor (#31087)" (#32178)
  Remove empty @return from JavaDoc
  Adjust SSLDriver behavior for JDK11 changes (#32145)
  [test] use randomized runner in packaging tests (#32109)
  Add support for field aliases. (#32172)
  Painless: Fix caching bug and clean up addPainlessClass. (#32142)
  Call setReferences() on custom referring tokenfilters in _analyze (#32157)
  Fix BwC Tests looking for UUID Pre 6.4 (#32158)
  Improve docs for search preferences (#32159)
  use before instead of onOrBefore
  Add more contexts to painless execute api (#30511)
  Add EC2 credential test for repository-s3 (#31918)
  A replica can be promoted and started in one cluster state update (#32042)
  Fix Java 11 javadoc compile problem
  Fix CP for namingConventions when gradle home has spaces (#31914)
  Fix `range` queries on `_type` field for singe type indices (#31756)
  [DOCS] Update TLS on Docker for 6.3 (#32114)
  ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
  Remove versionType from translog (#31945)
  Switch distribution to new style Requests (#30595)
  Build: Skip jar tests if jar disabled
  Painless: Add PainlessClassBuilder (#32141)
  Build: Make additional test deps of check (#32015)
  Disable C2 from using AVX-512 on JDK 10 (#32138)
  Build: Move shadow customizations into common code (#32014)
  Painless: Fix Bug with Duplicate PainlessClasses (#32110)
  Remove empty @param from Javadoc
  Re-disable packaging tests on suse boxes
  Docs: Fix missing example script quote (#32010)
  [ML] Wait for aliases in multi-node tests (#32086)
  [ML] Move analyzer dependencies out of categorization config (#32123)
  Ensure to release translog snapshot in primary-replica resync (#32045)
  Handle TokenizerFactory  TODOs (#32063)
  Relax TermVectors API to work with textual fields other than TextFieldType (#31915)
  Updates the build to gradle 4.9 (#32087)
  Mute :qa:mixed-cluster indices.stats/10_index/Index - all'
  Check that client methods match API defined in the REST spec (#31825)
  Enable testing in FIPS140 JVM (#31666)
  Fix put mappings java API documentation (#31955)
  Add exclusion option to `keep_types` token filter (#32012)
  [Test] Modify assert statement for ssl handshake (#32072)
martijnvg added a commit that referenced this pull request Jul 21, 2018
* es/6.x: (24 commits)
  Fix broken backport
  Switch full-cluster-restart to new style Requests (#32140)
  Fix multi level nested sort (#32204)
  MINOR: Remove unused `IndexDynamicSettings` (#32237) (#32248)
  [Tests] Remove QueryStringQueryBuilderTests#toQuery class assertions (#32236)
  Switch rolling restart to new style Requests (#32147)
  Enhance Parent circuit breaker error message (#32056)
  [ML] Use default request durability for .ml-state index (#32233)
  Enable testing in FIPS140 JVM (#31666) (#32231)
  Remove indices stats timeout from monitoring docs
  TESTS: Check for Netty resource leaks (#31861) (#32225)
  Rename ranking evaluation response section (#32166)
  Dependencies: Upgrade to joda time 2.10 (#32160)
  Backport SSL context names (#30953) to 6.x (#32223)
  Require Gradle 4.9  as minimum version (#32200)
  Detect old trial licenses and mimic behaviour (#32209)
  Painless: Simplify Naming in Lookup Package (#32177)
  add support for write index resolution when creating/updating documents (#31520)
  A replica can be promoted and started in one cluster state update (#32042)
  Rest test - allow for snapshots to take 0 milliseconds
  ...
@jpountz jpountz removed the :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. label Jan 29, 2019
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v6.3.2 v6.4.0 v7.0.0-beta1