Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure to generate identical NoOp for the same failure #33141

Merged
merged 2 commits into from
Aug 27, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Aug 24, 2018

We generate slightly different NoOps in InternalEngine and TransportShardBulkAction for the same failure.

  1. InternalEngine uses Exception#getFailure to generate a message without the class name.

newOp [NoOp{seqNo=1, primaryTerm=1, reason='Contexts are mandatory in context enabled completion field [suggest_context]'}]

  1. TransportShardBulkAction uses Exception#toString to generate a message with the class name.

NoOp{seqNo=1, primaryTerm=1, reason='java.lang.IllegalArgumentException: Contexts are mandatory in context enabled completion field [suggest_context]'}

If a write operation fails while a replica is recovering, that replica will possibly receive two different NoOps: one from recovery and one from replication. These two different NoOps will trip TranslogWriter#assertNoSeqNumberConflict assertion.

This commit makes sure that we generate the same No-Ops for the same failure.

Closes #32986

We generate NoOps in two places: in InternalEngine and in
TransportShardBulkAction after a write operation failed. However, we
generate slightly different messages for the same failure:

1. InternalEngine uses Exception#getFailure to generate a message
without the class name.

> newOp [NoOp{seqNo=1, primaryTerm=1, reason='Contexts are mandatory in
> context enabled completion field [suggest_context]'}]

2. TransportShardBulkAction uses  Exception#toString  to generate a
message with the class name.

> NoOp{seqNo=1, primaryTerm=1,
> reason='java.lang.IllegalArgumentException: Contexts are mandatory in
> context enabled completion field [suggest_context]'}

If a write operation fails while a replica is recovering, that replica
will possibly receive two different NoOps: one from InternalEngine and
one from TransportShardBulkAction.  Two different NoOps will trip
TranslogWriter#assertNoSeqNumberConflic assertion.

This commit makes sure that we generate the same No-Ops for the same
failure.

Closes elastic#32986
@dnhatn dnhatn added >bug :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v7.0.0 v6.5.0 v6.4.1 labels Aug 24, 2018
@dnhatn dnhatn requested review from s1monw and jasontedor August 24, 2018 21:41
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnhatn
Copy link
Member Author

dnhatn commented Aug 27, 2018

Thanks @s1monw.

@dnhatn dnhatn merged commit 014b323 into elastic:master Aug 27, 2018
@dnhatn dnhatn deleted the fix-noop-failure branch August 27, 2018 19:59
dnhatn added a commit that referenced this pull request Aug 28, 2018
We generate slightly different NoOps in InternalEngine and
TransportShardBulkAction for the same failure.

1. InternalEngine uses Exception#getFailure to generate a message
without the class name: newOp [NoOp{seqNo=1, primaryTerm=1,
reason='Contexts are mandatory in context enabled completion field
[suggest_context]'}].

2. TransportShardBulkAction uses Exception#toString to generate a
message with the class name: NoOp{seqNo=1, primaryTerm=1,
reason='java.lang.IllegalArgumentException: Contexts are mandatory in
context enabled completion field [suggest_context]'}.

If a write operation fails while a replica is recovering, that replica
will possibly receive two different NoOps: one from recovery and one
from replication. These two different NoOps will trip
TranslogWriter#assertNoSeqNumberConflict assertion.

This commit ensures that we generate the same Noop for the same failure.

Closes #32986
dnhatn added a commit that referenced this pull request Aug 28, 2018
We generate slightly different NoOps in InternalEngine and
TransportShardBulkAction for the same failure.

1. InternalEngine uses Exception#getFailure to generate a message
without the class name: newOp [NoOp{seqNo=1, primaryTerm=1,
reason='Contexts are mandatory in context enabled completion field
[suggest_context]'}].

2. TransportShardBulkAction uses Exception#toString to generate a
message with the class name: NoOp{seqNo=1, primaryTerm=1,
reason='java.lang.IllegalArgumentException: Contexts are mandatory in
context enabled completion field [suggest_context]'}.

If a write operation fails while a replica is recovering, that replica
will possibly receive two different NoOps: one from recovery and one
from replication. These two different NoOps will trip
TranslogWriter#assertNoSeqNumberConflict assertion.

This commit ensures that we generate the same Noop for the same failure.

Closes #32986
dnhatn added a commit that referenced this pull request Aug 28, 2018
* master:
  [Rollup] Better error message when trying to set non-rollup index (#32965)
  HLRC: Use Optional in validation logic (#33104)
  Remove unused User class from protocol (#33137)
  ingest: Introduce the dissect processor (#32884)
  [Docs] Add link to es-kotlin-wrapper-client (#32618)
  [Docs] Remove repeating words (#33087)
  Minor spelling and grammar fix (#32931)
  Remove support for deprecated params._agg/_aggs for scripted metric aggregations (#32979)
  Watcher: Simplify finding next date in cron schedule (#33015)
  Run Third party audit with forbidden APIs CLI  (part3/3) (#33052)
  Fix plugin build test on Windows (#33078)
  HLRC+MINOR: Remove Unused Private Method (#33165)
  Remove old unused test script files (#32970)
  Build analysis-icu client JAR (#33184)
  Ensure to generate identical NoOp for the same failure (#33141)
  ShardSearchFailure#readFrom to set index and shardId (#33161)
dnhatn added a commit that referenced this pull request Aug 28, 2018
* 6.x:
  [Rollup] Better error message when trying to set non-rollup index (#32965)
  Remove unused User class from protocol (#33137)
  [DOCS] Adds link to 6.3.0 release highlights
  Test: fix token bwc tests due to bad backport
  Ensure to generate identical NoOp for the same failure (#33141)
  [Docs] Add link to es-kotlin-wrapper-client (#32618)
  [Docs] Remove repeating words (#33087)
  Minor spelling and grammar fix (#32931)
  Run Third party audit with forbidden APIs CLI  (part3/3) (#33052)
  Fix plugin build test on Windows (#33078)
  Watcher: Simplify finding next date in cron schedule (#33015)
  Remove old unused test script files (#32970)
  Build analysis-icu client JAR (#33184)
  Switch remaining tests to new style Requests (#33109)
  Use internal connection manager when fetching remote node info
  Switch remaining x-pack tests to new style Requests (#33108)
  Switch remaining ml tests to new style Requests (#33107)
  Token API supports the client_credentials grant (#33106)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. v6.4.1 v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants