Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set acking timeout to 0 on dynamic mapping update #31140

Merged
merged 5 commits into from
Jan 24, 2019

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jun 6, 2018

As acking can fail for any reason (unrelated node being too slow, node disconnecting), it should not be required for acking to succeed in order for index requests with dynamic mapping updates to successfully complete.

Relates to #30672 and Closes #30844

@ywelsch ywelsch added >enhancement :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v7.0.0 v6.4.0 labels Jun 6, 2018
@ywelsch ywelsch requested a review from bleskes June 6, 2018 14:25
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question. I also want to discuss the following - since we're ignoring the ack, should we also just not set it to the mapping timeout but rather always to 0?

@@ -315,6 +309,7 @@ public void testDelayedMappingPropagationOnReplica() throws Exception {
Settings.builder()
.put(DiscoverySettings.COMMIT_TIMEOUT_SETTING.getKey(), "30s") // explicitly set so it won't default to publish timeout
.put(DiscoverySettings.PUBLISH_TIMEOUT_SETTING.getKey(), "0s") // don't wait post commit as we are blocking things by design
.put(MappingUpdatedAction.INDICES_MAPPING_DYNAMIC_TIMEOUT_SETTING.getKey(), "2s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate on the rational of how you got to 2s?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's high enough so that the cluster state update task will actually get to run. It's low enough so that the test still finishes in a reasonable amount of time.

@ywelsch
Copy link
Contributor Author

ywelsch commented Jun 6, 2018

If I read the code correctly, setting the timeout to 0 would result in the update mapping call returning prematurely, even before the publishing has started, which is certainly not what we want?

@ywelsch ywelsch changed the title Don't check ack status on dynamic mapping update Set acking timeout to 0 on dynamic mapping update Jun 14, 2018
@ywelsch
Copy link
Contributor Author

ywelsch commented Jun 14, 2018

After merging #31303, I've updated this PR to set the acking timeout to 0.

@ywelsch
Copy link
Contributor Author

ywelsch commented Jun 15, 2018

This triggered a failure on the doc tests, which uncovered a larger issue. The reason is that with this change, we now always trigger a RetryOnPrimaryException in case of a dynamic mapping update. In return this can lead to the local checkpoint getting stuck on a replica. I've added a test that illustrates this in e8e6374

@lcawl lcawl added v6.4.1 and removed v6.4.0 labels Aug 23, 2018
@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 11, 2019

@bleskes I've updated this PR, which is now simpler due to the refactoring of TransportShardBulkAction.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -397,6 +399,21 @@ public void onFailure(Exception e) {

assertBusy(() -> assertTrue(client().prepareGet("index", "type", "1").get().isExists()));

ActionFuture<IndexResponse> dynamicMappingsFut = client().prepareIndex("index", "type", "2").setSource("field2", 42).execute();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this just biffs up the test, or am I missing something specific?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if we sometime should not have an extra field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the first document that is put on the test does not use dynamic mappings (there is an explicit putMapping call for that one). This one does an actual dynamic mapping update (the thing we changed in this PR).
Not having an extra field is just what the first document tests :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment to the test here to clarify

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks. I thought that the first call was indeed dynamic (and also adds a type, which is why another call to only add a field is required). Thanks for clarifying.

@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 11, 2019

The test failure here uncovered a problem with CCR I think. The failure is:

WARNING: Uncaught exception in thread: Thread[elasticsearch[followerd4][write][T#2],5,TGRP-IndexFollowingIT]
  2> java.lang.AssertionError: Only already-processed error should happen; op=[Index{id='1', type='doc', seqNo=0, primaryTerm=1, version=1, autoGeneratedIdTimestamp=-1}] error=[null]

I suspect that this is fails because the follower does not have the right mapping update for the batch of operations that it tries to index. CCR gets a batch of operations from the leader cluster and assumes that the mapping version that it uses from the ClusterService is at least as up-to-date as the operations that it gets from the index. This is not true, because we only expose the cluster state on ClusterService after it's been applied to the shard (and the index operation have possibly already occurred).

/cc: @martijnvg @jasontedor

@dnhatn
Copy link
Member

dnhatn commented Jan 14, 2019

The test failure here uncovered a problem with CCR I think.

@ywelsch Thanks for the ping. @jasontedor @martijnvg I will look into this issue.

dnhatn added a commit that referenced this pull request Jan 23, 2019
Today we keep the mapping on the follower in sync with the leader's
using the mapping version from changes requests. There are two rare
cases where the mapping on the follower is not synced properly:

1. The returned mapping version (from ClusterService) is outdated than
the actual mapping. This happens because we expose the latest cluster
state in ClusterService after applying it to IndexService.

2. It's possible for the FollowTask to receive an outdated mapping than
the min_required_mapping. In that case, it should fetch the mapping
again; otherwise, the follower won't have the right mapping.

Relates to #31140
dnhatn added a commit that referenced this pull request Jan 23, 2019
Today we keep the mapping on the follower in sync with the leader's
using the mapping version from changes requests. There are two rare
cases where the mapping on the follower is not synced properly:

1. The returned mapping version (from ClusterService) is outdated than
the actual mapping. This happens because we expose the latest cluster
state in ClusterService after applying it to IndexService.

2. It's possible for the FollowTask to receive an outdated mapping than
the min_required_mapping. In that case, it should fetch the mapping
again; otherwise, the follower won't have the right mapping.

Relates to #31140
@ywelsch ywelsch merged commit 64adb5a into elastic:master Jan 24, 2019
ywelsch added a commit that referenced this pull request Jan 24, 2019
As acking can fail for any reason (unrelated node being too slow, node disconnecting), it should not
be required for acking to succeed in order for index requests with dynamic mapping updates to
successfully complete.

Relates to #30672 and Closes #30844
ywelsch added a commit that referenced this pull request Jan 24, 2019
ywelsch added a commit that referenced this pull request Jan 24, 2019
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jan 24, 2019
* elastic/master:
  Optimize warning header de-duplication (elastic#37725)
  Bubble exceptions up in ClusterApplierService (elastic#37729)
  SQL: Improve handling of invalid args for PERCENTILE/PERCENTILE_RANK (elastic#37803)
  Remove unused ThreadBarrier class (elastic#37666)
  Add built-in user and role for code plugin (elastic#37030)
  Consolidate testclusters tests into a single project (elastic#37362)
  Fix docs for MappingUpdatedAction
  SQL: Introduce SQL DATE data type (elastic#37693)
  disabling bwc test while backporting elastic#37639
  Mute ClusterDisruptionIT testAckedIndexing
  Set acking timeout to 0 on dynamic mapping update (elastic#31140)
  Remove index audit output type (elastic#37707)
  Mute FollowerFailOverIT testReadRequestsReturnsLatestMappingVersion
  [ML] Increase close job timeout and lower the max number (elastic#37770)
  Remove Custom Listeners from SnapshotsService (elastic#37629)
  Use m_m_nodes from Zen1 master for Zen2 bootstrap (elastic#37701)
  Fix index filtering in follow info api. (elastic#37752)
  Use project dependency instead of substitutions for distributions (elastic#37730)
  Update authenticate to allow unknown fields (elastic#37713)
  Deprecate HLRC EmptyResponse used by security (elastic#37540)
dnhatn added a commit that referenced this pull request Jan 25, 2019
This test is obsolete since #31140 where an index request with dynamic
mapping update no longer requires acking.

Closes #37816
dnhatn added a commit that referenced this pull request Jan 25, 2019
Dynamic mapping update of an index request no longer requires acks from
all nodes. We need busily to check the mapping update.

Relates #31140.
Closes #37817
dnhatn added a commit that referenced this pull request Jan 25, 2019
This test is obsolete since #31140 where an index request with dynamic
mapping update no longer requires acking.

Closes #37816
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Jan 30, 2019
We need to await busily for the mapping in this test because we no
longer require acking on the dynamic mapping of an index request.

Relates elastic#31140
Closes elastic#37928
dnhatn added a commit that referenced this pull request Jan 31, 2019
If a replica does not have a right mapping yet, we will retry the index
request on that replica; then the actual tasks is higher than the
expected tasks. Since #31140 this happens more frequently for we no
longer require acking on the dynamic mapping of index requests.

Relates #31140
Closes #37893
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Feb 1, 2019
If a replica does not have a right mapping yet, we will retry the index
request on that replica; then the actual tasks is higher than the
expected tasks. Since elastic#31140 this happens more frequently for we no
longer require acking on the dynamic mapping of index requests.

Relates elastic#31140
Closes elastic#37893
dnhatn added a commit that referenced this pull request Feb 1, 2019
…8045)

Since #31140 we no longer require acking on the dynamic mapping of index
requests. Thus, a returned mapping from a get mapping request does not
necessarily contain the dynamic updates from the index request. This
commit replaces the dynamic mapping update with a manual put mapping.

Relates #31140
Closes #37928
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Feb 1, 2019
…astic#38045)

Since elastic#31140 we no longer require acking on the dynamic mapping of index
requests. Thus, a returned mapping from a get mapping request does not
necessarily contain the dynamic updates from the index request. This
commit replaces the dynamic mapping update with a manual put mapping.

Relates elastic#31140
Closes elastic#37928
dnhatn added a commit that referenced this pull request Feb 1, 2019
Since #31140 we no longer require acking on the dynamic mapping of index
requests. Thus, a returned mapping from a get mapping request does not
necessarily contain the dynamic updates from the index request. This
commit replaces the dynamic mapping update with a manual put mapping.

Relates #31140
Closes #37928
dnhatn added a commit that referenced this pull request Feb 1, 2019
If a replica does not have a right mapping yet, we will retry the index
request on that replica; then the actual tasks is higher than the
expected tasks. Since #31140 this happens more frequently for we no
longer require acking on the dynamic mapping of index requests.

Relates #31140
Closes #37893
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >enhancement v6.7.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants