-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set acking timeout to 0 on dynamic mapping update #31140
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a question. I also want to discuss the following - since we're ignoring the ack, should we also just not set it to the mapping timeout but rather always to 0?
@@ -315,6 +309,7 @@ public void testDelayedMappingPropagationOnReplica() throws Exception { | |||
Settings.builder() | |||
.put(DiscoverySettings.COMMIT_TIMEOUT_SETTING.getKey(), "30s") // explicitly set so it won't default to publish timeout | |||
.put(DiscoverySettings.PUBLISH_TIMEOUT_SETTING.getKey(), "0s") // don't wait post commit as we are blocking things by design | |||
.put(MappingUpdatedAction.INDICES_MAPPING_DYNAMIC_TIMEOUT_SETTING.getKey(), "2s") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate on the rational of how you got to 2s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's high enough so that the cluster state update task will actually get to run. It's low enough so that the test still finishes in a reasonable amount of time.
If I read the code correctly, setting the timeout to 0 would result in the update mapping call returning prematurely, even before the publishing has started, which is certainly not what we want? |
After merging #31303, I've updated this PR to set the acking timeout to 0. |
This triggered a failure on the doc tests, which uncovered a larger issue. The reason is that with this change, we now always trigger a RetryOnPrimaryException in case of a dynamic mapping update. In return this can lead to the local checkpoint getting stuck on a replica. I've added a test that illustrates this in e8e6374 |
6e09683
to
779cb2c
Compare
@bleskes I've updated this PR, which is now simpler due to the refactoring of TransportShardBulkAction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -397,6 +399,21 @@ public void onFailure(Exception e) { | |||
|
|||
assertBusy(() -> assertTrue(client().prepareGet("index", "type", "1").get().isExists())); | |||
|
|||
ActionFuture<IndexResponse> dynamicMappingsFut = client().prepareIndex("index", "type", "2").setSource("field2", 42).execute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume this just biffs up the test, or am I missing something specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if we sometime should not have an extra field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the first document that is put on the test does not use dynamic mappings (there is an explicit putMapping call for that one). This one does an actual dynamic mapping update (the thing we changed in this PR).
Not having an extra field is just what the first document tests :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a comment to the test here to clarify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks. I thought that the first call was indeed dynamic (and also adds a type, which is why another call to only add a field is required). Thanks for clarifying.
The test failure here uncovered a problem with CCR I think. The failure is:
I suspect that this is fails because the follower does not have the right mapping update for the batch of operations that it tries to index. CCR gets a batch of operations from the leader cluster and assumes that the mapping version that it uses from the /cc: @martijnvg @jasontedor |
@ywelsch Thanks for the ping. @jasontedor @martijnvg I will look into this issue. |
Today we keep the mapping on the follower in sync with the leader's using the mapping version from changes requests. There are two rare cases where the mapping on the follower is not synced properly: 1. The returned mapping version (from ClusterService) is outdated than the actual mapping. This happens because we expose the latest cluster state in ClusterService after applying it to IndexService. 2. It's possible for the FollowTask to receive an outdated mapping than the min_required_mapping. In that case, it should fetch the mapping again; otherwise, the follower won't have the right mapping. Relates to #31140
Today we keep the mapping on the follower in sync with the leader's using the mapping version from changes requests. There are two rare cases where the mapping on the follower is not synced properly: 1. The returned mapping version (from ClusterService) is outdated than the actual mapping. This happens because we expose the latest cluster state in ClusterService after applying it to IndexService. 2. It's possible for the FollowTask to receive an outdated mapping than the min_required_mapping. In that case, it should fetch the mapping again; otherwise, the follower won't have the right mapping. Relates to #31140
* elastic/master: Optimize warning header de-duplication (elastic#37725) Bubble exceptions up in ClusterApplierService (elastic#37729) SQL: Improve handling of invalid args for PERCENTILE/PERCENTILE_RANK (elastic#37803) Remove unused ThreadBarrier class (elastic#37666) Add built-in user and role for code plugin (elastic#37030) Consolidate testclusters tests into a single project (elastic#37362) Fix docs for MappingUpdatedAction SQL: Introduce SQL DATE data type (elastic#37693) disabling bwc test while backporting elastic#37639 Mute ClusterDisruptionIT testAckedIndexing Set acking timeout to 0 on dynamic mapping update (elastic#31140) Remove index audit output type (elastic#37707) Mute FollowerFailOverIT testReadRequestsReturnsLatestMappingVersion [ML] Increase close job timeout and lower the max number (elastic#37770) Remove Custom Listeners from SnapshotsService (elastic#37629) Use m_m_nodes from Zen1 master for Zen2 bootstrap (elastic#37701) Fix index filtering in follow info api. (elastic#37752) Use project dependency instead of substitutions for distributions (elastic#37730) Update authenticate to allow unknown fields (elastic#37713) Deprecate HLRC EmptyResponse used by security (elastic#37540)
We need to await busily for the mapping in this test because we no longer require acking on the dynamic mapping of an index request. Relates elastic#31140 Closes elastic#37928
If a replica does not have a right mapping yet, we will retry the index request on that replica; then the actual tasks is higher than the expected tasks. Since elastic#31140 this happens more frequently for we no longer require acking on the dynamic mapping of index requests. Relates elastic#31140 Closes elastic#37893
…8045) Since #31140 we no longer require acking on the dynamic mapping of index requests. Thus, a returned mapping from a get mapping request does not necessarily contain the dynamic updates from the index request. This commit replaces the dynamic mapping update with a manual put mapping. Relates #31140 Closes #37928
…astic#38045) Since elastic#31140 we no longer require acking on the dynamic mapping of index requests. Thus, a returned mapping from a get mapping request does not necessarily contain the dynamic updates from the index request. This commit replaces the dynamic mapping update with a manual put mapping. Relates elastic#31140 Closes elastic#37928
Since #31140 we no longer require acking on the dynamic mapping of index requests. Thus, a returned mapping from a get mapping request does not necessarily contain the dynamic updates from the index request. This commit replaces the dynamic mapping update with a manual put mapping. Relates #31140 Closes #37928
As acking can fail for any reason (unrelated node being too slow, node disconnecting), it should not be required for acking to succeed in order for index requests with dynamic mapping updates to successfully complete.
Relates to #30672 and Closes #30844