
testDoNotInfinitelyWaitForMapping fails #47974

Closed
dnhatn opened this issue Oct 14, 2019 · 1 comment · Fixed by #48265
Assignee: dnhatn
Labels: :Distributed Coordination/Allocation, >test-failure

Comments

dnhatn (Member) commented Oct 14, 2019

This test has been failing since #46959, which cancels an ongoing recovery when we find a new copy that can perform a noop recovery.

[2019-10-13T06:29:10,208][WARN ][o.e.c.r.a.AllocationService] [node_t0] failing shard [failed shard, shard [test][0], node[d1YsSCuCScGq6Micn4jKqQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=o1Qiiu9PSO-fB6M84wsdtA], unassigned_info[[reason=REALLOCATED_REPLICA], at[2019-10-13T17:29:10.107Z], delayed=false, details[existing allocation of replica to [{node_t1}{Z7bFtb4bSyWxw8hW8jhLjQ}{ApVGNCrUQkyIG-5hbQuRiw}{127.0.0.1}{127.0.0.1:44756}{dim}] cancelled, can perform a noop recovery on [{node_t2}{d1YsSCuCScGq6Micn4jKqQ}{vJfgfJ52RwqCdjqnUdYOAQ}{127.0.0.1}{127.0.0.1:44089}{dim}]], allocation_status[no_attempt]], message [shard failure, reason [index id[u0] origin[PEER_RECOVERY] seq#[0]]], markAsStale [true], failure [org.elasticsearch.index.mapper.MapperParsingException: simulate mapping parsing error

CI: https://gradle-enterprise.elastic.co/s/zbewn2l6ksvd2/tests/kyv2y2z3r4v7m-bgzwhe6nv7k4c

Relates #46959
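
For context, here is a minimal sketch of the kind of cancellation decision #46959 introduced. Identifiers such as `ShardCopy` and `shouldCancelRecovery` are illustrative only, not the actual `ReplicaShardAllocator` API:

```java
import java.util.List;

final class CancelRecoverySketch {

    // A candidate holder of the shard: the node it lives on, and whether its
    // copy is already identical to the primary's (so recovery would copy no data).
    record ShardCopy(String nodeId, boolean noopRecoveryPossible) {}

    // Cancel the ongoing recovery onto currentTarget if some other node already
    // holds a copy that can be recovered with no data transfer ("noop recovery").
    static boolean shouldCancelRecovery(ShardCopy currentTarget, List<ShardCopy> candidates) {
        for (ShardCopy candidate : candidates) {
            if (candidate.noopRecoveryPossible()
                    && !candidate.nodeId().equals(currentTarget.nodeId())) {
                return true; // a cheaper copy exists elsewhere: cancel and reallocate
            }
        }
        return false;
    }
}
```

The trouble arises when the "better" candidate sits on a node where the shard has already failed (here, the simulated mapping parsing error), so the cancellation can repeat without making progress.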

dnhatn added the >test-failure and :Distributed Coordination/Allocation labels Oct 14, 2019
dnhatn self-assigned this Oct 14, 2019
elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed (:Distributed/Allocation)

dnhatn added a commit that referenced this issue Oct 14, 2019
dnhatn added a commit that referenced this issue Oct 14, 2019
howardhuanghua pushed a commit to TencentCloudES/elasticsearch that referenced this issue Oct 14, 2019
dnhatn added a commit that referenced this issue Nov 1, 2019
This change fixes a poisonous situation where an ongoing recovery was
canceled because a better copy was found on a node that the cluster had
previously tried allocating the shard to but failed. The solution is to
keep track of the set of nodes that an allocation was failed on so that
we can avoid canceling the current recovery for a copy on failed nodes.

Closes #47974
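
A minimal sketch of that idea, with hypothetical names rather than the real allocator's unassigned-info bookkeeping: remember the nodes an allocation has already failed on, and never let a copy on one of those nodes trigger a cancellation.

```java
import java.util.List;
import java.util.Set;

final class AvoidFailedNodesSketch {

    record ShardCopy(String nodeId, boolean noopRecoveryPossible) {}

    static boolean shouldCancelRecovery(ShardCopy currentTarget,
                                        List<ShardCopy> candidates,
                                        Set<String> failedNodeIds) {
        for (ShardCopy candidate : candidates) {
            // The fix: a node that previously failed this shard is not a
            // trustworthy target, so its copy must not cancel the current recovery.
            if (failedNodeIds.contains(candidate.nodeId())) {
                continue;
            }
            if (candidate.noopRecoveryPossible()
                    && !candidate.nodeId().equals(currentTarget.nodeId())) {
                return true;
            }
        }
        return false;
    }
}
```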
dnhatn added a commit that referenced this issue Nov 9, 2019
dnhatn added a commit that referenced this issue Nov 9, 2019