
testDoNotInfinitelyWaitForMapping fails #47974

Closed
dnhatn opened this issue Oct 14, 2019 · 1 comment · Fixed by #48265
Assignee: dnhatn
Labels: :Distributed Coordination/Allocation, >test-failure

Comments

dnhatn (Member) commented Oct 14, 2019

This test has been failing since #46959, which cancels an ongoing recovery when we find a new copy that can perform a noop recovery.

[2019-10-13T06:29:10,208][WARN ][o.e.c.r.a.AllocationService] [node_t0] failing shard [failed shard, shard [test][0], node[d1YsSCuCScGq6Micn4jKqQ], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=o1Qiiu9PSO-fB6M84wsdtA], unassigned_info[[reason=REALLOCATED_REPLICA], at[2019-10-13T17:29:10.107Z], delayed=false, details[existing allocation of replica to [{node_t1}{Z7bFtb4bSyWxw8hW8jhLjQ}{ApVGNCrUQkyIG-5hbQuRiw}{127.0.0.1}{127.0.0.1:44756}{dim}] cancelled, can perform a noop recovery on [{node_t2}{d1YsSCuCScGq6Micn4jKqQ}{vJfgfJ52RwqCdjqnUdYOAQ}{127.0.0.1}{127.0.0.1:44089}{dim}]], allocation_status[no_attempt]], message [shard failure, reason [index id[u0] origin[PEER_RECOVERY] seq#[0]]], markAsStale [true], failure [org.elasticsearch.index.mapper.MapperParsingException: simulate mapping parsing error

CI: https://gradle-enterprise.elastic.co/s/zbewn2l6ksvd2/tests/kyv2y2z3r4v7m-bgzwhe6nv7k4c

Relates #46959
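
For context, here is a minimal sketch of the kind of cancellation decision #46959 introduced. Identifiers such as `ShardCopy` and `shouldCancelRecovery` are illustrative only, not the actual `ReplicaShardAllocator` API:

```java
import java.util.List;

final class CancelRecoverySketch {

    // A candidate holder of the shard: the node it lives on, and whether its
    // copy is already identical to the primary's (so recovery would copy no data).
    record ShardCopy(String nodeId, boolean noopRecoveryPossible) {}

    // Cancel the ongoing recovery onto currentTarget if some other node already
    // holds a copy that can be recovered with no data transfer ("noop recovery").
    static boolean shouldCancelRecovery(ShardCopy currentTarget, List<ShardCopy> candidates) {
        for (ShardCopy candidate : candidates) {
            if (candidate.noopRecoveryPossible()
                    && !candidate.nodeId().equals(currentTarget.nodeId())) {
                return true; // a cheaper copy exists elsewhere: cancel and reallocate
            }
        }
        return false;
    }
}
```

The trouble arises when the "better" candidate sits on a node where the shard has already failed (here, the simulated mapping parsing error), so the cancellation can repeat without making progress.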

dnhatn added the >test-failure and :Distributed Coordination/Allocation labels Oct 14, 2019
dnhatn self-assigned this Oct 14, 2019
elasticmachine (Collaborator) commented

Pinging @elastic/es-distributed (:Distributed/Allocation)

dnhatn added a commit that referenced this issue Oct 14, 2019
dnhatn added a commit that referenced this issue Oct 14, 2019
howardhuanghua pushed a commit to TencentCloudES/elasticsearch that referenced this issue Oct 14, 2019
dnhatn added a commit that referenced this issue Nov 1, 2019
This change fixes a poisonous situation where an ongoing recovery was
canceled because a better copy was found on a node that the cluster had
previously tried allocating the shard to but failed. The solution is to
keep track of the set of nodes that an allocation was failed on so that
we can avoid canceling the current recovery for a copy on failed nodes.

Closes #47974
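
A minimal sketch of that idea, with hypothetical names rather than the real allocator's unassigned-info bookkeeping: remember the nodes an allocation has already failed on, and never let a copy on one of those nodes trigger a cancellation.

```java
import java.util.List;
import java.util.Set;

final class AvoidFailedNodesSketch {

    record ShardCopy(String nodeId, boolean noopRecoveryPossible) {}

    static boolean shouldCancelRecovery(ShardCopy currentTarget,
                                        List<ShardCopy> candidates,
                                        Set<String> failedNodeIds) {
        for (ShardCopy candidate : candidates) {
            // The fix: a node that previously failed this shard is not a
            // trustworthy target, so its copy must not cancel the current recovery.
            if (failedNodeIds.contains(candidate.nodeId())) {
                continue;
            }
            if (candidate.noopRecoveryPossible()
                    && !candidate.nodeId().equals(currentTarget.nodeId())) {
                return true;
            }
        }
        return false;
    }
}
```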
dnhatn added a commit that referenced this issue Nov 9, 2019
dnhatn added a commit that referenced this issue Nov 9, 2019