[BUG] Remote cluster connection times out when the node restarts and the remote cluster seeds point to the node itself #7950
Comments
I suspect @shwetathareja knows a lot about this problem. Care to comment?
Thanks @kkewwei for raising the issue. In the case of remote clusters, establishing a connection can take longer, and failing to connect to a remote cluster shouldn't cause the node to drop out of the local cluster. The attempt gives up after 10s anyway, so why spend those 10s trying it synchronously on the ClusterApplier thread? We could definitely consider establishing remote connections asynchronously after evaluating this in detail. That said, the underlying cause reported for the slow connection is not really a valid deadlock in the code, though it did help surface the problem above: the local node being unable to join the cluster because the remote connection takes longer.
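A minimal Java (9+) sketch of the async idea, assuming a hypothetical per-alias `connect` function that returns a `CompletableFuture<Void>` (this is not the OpenSearch `RemoteClusterService` API). Every attempt starts at once, each gets its own timeout via `orTimeout`, and the caller receives a future immediately instead of blocking the applier thread.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch: start all remote connection attempts without blocking the caller. */
final class AsyncRemoteConnector {
    private AsyncRemoteConnector() {}

    static CompletableFuture<Map<String, Boolean>> connectAll(
            List<String> aliases,
            Function<String, CompletableFuture<Void>> connect, // hypothetical per-alias connect call
            Duration timeout) {
        Map<String, Boolean> outcome = new ConcurrentHashMap<>();
        CompletableFuture<?>[] attempts = aliases.stream()
                .map(alias -> connect.apply(alias)
                        // Each attempt fails on its own after `timeout` instead of holding up the others.
                        .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                        // Record success or failure instead of propagating it, so one bad remote
                        // never fails the whole batch.
                        .handle((ok, err) -> outcome.put(alias, err == null)))
                .toArray(CompletableFuture[]::new);
        // Completes once every attempt has either connected or failed/timed out.
        return CompletableFuture.allOf(attempts).thenApply(ignored -> Map.copyOf(outcome));
    }
}
```

Because failures are recorded rather than thrown, a remote that never answers costs one timeout on a background timer, not on the thread that applies the cluster state.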
My understanding: your local cluster, which has data0 as one of its nodes, is trying to connect to 10 remote clusters (remote1 .. remote10). For each remote cluster, the …
I am wondering whether we should have best-effort validation checks to prevent adding the transport address of the local cluster in …
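One way such a best-effort check could look, as a hypothetical sketch rather than an existing OpenSearch validation: resolve each seed and flag it if it matches a transport address of the local cluster. It assumes seeds are plain `host:port` strings (IPv6 brackets are not handled) and that the caller can supply the local cluster's transport addresses. Resolving names via `InetAddress.getAllByName` is what would catch a domain name that maps back to the local node, as in this issue.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.Set;

/** Hypothetical best-effort check: does any remote seed resolve to a local cluster transport address? */
final class SeedValidator {
    private SeedValidator() {}

    static boolean pointsToLocalCluster(List<String> seeds, Set<InetSocketAddress> localClusterTransportAddresses) {
        for (String seed : seeds) {
            int colon = seed.lastIndexOf(':');
            if (colon < 0) {
                continue; // no port given: skip rather than guess a default
            }
            String host = seed.substring(0, colon);
            int port;
            try {
                port = Integer.parseInt(seed.substring(colon + 1));
            } catch (NumberFormatException e) {
                continue;
            }
            if (port < 0 || port > 65535) {
                continue;
            }
            try {
                // Domain names are resolved too; IP literals are parsed without a DNS lookup.
                for (InetAddress address : InetAddress.getAllByName(host)) {
                    if (localClusterTransportAddresses.contains(new InetSocketAddress(address, port))) {
                        return true;
                    }
                }
            } catch (UnknownHostException e) {
                // Best effort only: an unresolvable seed is not treated as local.
            }
        }
        return false;
    }
}
```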
I agree with @shwetathareja. This isn't a valid deadlock scenario, since it relies on multiple remotes pointing to the same local node. In the usual use cases, there will be a 10-second delay before the node joins the cluster. We can definitely explore avoiding this altogether by establishing connections asynchronously.
I also agree with @shwetathareja and @ankitkala; the usage seems unreasonable. But it is indeed used in production. The reason why multiple remotes point to the same local node is as follows:
Should we add a timeout when establishing a remote connection? If the process is asynchronous, it doesn't matter if we wait for a while, even when the seed points to the local node.
@kkewwei Let's say there is an index index1 that is present in clusters c1, c2, and c3. If you want to be able to land on any of the clusters c1 to c3 and still search index1, then every cluster should have remotes configured for all the other clusters, except itself. Also, when configuring a remote for a cluster, the seed node IPs should belong to that remote cluster. Example:
Cluster c1
Cluster c2
Cluster c3
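The layout described above can be generated mechanically. This hypothetical helper takes each cluster's own transport seeds and, for every cluster, emits a `cluster.remote.<alias>.seeds` entry for each other cluster while skipping itself. Using the cluster name as the remote alias and the seed addresses in the usage are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: every cluster gets a remote entry for each other cluster, never for itself. */
final class RemoteTopology {
    private RemoteTopology() {}

    /**
     * @param transportSeedsByCluster cluster name -> transport addresses of that cluster's own nodes
     * @return cluster name -> (setting key -> seed list) to apply on that cluster
     */
    static Map<String, Map<String, List<String>>> remoteSettingsPerCluster(
            Map<String, List<String>> transportSeedsByCluster) {
        Map<String, Map<String, List<String>>> result = new LinkedHashMap<>();
        for (String local : transportSeedsByCluster.keySet()) {
            Map<String, List<String>> settings = new LinkedHashMap<>();
            transportSeedsByCluster.forEach((remote, seeds) -> {
                if (!remote.equals(local)) { // a cluster must never list itself as a remote
                    settings.put("cluster.remote." + remote + ".seeds", seeds);
                }
            });
            result.put(local, settings);
        }
        return result;
    }
}
```

Because the seeds for each alias come from that cluster's own node list, no cluster ever ends up with a seed that resolves back to one of its own nodes.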
I have not started looking into it, but in general, establishing connections with remote clusters can be made asynchronous. Please feel free to take it forward if you want to investigate or raise a PR.
It should never point to the local node; that is not how it is supposed to work. The …
I've raised a PR for establishing the connections asynchronously: #8038
@ankitkala Other than fixing a deadlock, is there a performance benefit to your implementation, in that the cluster comes up faster?
Yes. If any seed node connection call was timing out, we were waiting for 30 seconds (…). With this change, this logic is asynchronous, so the cluster will come up faster in such cases.
@ankitkala This is a neat change that hides a performance improvement that could use a blog post on opensearch.org. Hint hint.
Describe the bug
We configure many remote clusters pointing to the local cluster to satisfy our needs. The seeds are the domain name of the local cluster; in our case, the domain name maps to the local node.
If the node restarts, it fails to join the cluster because it gets blocked.
To Reproduce
Steps to reproduce the behavior:
The cluster: data0 (127.0.0.1:9200), master1 (127.0.0.1:9201), master2 (127.0.0.1:9202)
Step 1: the cluster is healthy and contains the three nodes.
Step 2: add 10 remote cluster settings, remote_name={remote1...remote10}, all pointing to data0.
Step 3: kill data0 and restart it; data0 then gets blocked when joining the cluster.
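Step 2 can be scripted. This hypothetical helper builds the body for a `PUT _cluster/settings` request that registers remote1..remoteN with the same seed. Remote seeds are transport addresses, so the usage assumes data0's transport port is 9300 (the 9200 listed above is presumably the HTTP port); adjust it to your setup. The sketch does no JSON escaping, so the address must not contain quotes.

```java
import java.util.StringJoiner;

/** Hypothetical repro helper: registers remote1..remoteN, all pointing at one seed. */
final class ReproSettings {
    private ReproSettings() {}

    static String remoteSettingsBody(int remoteCount, String seedTransportAddress) {
        StringJoiner entries = new StringJoiner(", ", "{\"persistent\": {", "}}");
        for (int i = 1; i <= remoteCount; i++) {
            // One cluster.remote.<alias>.seeds entry per remote alias, all with the same seed.
            entries.add("\"cluster.remote.remote" + i + ".seeds\": [\"" + seedTransportAddress + "\"]");
        }
        return entries.toString();
    }
}
```

For this issue's setup the call would be `ReproSettings.remoteSettingsBody(10, "127.0.0.1:9300")`, with the result sent as the request body.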
Screenshots
The blocked JVM stack is as follows:
The related log is as follows:
Failed to connect to remote cluster:
The reason why data0 fails to join the cluster (ClusterApplierService.applyChanges): https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/transport/SniffConnectionStrategy.java#L330
But the master is null in the local ClusterState, and the new ClusterState has not been set yet (it is set later), so the request blocks.
That means there is a deadlock: connecting to the remote node depends on the new ClusterState being set, but setting the new ClusterState is blocked by the connection to the remote node.
So every remote cluster connection (pointing to the local IP) times out after 10s (see above). With ten such remote cluster settings, this costs 100s, which means the node will be removed from the cluster by the master.
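The wait cycle and its cost can be reproduced in a few lines. This is a simplified single-threaded model, not OpenSearch code: a `CompletableFuture` stands in for "the new cluster state with a known master", each remote connection waits for it with a timeout, and the state is only completed after the loop, mirroring the ordering described above. The blocked time grows linearly with the number of remotes; with the 10s timeout from this issue and ten remotes it is 10 × 10 s = 100 s.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Simplified model of the applier wait cycle (illustrative only, not OpenSearch code). */
final class ApplierWaitCycle {
    private ApplierWaitCycle() {}

    /** Returns how long the simulated applyChanges call blocked, in milliseconds. */
    static long simulateApplyChanges(int remotesPointingToLocalNode, long connectTimeoutMillis) {
        CompletableFuture<String> stateWithMaster = new CompletableFuture<>();
        long start = System.nanoTime();
        // Step 1: connect to every remote synchronously. Each seed is the local node, which cannot
        // answer until it knows the master, so every attempt runs into its full timeout.
        for (int i = 0; i < remotesPointingToLocalNode; i++) {
            try {
                stateWithMaster.get(connectTimeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Connection to this remote failed after the full timeout; move on to the next one.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        // Step 2: only now is the new cluster state (with the master) applied.
        stateWithMaster.complete("cluster state with master");
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```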
There are two problems when applying the new ClusterState:
Solution:
Should we handle this remote connection asynchronously?