[improve][broker]PIP-340 Optimization of Probe Implementation for Automatic Failover #22133
@yyj8 thanks for the contribution. The intention of the PIP process is that you'd first start the discussion on the mailing list before deciding on the solution. There might be alternative ways to solve the problem. For example, in Pulsar we already have the Ping/Pong messages in the protocol. |
Your proposal seems to make a lot of sense, but the naming is perhaps not optimal. When looking at the changes, this looks like an active/passive status for a cluster, and framing it that way could make it more flexible. There might be reasons other than "health" to mark a cluster passive, for example maintenance during which it is desirable that clients move to the other cluster. There have been some changes in the Pulsar PIP process. We have a PIP template https://github.com/apache/pulsar/blob/master/pip/TEMPLATE.md for capturing the proposal in a markdown file. However, that could also happen later in the process, after the initial mailing list discussion. |
@yyj8 One of the challenges is making this proposal consistent with the Blue-Green deployment feature. PIP-188 #16551 (please note that this PIP was filed before we switched to use markdown files for capturing PIPs). |
@lhotari Thank you for your help. |
PIP-188 appears to be a very good solution for forwarding traffic internally: traffic that clients write to the Blue cluster can be forwarded on to the Green cluster. However, what we hope for is that when the Blue cluster is abnormal, our client can switch its connection address directly to the Green cluster, rather than consuming additional bandwidth on an extra forwarding hop through the Blue cluster. |
This is a very good suggestion. After the discussion on the mailing list is completed, I will make modifications according to the PIP template and the suggested modifications discussed by everyone. |
@yyj8 Wouldn't this already be covered by the |
The idea for |
The client class Check and update the logic as follows:
Then, in the probe method, add a newly defined request logic for obtaining cluster health status.
|
+1 @yyj8 Please check "PIP-121: Pulsar cluster level auto failover on client side", doc: #13315 and impl: #13316 Instead of adding yet another solution, could you simply use PIP-121? |
@lhotari I have read "PIP-121: Pulsar cluster level auto failover on client side". It only implements detection of whether the IP and port can be connected. Determining whether a cluster can truly serve production and consumption normally, and whether business processes have been affected, is a very complex matter. So the goal of my PIP is to allow manual intervention in the cluster state when the cluster is partially unavailable and business processes are affected, so that clients switch their connection to the other cluster. Because it is difficult for a program to judge automatically and accurately whether such a partial failure has affected the business application, automation is likely to misjudge and end up harming the business application. Since initiating a request to obtain cluster health status is part of cluster probing, I placed the client's probing logic in the We can consider providing an automated and accurate health detection implementation in future optimizations that can determine whether a cluster can provide services normally, replacing the current manual update of health status with automation. |
For detailed improvement instructions, please refer to issues:
#22134
Motivation
The current Java client implementation has certain flaws in automatic failover.
To decide whether a cluster is available, the client only establishes a TCP connection to the cluster's exposed connection address. This cannot handle scenarios where the cluster is partially unavailable (half dead). For this scenario, we want the client to base its failover decision on a cluster health status request sent to the cluster, and we provide an admin management command inside the cluster to update the cluster's health status. Today, to work around this scenario, every application that connects to the cluster has to manually switch its cluster connection address and restart, and because teams follow different operational steps, link data ends up inconsistent across business teams.
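The two-stage check described above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `ClusterProbeSketch`, `ClusterStatus`, `tcpReachable`, and `isAvailable` are hypothetical names, and `statusLookup` stands in for the proposed health status request, whose actual protocol is defined in the PR code. Treating a failed status lookup as "unavailable" is an assumption of the sketch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.Supplier;

// Illustrative sketch only; none of these names come from the Pulsar client.
final class ClusterProbeSketch {

    /** Hypothetical operator-managed status, e.g. set through an admin command. */
    enum ClusterStatus { ACTIVE, PASSIVE }

    /** Stage 1: plain TCP connect, the only check the current probe performs. */
    static boolean tcpReachable(InetSocketAddress address, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /**
     * Stage 2: only if the endpoint is reachable, ask for the declared status.
     * statusLookup stands in for the proposed health status request (assumption).
     * A failed lookup is treated as unavailable, a design choice of this sketch.
     */
    static boolean isAvailable(boolean reachable, Supplier<ClusterStatus> statusLookup) {
        if (!reachable) {
            return false; // skip the status request when the endpoint is down
        }
        try {
            return statusLookup.get() == ClusterStatus.ACTIVE;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Reachable over TCP but marked PASSIVE by an operator: the half-dead case.
        boolean ok = isAvailable(true, () -> ClusterStatus.PASSIVE);
        System.out.println("available = " + ok);
    }
}
```

In a real probe the first argument would come from `tcpReachable(...)` against the cluster's service address. Keeping the status lookup behind a `Supplier` lets the failover decision be exercised without a running cluster.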
Modifications
For other detailed information, please refer to the PR code.
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository:
yyj8#8
Mail list: https://lists.apache.org/thread/kk8lbc92mtgt0hw3tl7dfw7fmpl4jwyq