Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add parallelized health check mechanism for Raft nodes #6920

Open
wants to merge 1 commit into
base: 2.x
Choose a base branch
from

Conversation

SAIVARDHAN15
Copy link

  • I have registered the PR changes.

Ⅰ. Describe what this PR did

  • Implemented a isNodeAlive method to check the health of each node via HTTP requests.
  • Added performHealthCheck method to perform parallel health checks on a list of nodes using Java Streams.
  • Integrated the health check into the refreshAliveLookup method for Raft mode.
  • Optimized the process by filtering out the leader node from the alive nodes list.
  • Improved node reliability by ensuring unresponsive or down nodes are detected and filtered efficiently.

Ⅱ. Does this pull request fix one issue?

This PR adds a new health check feature for Raft nodes but does not fix any pre-existing issues.

Ⅳ. Describe how to verify it

Deploy a Raft cluster with Seata.
Simulate node failures or unresponsive states.
Trigger refreshAliveLookup and confirm unhealthy nodes are excluded.
Check logs for correct identification of down nodes.
Ensure leader node is retained if healthy.

@funky-eyes funky-eyes added first-time contributor first-time contributor module/discovery discovery module labels Oct 13, 2024
Copy link

codecov bot commented Oct 13, 2024

Codecov Report

Attention: Patch coverage is 0% with 14 lines in your changes missing coverage. Please review.

Project coverage is 51.77%. Comparing base (66bd3a4) to head (c28c6b5).

Files with missing lines Patch % Lines
...scovery/registry/raft/RaftRegistryServiceImpl.java 0.00% 14 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##                2.x    #6920      +/-   ##
============================================
- Coverage     52.52%   51.77%   -0.76%     
+ Complexity     6542     6459      -83     
============================================
  Files          1122     1122              
  Lines         39857    39867      +10     
  Branches       4668     4670       +2     
============================================
- Hits          20936    20641     -295     
- Misses        16925    17264     +339     
+ Partials       1996     1962      -34     
Files with missing lines Coverage Δ
...scovery/registry/raft/RaftRegistryServiceImpl.java 16.94% <0.00%> (-0.75%) ⬇️

... and 31 files with indirect coverage changes

@@ -336,22 +338,41 @@ private static boolean watch() throws RetryableException {
}

@Override
public List<InetSocketAddress> refreshAliveLookup(String transactionServiceGroup,
List<InetSocketAddress> aliveAddress) {
public List<InetSocketAddress> refreshAliveLookup(String transactionServiceGroup, List<InetSocketAddress> aliveAddress) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SAIVARDHAN15 , when refreshAliveLookup is called, it has already undergone a health check by NettyClientChannelManager#doReconnect. I tested it locally and confirmed that the health check is normal, so this task might be a misunderstanding? Due to the issues from #6919 and #6917, I have been mistakenly thinking that the health check was failing. If you're interested, you can take on these two tasks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
first-time contributor first-time contributor module/discovery discovery module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants