Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[close #749] Fix health checking issue #748

Merged
merged 6 commits into from
May 25, 2023

Conversation

shiyuhang0
Copy link
Collaborator

@shiyuhang0 shiyuhang0 commented May 20, 2023

What problem does this PR solve?

Issue Number: close #749

pingcap/tispark#2707

Problem Description: StoreHealthyChecker fails to check the status of TiFlash

What is changed and how does it work?

Use isMppalive RPC to probe the status of TiFlash.

Check List for Tests

Manual test:

I have tested this locally with TiSpark. TiSpark will request TiFlash using client-java

  1. TiSpark will with the client-java before this PR will throw errors even if TiFlash is alive
23/05/20 15:31:12 WARN StoreHealthyChecker: fail to check TiFlash health, regrade as unhealthy
org.tikv.shade.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 1.999882667s. [closed=[], open=[[buffered_nanos=2005712916, waiting_for_connection]]]
	at org.tikv.shade.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:287)
	at org.tikv.shade.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:268)
	at org.tikv.shade.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:175)
	at org.tikv.common.region.StoreHealthyChecker.checkTiFlashHealth(StoreHealthyChecker.java:99)
	at org.tikv.common.region.StoreHealthyChecker.checkStoreHealth(StoreHealthyChecker.java:86)
	at org.tikv.common.region.StoreHealthyChecker.run(StoreHealthyChecker.java:154)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
23/05/20 15:31:13 WARN RangeSplitter: Cannot find valid store on TiFlash(engine=tiflash)
  1. TiSpark success with the client-java build by this pr

Side effects

  • Possible performance regression, WHY: TBD
  • Increased code complexity, WHY: TBD
  • Breaking backward compatibility, WHY: TBD
  • NO side effects

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation
  • Need to be included in the release note
  • NO related changes

@codecov
Copy link

codecov bot commented May 20, 2023

Codecov Report

Patch coverage: 26.08% and project coverage change: -0.23 ⚠️

Comparison is base (8d70ed2) 38.15% compared to head (c08ea89) 37.92%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #748      +/-   ##
============================================
- Coverage     38.15%   37.92%   -0.23%     
+ Complexity     1614     1611       -3     
============================================
  Files           278      278              
  Lines         17493    17511      +18     
  Branches       1989     1991       +2     
============================================
- Hits           6674     6641      -33     
- Misses        10149    10210      +61     
+ Partials        670      660      -10     
Impacted Files Coverage Δ
...va/org/tikv/common/region/StoreHealthyChecker.java 60.63% <13.33%> (-12.78%) ⬇️
...ain/java/org/tikv/common/region/RegionManager.java 79.76% <50.00%> (-0.35%) ⬇️
src/main/java/org/tikv/common/region/TiStore.java 60.00% <50.00%> (-1.54%) ⬇️

... and 8 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@shiyuhang0 shiyuhang0 changed the title Optimize health check Fix health checking issue May 20, 2023
@shiyuhang0 shiyuhang0 changed the title Fix health checking issue [close 749] Fix health checking issue May 20, 2023
@shiyuhang0 shiyuhang0 changed the title [close 749] Fix health checking issue [close #749] Fix health checking issue May 22, 2023
xuanyu66
xuanyu66 previously approved these changes May 22, 2023
iosmanthus
iosmanthus previously approved these changes May 24, 2023
Copy link
Contributor

@pingyu pingyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM.

Signed-off-by: shiyuhang <1136742008@qq.com>
Signed-off-by: shiyuhang <1136742008@qq.com>
Signed-off-by: shiyuhang <1136742008@qq.com>
Signed-off-by: shiyuhang <1136742008@qq.com>
Signed-off-by: shiyuhang <1136742008@qq.com>
Signed-off-by: shiyuhang <1136742008@qq.com>
@shiyuhang0 shiyuhang0 force-pushed the optimize_health_check branch from 7f844f5 to c08ea89 Compare May 25, 2023 04:32
Copy link
Contributor

@pingyu pingyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM~

@zhangyangyu zhangyangyu merged commit 71676ee into tikv:master May 25, 2023
shiyuhang0 added a commit to shiyuhang0/client-java that referenced this pull request May 25, 2023
Signed-off-by: shiyuhang <1136742008@qq.com>
shiyuhang0 added a commit that referenced this pull request May 31, 2023
* optimize getregionstore logical

Signed-off-by: shiyuhang <1136742008@qq.com>

* decrease impact

Signed-off-by: shiyuhang <1136742008@qq.com>

* Update RegionManager.java

Signed-off-by: shiyuhang <1136742008@qq.com>

* [close #749] Fix health checking issue (#748)

Signed-off-by: shiyuhang <1136742008@qq.com>

* Update RegionManager.java

Signed-off-by: shiyuhang <1136742008@qq.com>

* add log

Signed-off-by: shiyuhang <1136742008@qq.com>

* change log level

Signed-off-by: shiyuhang <1136742008@qq.com>

---------

Signed-off-by: shiyuhang <1136742008@qq.com>
@shiyuhang0
Copy link
Collaborator Author

/cherry-pick release-3.3

ti-chi-bot pushed a commit to ti-chi-bot/client-java that referenced this pull request May 31, 2023
Signed-off-by: shiyuhang <1136742008@qq.com>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created to branch release-3.3: #753.

ti-chi-bot pushed a commit to ti-chi-bot/client-java that referenced this pull request May 31, 2023
Signed-off-by: shiyuhang <1136742008@qq.com>
ti-chi-bot pushed a commit to ti-chi-bot/client-java that referenced this pull request May 31, 2023
Signed-off-by: shiyuhang <1136742008@qq.com>
shiyuhang0 added a commit that referenced this pull request May 31, 2023
* optimize getregionstore logical

Signed-off-by: shiyuhang <1136742008@qq.com>

* decrease impact

Signed-off-by: shiyuhang <1136742008@qq.com>

* Update RegionManager.java

Signed-off-by: shiyuhang <1136742008@qq.com>

* Fix health checking issue (#748)

Signed-off-by: shiyuhang <1136742008@qq.com>

* Update RegionManager.java

Signed-off-by: shiyuhang <1136742008@qq.com>

* add log

Signed-off-by: shiyuhang <1136742008@qq.com>

* change log level

Signed-off-by: shiyuhang <1136742008@qq.com>

---------

Signed-off-by: shiyuhang <1136742008@qq.com>
Co-authored-by: shiyuhang <1136742008@qq.com>
Co-authored-by: shi yuhang <52435083+shiyuhang0@users.noreply.github.com>
shiyuhang0 added a commit that referenced this pull request May 31, 2023
Signed-off-by: shiyuhang <1136742008@qq.com>
Co-authored-by: shi yuhang <52435083+shiyuhang0@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CheckStoreHealth does not work wtih TiFlash
6 participants