Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tikv: fix "Got too many pings" GRPC error in PD-server follower #17885

Merged
merged 2 commits into from
Jun 11, 2020
Merged

tikv: fix "Got too many pings" GRPC error in PD-server follower #17885

merged 2 commits into from
Jun 11, 2020

Conversation

lysu
Copy link
Contributor

@lysu lysu commented Jun 10, 2020

What problem does this PR solve?

Issue Number: close #17870

Problem Summary:

In PD(etcd) server side, PermitWithoutStream values is false https://github.com/etcd-io/etcd/blob/master/embed/etcd.go#L641, means

and client sends ping when there are no active streams, server will send GOAWAY and close the connection.

but in TiDB client side, PermitWithoutStream values is true means

client sends keepalive pings even with no active RPCs

so, with those conflict configuration, grpc server side will count pingStrike with wrong value.

https://github.com/grpc/grpc-go/blob/master/internal/transport/http2_server.go#L711

and report "transport: Got too many pings from the client, closing the connection." error.

What is changed and how it works?

What's Changed, How it Works:

we need keep them both as true or false, but etcd write it with magic value(false)

so we only can change tidb-side to false

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Manual test
setup a cluster, some tidb node use this fix
"Got too many pings" will happen but only in node that without this patch.

Side effects

  • ?

Release note

  • Fix "Got too many pings" grpc error log in PD-server follower

This change is Reviewable

@lysu lysu added type/bugfix This PR fixes a bug. component/tikv labels Jun 10, 2020
@lysu lysu changed the title tikv: fix "Got too many pings" grpc error in pd follower tikv: fix "Got too many pings" GRPC error in PD-server follower Jun 10, 2020
@lysu lysu requested review from nolouch and coocood June 10, 2020 03:02
@lysu lysu requested a review from hicqu June 10, 2020 03:03
@codecov
Copy link

codecov bot commented Jun 10, 2020

Codecov Report

Merging #17885 into master will decrease coverage by 0.0786%.
The diff coverage is 66.6666%.

@@               Coverage Diff                @@
##             master     #17885        +/-   ##
================================================
- Coverage   79.5047%   79.4261%   -0.0787%     
================================================
  Files           524        524                
  Lines        142160     141923       -237     
================================================
- Hits         113024     112724       -300     
- Misses        20006      20064        +58     
- Partials       9130       9135         +5     

Copy link
Member

@jackysp jackysp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@coocood
Copy link
Member

coocood commented Jun 10, 2020

LGTM

jackysp
jackysp previously approved these changes Jun 10, 2020
@HunDunDM
Copy link
Contributor

There are multiple PermitWithoutStream that need to be modified.

@HunDunDM
Copy link
Contributor

And in TiKV server-side, PermitWithoutStream is false too.

@lysu
Copy link
Contributor Author

lysu commented Jun 11, 2020

@HunDunDM updated = =

@github-actions github-actions bot added the sig/sql-infra SIG: SQL Infra label Jun 11, 2020
hicqu
hicqu previously approved these changes Jun 11, 2020
jackysp
jackysp previously approved these changes Jun 11, 2020
@jackysp
Copy link
Member

jackysp commented Jun 11, 2020

/merge

@sre-bot
Copy link
Contributor

sre-bot commented Jun 11, 2020

Sorry @jackysp, you don't have permission to trigger auto merge event on this branch. You are not a committer for this part

@jackysp jackysp removed the sig/sql-infra SIG: SQL Infra label Jun 11, 2020
@jackysp
Copy link
Member

jackysp commented Jun 11, 2020

/merge

@sre-bot sre-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jun 11, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Jun 11, 2020

/run-all-tests

@sre-bot
Copy link
Contributor

sre-bot commented Jun 11, 2020

@lysu merge failed.

@HunDunDM
Copy link
Contributor

Maybe we can enable keepalive_permit_without_calls in tikv server. In this way, tikv client can enable PermitWithoutStream.

@lysu lysu removed the status/DNM label Jun 11, 2020
@jackysp jackysp removed the sig/sql-infra SIG: SQL Infra label Jun 11, 2020
@lysu lysu merged commit bbeb4d0 into pingcap:master Jun 11, 2020
@jackysp
Copy link
Member

jackysp commented Jun 11, 2020

/merge

@lysu lysu deleted the dev-fix-too-many-pings branch June 11, 2020 05:54
sre-bot pushed a commit to sre-bot/tidb that referenced this pull request Jun 11, 2020
Signed-off-by: sre-bot <sre-bot@pingcap.com>
@sre-bot
Copy link
Contributor

sre-bot commented Jun 11, 2020

cherry pick to release-3.0 in PR #17944

sre-bot pushed a commit to sre-bot/tidb that referenced this pull request Jun 11, 2020
Signed-off-by: sre-bot <sre-bot@pingcap.com>
@sre-bot
Copy link
Contributor

sre-bot commented Jun 11, 2020

cherry pick to release-3.1 in PR #17946

@@ -674,9 +674,8 @@ func (do *Domain) Init(ddlLease time.Duration, sysFactory func(*Domain) (pools.R
DialOptions: []grpc.DialOption{
grpc.WithBackoffMaxDelay(time.Second * 3),
grpc.WithKeepaliveParams(keepalive.ClientParameters{
Copy link
Contributor

@HunDunDM HunDunDM Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use DialKeepAliveTime and DialKeepAliveTimeout directly in clientv3.Config instead of grpc.WithKeepaliveParams. Just like kv.go.

sre-bot pushed a commit to sre-bot/tidb that referenced this pull request Jun 11, 2020
Signed-off-by: sre-bot <sre-bot@pingcap.com>
@sre-bot
Copy link
Contributor

sre-bot commented Jun 11, 2020

cherry pick to release-4.0 in PR #17947

sre-bot added a commit that referenced this pull request Jun 11, 2020
…) (#17947)

Signed-off-by: sre-bot <sre-bot@pingcap.com>

Co-authored-by: lysu <sulifx@gmail.com>
cannium added a commit to cannium/br that referenced this pull request Jun 18, 2020
By setting "PermitWithoutStream" to false, configurations are consistent
between server(pd) and client(br) side.

Already fixed in tidb: pingcap/tidb#17885
overvenus pushed a commit to pingcap/br that referenced this pull request Jun 23, 2020
By setting "PermitWithoutStream" to false, configurations are consistent
between server(pd) and client(br) side.

Already fixed in tidb: pingcap/tidb#17885
ti-srebot pushed a commit to ti-srebot/br that referenced this pull request Jun 23, 2020
By setting "PermitWithoutStream" to false, configurations are consistent
between server(pd) and client(br) side.

Already fixed in tidb: pingcap/tidb#17885
ti-srebot pushed a commit to ti-srebot/br that referenced this pull request Jun 23, 2020
By setting "PermitWithoutStream" to false, configurations are consistent
between server(pd) and client(br) side.

Already fixed in tidb: pingcap/tidb#17885
overvenus pushed a commit to pingcap/br that referenced this pull request Jun 28, 2020
By setting "PermitWithoutStream" to false, configurations are consistent
between server(pd) and client(br) side.

Already fixed in tidb: pingcap/tidb#17885

Co-authored-by: can <can@canx.me>
overvenus pushed a commit to pingcap/br that referenced this pull request Jun 28, 2020
By setting "PermitWithoutStream" to false, configurations are consistent
between server(pd) and client(br) side.

Already fixed in tidb: pingcap/tidb#17885

Co-authored-by: can <can@canx.me>
ti-srebot pushed a commit that referenced this pull request Jul 21, 2020
…) (#17946)

Signed-off-by: sre-bot <sre-bot@pingcap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/tikv status/can-merge Indicates a PR has been approved by a committer. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TiDB sends too much pings to PD follower occasionally
6 participants