Implement TSO Client and provide general client-side service discovery framework and general gRPC stream handling framework. #6037

Merged
merged 15 commits into from
Feb 27, 2023

Conversation

binshi-bing
Contributor

@binshi-bing binshi-bing commented Feb 22, 2023

What problem does this PR solve?

Refactor the pd client with the template design pattern to provide a general client-side service discovery framework and a general gRPC stream batching, forwarding, async, and pooling framework.

Issue Number: Ref #5836

What is changed and how does it work?

Changes:
1. Define the BaseClient interface, which provides general service discovery on a quorum-based cluster or a primary/secondary configured cluster, so that the gRPC client logic layer is decoupled from the service discovery layer.
2. Rename baseClient to pdBaseClient, then refactor it to implement the BaseClient interface. It provides a basic implementation of service discovery on a quorum-based cluster.
3. Refactor the pd client with the template design pattern to provide a general gRPC stream batching, forwarding, async, and pooling framework.
4. Add a skeleton of tsoBaseClient, which is a basic implementation of service discovery on a primary/secondary configured cluster.
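
The split described in the list above can be pictured roughly as follows. This is a minimal sketch and not the code from this PR: the interface name ServiceDiscovery, its single method, and both implementations are hypothetical stand-ins for BaseClient, pdBaseClient, and tsoBaseClient, and the addresses are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceDiscovery is a hypothetical, trimmed-down stand-in for the
// BaseClient interface: the gRPC logic layer only asks where to send
// requests and does not care how the answer was found.
type ServiceDiscovery interface {
	// ServingAddr returns the endpoint that currently serves requests.
	ServingAddr() (string, error)
}

// quorumDiscovery models discovery on a quorum-based cluster: the serving
// endpoint is whichever member is currently the leader.
type quorumDiscovery struct {
	members []string
	leader  int // index into members; out of range means "unknown"
}

func (q *quorumDiscovery) ServingAddr() (string, error) {
	if q.leader < 0 || q.leader >= len(q.members) {
		return "", errors.New("no leader elected")
	}
	return q.members[q.leader], nil
}

// primaryDiscovery models discovery on a primary/secondary configured cluster.
type primaryDiscovery struct {
	primary     string
	secondaries []string
}

func (p *primaryDiscovery) ServingAddr() (string, error) {
	if p.primary == "" {
		return "", errors.New("no primary available")
	}
	return p.primary, nil
}

// describe plays the role of the client logic layer: it is written once
// against the interface and works with either discovery implementation.
func describe(sd ServiceDiscovery) string {
	addr, err := sd.ServingAddr()
	if err != nil {
		return "unavailable: " + err.Error()
	}
	return "send to " + addr
}

func main() {
	fmt.Println(describe(&quorumDiscovery{members: []string{"pd-0:2379", "pd-1:2379"}, leader: 1}))
	fmt.Println(describe(&primaryDiscovery{primary: "tso-0:3379"}))
}
```

The point of the design is that swapping the discovery strategy does not touch the code that builds and sends requests.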

Check List

Tests

  • Unit test
  • Manual test

pd-tso-bench testing result:
$ ./pd-tso-bench -v -duration 5s -pd "127.0.0.1:3379"
Start benchmark #0, duration: 5s
Create 1 client(s) for benchmark
[2023/02/23 17:46:10.110 -08:00] [INFO] [client.go:431] ["[pd] create pd(tso) client with endpoints"] [pd-address="[127.0.0.1:3379]"]
[2023/02/23 17:46:10.110 -08:00] [INFO] [tso_client.go:336] ["[tso] switch primary"] [new-primary=http://127.0.0.1:3379/] [old-primary=]
[2023/02/23 17:46:10.110 -08:00] [INFO] [client.go:762] ["[pd] tso dispatcher created"] [dc-location=global]
count: 495783, max: 4.3819ms, min: 0.1444ms, avg: 2.0098ms
<1ms: 10410, >1ms: 228832, >2ms: 256541, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 481110, max: 4.6088ms, min: 0.0558ms, avg: 2.0063ms
<1ms: 24825, >1ms: 165334, >2ms: 290951, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 479289, max: 4.3606ms, min: 0.3493ms, avg: 2.0835ms
<1ms: 515, >1ms: 190880, >2ms: 287894, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 478138, max: 4.4927ms, min: 0.3457ms, avg: 2.0902ms
<1ms: 342, >1ms: 185102, >2ms: 292694, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
Total:
count: 1934320, max: 4.6088ms, min: 0.0558ms, avg: 2.0471ms
<1ms: 36092, >1ms: 770148, >2ms: 1128080, >5ms: 0, >10ms: 0, >30ms: 0, >50ms: 0, >100ms: 0, >200ms: 0, >400ms: 0, >800ms: 0, >1s: 0
count: 1934320, <1ms: 1.87%, >1ms: 39.81%, >2ms: 58.32%, >5ms: 0.00%, >10ms: 0.00%, >30ms: 0.00%, >50ms: 0.00%, >100ms: 0.00%, >200ms: 0.00%, >400ms: 0.00%, >800ms: 0.00%, >1s: 0.00%
P0.5: 2.0442ms, P0.8: 2.2536ms, P0.9: 2.4216ms, P0.99: 3.1541ms

Remaining issues:
  • After killing the (remote) etcd backend and then the primary, in that order, the secondary doesn't switch to primary. This is because leadership.Watch() in the secondary is stuck. This seems to be a bug in the monolithic architecture that was only uncovered by the microservice (detached etcd) scenario.
  • The pd(tso) client doesn't exit cleanly.

Release note

None.

@ti-chi-bot
Member

ti-chi-bot commented Feb 22, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • lhy1024
  • rleungx

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot
Member

Hi @binshi-bing. Thanks for your PR.

I'm waiting for a tikv member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment on lines 63 to 70
// GetTSOAllocators returns {dc-location -> TSO allocator leader URL} connection map
GetTSOAllocators() *sync.Map
// GetTSOAllocatorLeaderAddrByDCLocation returns the tso allocator of the given dcLocation
GetTSOAllocatorLeaderAddrByDCLocation(dcLocation string) (string, bool)
// GetTSOAllocatorClientConnByDCLocation returns the tso allocator grpc client connection
// of the given dcLocation
GetTSOAllocatorClientConnByDCLocation(dcLocation string) (*grpc.ClientConn, string)
Member

As a base client interface, could we only provide some general methods to build up the specified client upon it? Introducing TSO-related methods here does not sound good to me.

Contributor Author

@binshi-bing binshi-bing Feb 22, 2023

We should move the TSO-related methods out, but we can do it incrementally. The TSO-related methods are here because the previous baseClient (now pdBaseClient) mixed general service discovery with TSO-related service discovery logic. That is there for historical reasons, so we have to refactor it step by step. @JmPotato

Member

Ok, if the final goal is like that, then this makes sense to me.

Contributor Author

BTW, I'll move CreateTsoStream, ProcessTSORequests and the related private functions to another interface. @JmPotato

Contributor Author

Moved to tso_stream. The Template, Abstract Factory, and Builder design patterns are used to make this possible.

@binshi-bing binshi-bing changed the title tso client impl (part 1) Refactor pd client with template design pattern to provide general client side service discovery framework and general gPRC stream batching, forwarding, async and pooling framework. Feb 22, 2023
@binshi-bing binshi-bing changed the title Refactor pd client with template design pattern to provide general client side service discovery framework and general gPRC stream batching, forwarding, async and pooling framework. Provide general client side service discovery framework and general gPRC stream handling framework. Feb 22, 2023
@binshi-bing binshi-bing force-pushed the tso-client-impl branch 5 times, most recently from 0f9c7a6 to 764c81c Compare February 24, 2023 03:13
@binshi-bing binshi-bing changed the title Provide general client side service discovery framework and general gPRC stream handling framework. TSO Client implement and provide general client side service discovery framework and general gPRC stream handling framework. Feb 24, 2023
@codecov

codecov bot commented Feb 25, 2023

Codecov Report

Base: 74.20% // Head: 74.00% // Decreases project coverage by -0.21% ⚠️

Coverage data is based on head (b7005e2) compared to base (8b82aa9).
Patch coverage: 67.50% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6037      +/-   ##
==========================================
- Coverage   74.20%   74.00%   -0.21%     
==========================================
  Files         373      376       +3     
  Lines       37354    37501     +147     
==========================================
+ Hits        27718    27751      +33     
- Misses       7197     7314     +117     
+ Partials     2439     2436       -3     
Flag Coverage Δ
unittests 74.00% <67.50%> (-0.21%) ⬇️


Impacted Files                       Coverage Δ
client/tso_client.go                 0.00% <0.00%> (ø)
client/resource_manager_client.go    66.84% <25.00%> (ø)
client/tso_stream.go                 47.88% <47.88%> (ø)
client/keyspace_client.go            63.63% <50.00%> (ø)
client/tso_batch_controller.go       59.25% <59.25%> (ø)
client/grpcutil/grpcutil.go          73.33% <64.70%> (-11.29%) ⬇️
client/base_client.go                85.10% <82.05%> (+1.92%) ⬆️
client/tso_request_dispatcher.go     82.18% <82.18%> (ø)
client/client.go                     63.67% <82.25%> (-7.09%) ⬇️
pkg/utils/metricutil/metricutil.go   62.06% <0.00%> (-20.69%) ⬇️
... and 21 more

@binshi-bing binshi-bing force-pushed the tso-client-impl branch 2 times, most recently from 353b48e to 06f60fc Compare February 26, 2023 23:21
Member

@rleungx rleungx left a comment

Mostly, LGTM.

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 27, 2023
@binshi-bing binshi-bing force-pushed the tso-client-impl branch 2 times, most recently from 3f2323c to f0580b8 Compare February 27, 2023 03:53
@binshi-bing binshi-bing force-pushed the tso-client-impl branch 2 times, most recently from 9eff40c to 8a70da1 Compare February 27, 2023 05:59
Contributor

@lhy1024 lhy1024 left a comment

The diff is large, so we need to review it carefully.

if ok {
return conn.(*grpc.ClientConn), nil
}
tc, err := tlsCfg.ToTLSConfig()
Contributor

tc usually refers to a test cluster in this repo; I think tls is a better name for it.

Contributor Author

tls is the package name, so it might not be appropriate as the config name. tlsCfg is already used for the tlsCfg *tlsutil.TLSConfig parameter passed to this function, so I changed the name to tlsConfig.
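
For context, the excerpt under review follows a get-or-create pattern for cached connections. Below is a minimal sketch of that pattern using only the standard library. The names fakeConn, connCache, and getOrCreate are hypothetical; the real code stores *grpc.ClientConn values and builds a TLS config before dialing, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeConn stands in for *grpc.ClientConn so the sketch has no dependencies.
type fakeConn struct{ addr string }

// connCache caches one connection per address.
type connCache struct {
	conns sync.Map // addr (string) -> *fakeConn
	mu    sync.Mutex
	dials int // number of dials performed, kept only for illustration
}

// getOrCreate returns the cached connection for addr and dials only on a miss.
func (c *connCache) getOrCreate(addr string) *fakeConn {
	if conn, ok := c.conns.Load(addr); ok {
		return conn.(*fakeConn)
	}
	c.mu.Lock()
	c.dials++
	c.mu.Unlock()
	fresh := &fakeConn{addr: addr}
	// LoadOrStore keeps the first stored value if another goroutine won the
	// race. A real implementation would also close the losing connection.
	actual, _ := c.conns.LoadOrStore(addr, fresh)
	return actual.(*fakeConn)
}

func main() {
	cache := &connCache{}
	a := cache.getOrCreate("127.0.0.1:3379")
	b := cache.getOrCreate("127.0.0.1:3379")
	fmt.Println("same connection reused:", a == b)
	fmt.Println("dials:", cache.dials)
}
```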

for {
// the pd/allocator leader change, we need to re-establish the stream
if u != url {
log.Info("[pd] the leader of the allocator leader is changed", zap.String("dc", dc), zap.String("origin", url), zap.String("new", u))
Contributor

also [pd]?

Contributor Author

These functions are used by both the tso APIs in the pd client and the tso APIs in the independent tso client. I changed the prefix to [pd/tso].

Contributor Author

Also changed every [pd] to [pd/tso] in this file.
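
The loop in the excerpt re-establishes the stream whenever the serving URL differs from the one the current stream was built against. A rough sketch of that check, assuming a hypothetical streamTracker type and made-up URLs (the real code logs through zap and rebuilds an actual gRPC stream):

```go
package main

import "fmt"

// streamTracker remembers which URL the current stream is bound to.
type streamTracker struct {
	url      string
	rebuilds int
}

// ensure reports whether the stream had to be re-established because the
// serving URL changed since the last call.
func (t *streamTracker) ensure(current string) bool {
	if t.url == current {
		return false
	}
	fmt.Printf("[pd/tso] serving endpoint changed, origin=%q new=%q\n", t.url, current)
	t.url = current
	t.rebuilds++
	return true
}

func main() {
	tracker := &streamTracker{}
	for _, u := range []string{"http://a:3379", "http://a:3379", "http://b:3379"} {
		if tracker.ensure(u) {
			fmt.Println("stream re-established against", u)
		}
	}
	fmt.Println("rebuilds:", tracker.rebuilds)
}
```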

stream tsopb.TSO_TsoClient
}

func (s *tsoTSOStream) processRequests(clusterID uint64, dcLocation string, requests []*tsoRequest,
Contributor

These two functions are very similar.

Contributor Author

It's not easy to remove the duplicate code now, because the request passed to stream.Send(), the stream itself, and the response from stream.Recv() have different types in pdpb and tsopb. We have used several design patterns, including Template, Abstract Factory, and Builder, as you can see in base_client.go and tso_stream.go, so that we can share the service discovery code and the tso batching/forwarding/async framework. These two functions are the closest to the protobuf code and differ only in types, so it isn't really possible to dedup them further.
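
To illustrate the trade-off described in this reply, here is a small sketch, not the PR's code. The message types pdpbTsoRequest, pdpbTsoResponse, tsopbTsoRequest, and tsopbTsoResponse are hypothetical stand-ins for the generated protobuf types, and roundTrip stands in for a Send followed by a Recv. The shared step is written once against an interface, while the two thin adapters stay nearly identical because they only differ in message types.

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated pdpb and tsopb messages.
type pdpbTsoRequest struct{ Count uint32 }
type pdpbTsoResponse struct{ Physical, Logical int64 }
type tsopbTsoRequest struct{ Count uint32 }
type tsopbTsoResponse struct{ Physical, Logical int64 }

// stream is the only thing the shared framework depends on.
type stream interface {
	processRequests(count uint32) (physical, logical int64)
}

// pdStream adapts the pdpb-typed round trip to the stream interface.
type pdStream struct {
	roundTrip func(pdpbTsoRequest) pdpbTsoResponse
}

func (s *pdStream) processRequests(count uint32) (int64, int64) {
	resp := s.roundTrip(pdpbTsoRequest{Count: count})
	return resp.Physical, resp.Logical
}

// tsoStream adapts the tsopb-typed round trip; it mirrors pdStream and
// differs only in the message types it touches.
type tsoStream struct {
	roundTrip func(tsopbTsoRequest) tsopbTsoResponse
}

func (s *tsoStream) processRequests(count uint32) (int64, int64) {
	resp := s.roundTrip(tsopbTsoRequest{Count: count})
	return resp.Physical, resp.Logical
}

// allocate is the shared step, written once for both stream kinds.
func allocate(s stream, count uint32) string {
	physical, logical := s.processRequests(count)
	return fmt.Sprintf("physical=%d logical=%d", physical, logical)
}

func main() {
	pd := &pdStream{roundTrip: func(r pdpbTsoRequest) pdpbTsoResponse {
		return pdpbTsoResponse{Physical: 100, Logical: int64(r.Count)}
	}}
	tso := &tsoStream{roundTrip: func(r tsopbTsoRequest) tsopbTsoResponse {
		return tsopbTsoResponse{Physical: 200, Logical: int64(r.Count)}
	}}
	fmt.Println(allocate(pd, 3))
	fmt.Println(allocate(tso, 5))
}
```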

client/client.go Outdated
go c.tsLoop()
go c.tsCancelLoop()
}
if enableAdmissionCtl {
Contributor

what does it mean?

Contributor Author

tso mcs mode doesn't need to enable the admission control module, so this flag is needed.

Contributor Author

Do you mean we can have a better name, such as enableAdmissionControl?

Contributor Author

@binshi-bing binshi-bing Feb 27, 2023

I removed the flags for now. The original purpose of adding these flags was to disable some clients when the server side doesn't provide the corresponding services. For example, disable the tso client when the server side is the resource management microservice, or disable the resource management client when the server side is the tso microservice.

Changes:
1. Define the BaseClient interface, which provides general service discovery on a quorum-based cluster or a primary/secondary configured cluster, so that the gRPC client logic layer is decoupled from the service discovery layer.
2. Rename baseClient to pdBaseClient, then refactor it to implement the BaseClient interface. It provides a basic implementation of service discovery on a quorum-based cluster.
3. Refactor the pd client with the template design pattern to provide a general client-side service discovery framework and a general TSO batching, forwarding, async, and pooling framework.
4. Add a skeleton of tsoBaseClient, which is a basic implementation of service discovery on a primary/secondary configured cluster.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
If -m is specified, then talk to tso microservice, pd otherwise.
Set -tso to specify tso serving addresses.

e.g., ./pd-tso-bench -v -m -duration 5s -tso "127.0.0.1:3379"

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
AddTSOAllocatorServiceEndpointSwitchedCallback adds callbacks which will be called when any global/local tso allocator service endpoint is switched.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
…adability of client.go

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
…t_dispatcher.go to improve code readability.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
…nd move GetOrCreateGRPCConn to grpcutil.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
…r refactor the code.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
… client.

Signed-off-by: Bin Shi <binshi.bing@gmail.com>
Contributor

@lhy1024 lhy1024 left a comment

mostly LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 27, 2023
@lhy1024
Contributor

lhy1024 commented Feb 27, 2023

/merge

@ti-chi-bot
Member

@lhy1024: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 3ca0398

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 27, 2023
Contributor

@nolouch nolouch left a comment

looks awesome!

@nolouch
Contributor

nolouch commented Feb 27, 2023

/ok-to-test

@ti-chi-bot ti-chi-bot merged commit 6be15a5 into tikv:master Feb 27, 2023
@binshi-bing binshi-bing changed the title TSO Client implement and provide general client side service discovery framework and general gPRC stream handling framework. TSO Client implementation and provide general client side service discovery framework and general gPRC stream handling framework. Mar 14, 2023
@binshi-bing binshi-bing changed the title TSO Client implementation and provide general client side service discovery framework and general gPRC stream handling framework. Implement TSO Client and provide general client side service discovery framework and general gPRC stream handling framework. Mar 14, 2023
Labels
ok-to-test release-note-none status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
6 participants