Provide GetMinTS API to solve the compatibility issue brought by multi-timeline tso #6421
Force-pushed from 88a3be6 to 45f9469.
Codecov Report
Metric      master    #6421    +/-
Coverage    75.03%    75.08%   +0.04%
Files       409       410      +1
Lines       41532     41631    +99
Hits        31165     31258    +93
Misses      7659      7660     +1
Partials    2708      2713     +5
// If the service mode is switched to API during GetTS() call, which happens during migration,
// returning the default timeline should be fine.
Do we need to consider local ts?
GetTS passes DCLocation to the GetMinTS RPC to cover local ts. This keeps backward compatibility with PD, but the TSO microservice doesn't support local ts.
for _, addr := range addrs {
	go func(addr string) {
		defer wg.Done()
		resp, err := c.getMinTSFromSingleServer(ctx, addr, c.option.timeout)
If a TSO server has no primary, do we still need to wait for its result?
Will it always wait for all TSO servers?
Not really, but it does need to wait for all keyspace groups in serving status. The following conditions must be met:
// Check the results. The returned minimal timestamp is valid if all the conditions are met:
// 1. The number of responses is equal to the number of TSO servers/pods.
// 2. The number of keyspace groups asked is equal to the number of TSO servers/pods.
// 3. The minimal timestamp is not zero.
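The fan-out and the result check discussed above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `fetchFunc`, `minTSFromAll`, and `errZeroMinTS` are hypothetical names, timestamps are simplified to a single `uint64`, and only conditions 1 and 3 are checked, because the thread does not show how the keyspace groups in condition 2 are counted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errZeroMinTS reports that no usable (non-zero) timestamp was collected.
// Hypothetical name, introduced for this sketch only.
var errZeroMinTS = errors.New("minimal timestamp is zero")

// fetchFunc is a hypothetical stand-in for one per-server GetMinTS call.
type fetchFunc func(addr string) (uint64, error)

// minTSFromAll queries every address concurrently and returns the smallest
// timestamp. It fails unless every server answered (condition 1) and the
// resulting minimum is non-zero (condition 3).
func minTSFromAll(addrs []string, fetch fetchFunc) (uint64, error) {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		resps []uint64
	)
	wg.Add(len(addrs))
	for _, addr := range addrs {
		go func(addr string) {
			defer wg.Done()
			ts, err := fetch(addr)
			if err != nil {
				return // the count check below detects the missing response
			}
			mu.Lock()
			resps = append(resps, ts)
			mu.Unlock()
		}(addr)
	}
	wg.Wait()

	// Condition 1: one response per TSO server.
	if len(resps) != len(addrs) {
		return 0, fmt.Errorf("got %d responses from %d servers", len(resps), len(addrs))
	}
	var lowest uint64
	for i, ts := range resps {
		if i == 0 || ts < lowest {
			lowest = ts
		}
	}
	// Condition 3: the minimal timestamp must not be zero.
	if lowest == 0 {
		return 0, errZeroMinTS
	}
	return lowest, nil
}
```

Under these assumptions, a caller would pass the discovered TSO server addresses and a closure wrapping the real RPC. A single failed server makes the whole call fail, which matches the "wait for all" behavior described in the reply.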
We have discussed two more options for further optimization, which might be overkill for now:
- Read the saved timestamps of all keyspace groups from etcd, select the minimal ts, then subtract a time window. The downside is that loading all saved timestamps in one transaction could stall availability. If the QPS is higher than the frequency of leader start/restart, it could impact availability.
- Keep the current approach, but have every keyspace group cache its last ts. If the last ts of a keyspace group was last updated outside a time window, query the timestamp to pull the timeline forward. This might be overkill for now, because we won't have many timelines at the beginning and the QPS of GetMinTS is very low (a few times per day?).
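The second option (a per-keyspace-group cache with a staleness window) could look roughly like the sketch below. Everything here is an assumption made for illustration: `groupCache`, `cachedTS`, and the `query` callback are hypothetical, and time is modeled as plain millisecond integers rather than real clock values.

```go
package main

// cachedTS is the last timestamp seen for one keyspace group, together with
// the time (in milliseconds) at which it was recorded. Hypothetical type.
type cachedTS struct {
	ts          uint64
	updatedAtMs int64
}

// groupCache maps a keyspace group ID to its cached timestamp.
type groupCache map[uint32]cachedTS

// lastTS returns the cached timestamp of a group if it was updated within
// windowMs; otherwise it calls query to pull the timeline forward and caches
// the fresh value.
func (c groupCache) lastTS(group uint32, nowMs, windowMs int64, query func(uint32) uint64) uint64 {
	entry, ok := c[group]
	if ok && nowMs-entry.updatedAtMs <= windowMs {
		return entry.ts
	}
	ts := query(group)
	c[group] = cachedTS{ts: ts, updatedAtMs: nowMs}
	return ts
}

// minTS returns the smallest timestamp across the given groups, refreshing
// only the entries that fell out of the time window. It returns 0 for an
// empty group list.
func (c groupCache) minTS(groups []uint32, nowMs, windowMs int64, query func(uint32) uint64) uint64 {
	var lowest uint64
	for i, g := range groups {
		ts := c.lastTS(g, nowMs, windowMs, query)
		if i == 0 || ts < lowest {
			lowest = ts
		}
	}
	return lowest
}
```

The trade-off matches the comment above: a cache hit avoids a round trip, but at a few GetMinTS calls per day nearly every entry would be stale anyway, so the extra state buys little.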
mostly, LGTM
The rest LGTM.
Signed-off-by: Bin Shi <binshi.bing@gmail.com>
/merge
@rleungx: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests
This pull request has been accepted and is ready to merge. Commit hash: 2ee495d
* client: refine serviceModeKeeper code (tikv#6201) ref tikv#5895 Some code refinements for `serviceModeKeeper`. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * *: use revision for watch test (tikv#6205) ref tikv#6071 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * *: remove unnecessary rand init (tikv#6207) close tikv#6134 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * Refactor TSO forward/dispatcher to be shared by both PD and TSO (tikv#6175) ref tikv#5895 Add general tso forward/dispatcher for independent pd(tso)/tso services and cross cluster forwarding. Signed-off-by: Bin Shi <binshi.bing@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * Add basic multi-keyspace-group management (tikv#6214) ref tikv#5895 Support basic functions of multi-keyspace-group management Signed-off-by: Bin Shi <binshi.bing@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * *: support keyspace group RESTful API (tikv#6229) ref tikv#6231 Signed-off-by: Ryan Leung <rleungx@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * mcs: add more tso tests (tikv#6184) ref tikv#5836 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * client: fix compatibility problem of pd client (tikv#6244) close tikv#6243 Signed-off-by: Ryan Leung <rleungx@gmail.com> * *: unify the key prefix (tikv#6248) ref tikv#5836 Signed-off-by: Ryan Leung <rleungx@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * *: remove cluster dependency from keyspace (tikv#6249) ref tikv#6231 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * *: make code clear by rename `isServing` to `isRunning` (tikv#6258) ref tikv#4399 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: Ti Chi 
Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * cgroup: fix the path problem due to special container name (tikv#6267) close tikv#6266 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * server: fix watch keyspace revision (tikv#6251) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * tso, server: refine the TSO allocator manager parameters (tikv#6269) ref tikv#5895 - Refine the TSO allocator manager parameters. - Always run `tsoAllocatorLoop` to advance the Global TSO. Signed-off-by: JmPotato <ghzpotato@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * tso: unify the TSO ServiceConfig and ConfigProvider interfaces (tikv#6272) ref tikv#5895 Unify the TSO `ServiceConfig` and `ConfigProvider` interfaces. Signed-off-by: JmPotato <ghzpotato@gmail.com> * Load initial assignment and dynamically watch/apply keyspace groups' membership/distribution change (tikv#6247) ref tikv#6232 Load initial keyspace group assignment. Dynamically watch/apply keyspace groups' membership/distribution change. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * *: define user kind for keyspace group (tikv#6241) ref tikv#6231 Signed-off-by: Ryan Leung <rleungx@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * Add more failure tests when tso service loading initial keyspace groups assignment (tikv#6280) ref tikv#6232 Add more failure tests when tso service loading initial keyspace groups assignment Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Apply multi-keyspace-group membership to tso service and handle inconsistency issue (tikv#6282) ref tikv#6232 Apply multi-keyspace-group membership to tso service and handle inconsistency issue. 1. Add KeyspaceLookupTable to endpoint.KeyspaceGroup type KeyspaceGroup struct { ... // KeyspaceLookupTable is for fast lookup if a given keyspace belongs to this keyspace group. 
// It's not persisted and will be built when loading from storage. KeyspaceLookupTable map[uint32]struct{} `json:"-"` } 2. After loading keyspace groups, the Keyspace Group Manager builds KeyspaceLookupTable for every keyspace groups. 3. When Keyspace Group Manager handles tso requests, it uses the keyspaceLookupTable to check if the required keypsace still belongs to the required keyspace group. If not, returns the current keyspace group id in the tso response header. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * *: auto assign keyspace group (tikv#6268) close tikv#6231 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * keyspace, api: support the keyspace group split (tikv#6293) ref tikv#6232 Support the keyspace group split and add related tests. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * Improve lock mechanism in tso.KeyspaceGroupManager (tikv#6305) ref tikv#6232 Use the RWMutex instead of individual atomic types to better protect the state of the keyspace group manager Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace: add split-from field for endpoint.KeyspaceGroup (tikv#6309) ref tikv#6232 Add `split-from` field for `endpoint.KeyspaceGroup`. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * Add read lock at one place for protection and better structure (tikv#6310) ref tikv#6232, ref tikv#6305 follow-up tikv#6305 Add read lock at one place for protection and better structure Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * tso: optimize function signatures to reduce parameter passing (tikv#6315) ref tikv#6232 Optimize function signatures to reduce parameter passing. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> * *: bootstrap keyspace group when server is in API mode (tikv#6308) ref tikv#6231 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * keyspace: avoid keyspace being updated during the split (tikv#6316) ref tikv#6232 Prevent keyspace from being updated during the split. Signed-off-by: JmPotato <ghzpotato@gmail.com> * *: fix `TestConcurrentlyReset` (tikv#6318) close tikv#6275 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * bootstrap default keyspace group in the tso service (tikv#6306) ref tikv#6232 Changes: 1. Introduce the initialization logic of the default keyspace group. - If the default keyspace group isn't configured in the etcd, every tso node/pod should initialize it and join the election for the primary of this group. - If the default keyspace group is configured in the etcd, the tso nodes/pods which are assigned with this group will initialize it and join the election for the primary of this group. 2. Introduce the keyspace group membership restriction -- default keyspace always belongs to default keyspace group. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * *: change the log level (tikv#6324) ref tikv#6232 Signed-off-by: Ryan Leung <rleungx@gmail.com> * *: fix the missing log panic (tikv#6325) close tikv#6257 Signed-off-by: Ryan Leung <rleungx@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * mcs: fix watch primary address revision and update cache when meets not leader (tikv#6279) ref tikv#5895 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * tso, member: support TSO split based on keyspace group split (tikv#6313) ref tikv#6232 Support TSO split based on keyspace group split. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * mcs: support metrics HTTP interface for tso/resource manager server (tikv#6329) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * tso: put finishSplitKeyspaceGroup into the critical section (tikv#6331) ref tikv#6232 Put `finishSplitKeyspaceGroup` into the critical section. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * *: make `TestServerRegister` stable (tikv#6337) close tikv#6334 Signed-off-by: Ryan Leung <rleungx@gmail.com> * tests: divide all tests into the CI chunks including submodule tests (tikv#6198) ref tikv#6181, ref tikv#6183 Divide all tests into the CI chunks including submodule tests. Signed-off-by: JmPotato <ghzpotato@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * tests: introduce TSO TestCluster in the test (tikv#6333) ref tikv#6232 Introduce TSO `TestCluster` in the test. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * mcs: add balancer for keyspace group (tikv#6274) close tikv#6233 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * Fixed bugs in tso service registry watching loop. (tikv#6346) ref tikv#6343 Fixed the following two bugs: 1. When re-watch a range, to continue from what left by the last watch, the revision is wresp.Header.Revision + 1 instead of wresp.Header.Revision, where wresp.Header.Revision is the revision indicated in the response of the last watch. Because of this bug, it was processing the same event endless as you can see from the log below. 2. 
In tso service watch loop in /Users/binshi/code/pingcap/my-pd/pkg/keyspace/tso_keyspace_group.go, If this is delete event, the json.Unmarshal(event.Kv.Value, s) will fail with the error "unexpected end of JSON input", so there is no way to get s.serviceAddr from the result of json.Unmarshal. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs: fix double compression of prom handler (tikv#6339) ref prometheus/client_golang#622, ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> * tests, tso: add more TSO split tests (tikv#6338) ref tikv#6232 Add more TSO split tests. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * keyspace, tso: fix next revision to watch after watch/Get/RangeScan (tikv#6353) ref tikv#6232 The next revision to watch should always be Header.Revision + 1 where header is response header of watch/Get/RangeScan Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tests: use TSO cluster to do the failover test (tikv#6356) ref tikv#5895 Use TSO cluster to do the failover test. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> * fix startWatchLoop leak (tikv#6352) Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: update client when meet transport is closing (tikv#6341) * mcs: update client when meet transport is closing Signed-off-by: lhy1024 <admin@liudos.us> * address comments Signed-off-by: lhy1024 <admin@liudos.us> * add retry Signed-off-by: lhy1024 <admin@liudos.us> --------- Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024 <admin@liudos.us> * add bootstrap test (tikv#6347) Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024 <admin@liudos.us> * Fix flaky TestLoadKeyspaceGroupsAssignment test (tikv#6365) Reduce the count of keyspace groups from 4096 to 512 in TestKeyspaceGroupManagerTestSuite/TestLoadKeyspaceGroupsAssignment to avoid timeout when test running slow. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: fix ts fallback caused by multi-primary of the same keyspace group (tikv#6362) * Change participant election-prifix from listen-addr to advertise-listen-addr to gurantee uniqueness. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Add TestPariticipantStartWithAdvertiseListenAddr Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Add comments to fix go fmt errors Signed-off-by: Bin Shi <binshi.bing@gmail.com> --------- Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: Ryan Leung <rleungx@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * fix log output (tikv#6364) Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: fix duplicate start of RaftCluster. 
(tikv#6358) * Using double-checked locking to avoid duplicate start of RaftCluster. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Handle feedback Signed-off-by: Bin Shi <binshi.bing@gmail.com> * improve locking Signed-off-by: Bin Shi <binshi.bing@gmail.com> * handle feedback Signed-off-by: Bin Shi <binshi.bing@gmail.com> --------- Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: Ryan Leung <rleungx@gmail.com> * Add retry mechanism for updating keyspace group (tikv#6372) Signed-off-by: JmPotato <ghzpotato@gmail.com> * mcs: add set handler for balancer and alloc node for default keyspace group (tikv#6342) ref tikv#6233 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs, tso: fix Nil pointer deference when (*AllocatorManager).GetMember (tikv#6383) close tikv#6381 If the desired keyspace group fall back to the default keyspace group and the AM isn't initialized, return not served error. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: support multi-keyspace-group and its service discovery in E2E path (tikv#6321) ref tikv#6232 Support multi-keyspace-group in PD(TSO) client Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024 <admin@liudos.us> * client: add `NewClientWithKeyspaceName` for client (tikv#6380) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> * keyspace, tso: check the replica count before the split (tikv#6382) ref tikv#6233 Check the replica count before the split. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tso: fix bugs to make split test case to pass (tikv#6389) ref tikv#6232 fix bugs to make split test case to pass Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * keyspace: patrol keyspace assignment before the first split (tikv#6388) ref tikv#6232 Patrol the keyspace assignment before the first split to make sure every keyspace has its group assignment. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tests: fix Flaky TestMicroserviceTSOServer/TestConcurrentlyReset (tikv#6396) close tikv#6385 Get a copy of now then call base.add, because now is shared by all goroutines and now.add() will add to itself which isn't atomic and multi-goroutine safe. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace, slice: improve code efficiency in membership ops (tikv#6392) ref tikv#6231 Improve code efficiency in membership ops Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tests: enable TestTSOKeyspaceGroupSplitClient (tikv#6398) ref tikv#6232 Enable `TestTSOKeyspaceGroupSplitClient`. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tests: add more tests for multiple keyspace groups (tikv#6395) ref tikv#5895 Add CheckMultiKeyspacesTSO() and WaitForMultiKeyspacesTSOAvailable in test utility. Add TestTSOKeyspaceGroupManager/TestKeyspacesServedByNonDefaultKeyspaceGroup. Cover TestGetTS, TestGetTSAsync, TestUpdateAfterResetTSO in TestMicroserviceTSOClient for multiple keyspace groups. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tests: fix failpoint disable (tikv#6401) ref tikv#4399 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024 <admin@liudos.us> * client: retry load keyspace meta when creating a new client (tikv#6402) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Fix test issue in TestRandomResignLeader. (tikv#6410) close tikv#6404 We need to make sure the selected keyspaces are from different keyspace groups, otherwise multiple goroutines below could try to resign the primary of the same keyspace group and cause race condition. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace, api2: fix the keyspace assignment patrol consistency (tikv#6397) ref tikv#6232 Fix the keyspace assignment patrol consistency. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ryan Leung <rleungx@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * election, tso: fix data race in lease.go (tikv#6379) close tikv#6378 fix data race in lease.go Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: fix forward test with pd mode client (tikv#6290) ref tikv#5895, ref tikv#6279, close tikv#6289 Signed-off-by: lhy1024 <admin@liudos.us> * keyspace: patrol the keyspace assignment in batch (tikv#6411) ref tikv#6232 Patrol the keyspace assignment in batch. Signed-off-by: JmPotato <ghzpotato@gmail.com> * etcdutil: add watch loop (tikv#6390) close tikv#6391 Signed-off-by: lhy1024 <admin@liudos.us> * mcs, tso: add API interface to obtain the TSO keyspace group member info (tikv#6373) ref tikv#6232 Add API interface to obtain the TSO keyspace group member info. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * pkg: move operator_check out from test_util tikv#6162 Signed-off-by: lhy1024 <admin@liudos.us> * keysapce: wait region split when creating keyspace (tikv#6414) ref tikv#6231 Signed-off-by: zeminzhou <zhouzemin@pingcap.com> Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: zzm <zhouzemin@pingcap.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024 <admin@liudos.us> * mcs: use getClusterInfo to check whether api service is ready (tikv#6422) ref tikv#5836 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024 <admin@liudos.us> * fix data race by replace clone (tikv#6242) close tikv#6230 Signed-off-by: bufferflies <1045931706@qq.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Signed-off-by: lhy1024 <admin@liudos.us> * fix test and git mod tidy Signed-off-by: lhy1024 <admin@liudos.us> * revert makefile Signed-off-by: lhy1024 <admin@liudos.us> * fix ctx in watch loop Signed-off-by: lhy1024 <admin@liudos.us> * delete pd-tests.yaml Signed-off-by: lhy1024 <admin@liudos.us> * pd-ctl, tests: add the keyspace group commands (tikv#6423) ref tikv#6232 Add the keyspace group commands to show and split keyspace groups. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Handle compatibility issue in GetClusterInfo RPC (tikv#6434) ref tikv#5895, close tikv#6448 Handle the compatibility issue in the GetClusterInfo RPC Signed-off-by: Bin Shi <binshi.bing@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * Provide GetMinTS API to solve the compatibility issue brought by multi-timeline tso (tikv#6421) ref tikv#6142 1. Import kvproto change to introduce GetMinTS rpc in the TSO service. 6. 
Add server side implementation for GetMinTS rpc. 7. Add client side implementation for GetMinTS rpc. 8. Add unit test Signed-off-by: Bin Shi <binshi.bing@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * tso: use less interval when waiting api service (tikv#6451) close tikv#6449 Signed-off-by: lhy1024 <admin@liudos.us> * etcdutil: fix ctx in watch loop (tikv#6445) close tikv#6439 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Signed-off-by: lhy1024 <admin@liudos.us> * Fix "non-default keyspace groups use the same timestamp path by mistake" (tikv#6457) close tikv#6453, close tikv#6465 The tso servers are loading keyspace groups asynchronously. Make sure all keyspace groups are available for serving tso requests from corresponding keyspaces by querying IsKeyspaceServing(keyspaceID, the Desired KeyspaceGroupID). if use default keyspace group id in the query, it will always return true as the keyspace will be served by default keyspace group before the keyspace groups are loaded. Signed-off-by: Bin Shi <binshi.bing@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> * TSO microservice discovery fallback path shouldn't call FindGroupByKeyspaceID (tikv#6473) close tikv#6472 TSO microservice discovery fallback path shouldn't call FindGroupByKeyspaceID Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Revert "cgroup: fix the path problem due to special container name (tikv#6267)" This reverts commit 0c4cf7f947799e5c45d6e37448475b921044bdde. * *: rm debug file (tikv#6458) ref tikv#4399 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Revert "*: remove unnecessary rand init (tikv#6207)" This reverts commit 7383ded7581c417a3866da271eb2ec0a27b5a6c8. 
* mcs, tso: handle null keyspace (tikv#6476) ref tikv#5895 For API V1 and legacy path (NewClientWithContext w/o keyspace id/name), using Null Keypsace ID (uint32max) instead of default keyspace id and make sure it can be served by the default keyspace group's timeline. Modifying test accordingly. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: print TSO service discovery fallback log just once (tikv#6478) ref tikv#5895 Print TSO service discovery fallback log just once Signed-off-by: Bin Shi <binshi.bing@gmail.com> * client: return error if the keyspace meta cannot be found (tikv#6479) ref tikv#6142 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * client: support use API context to create client (tikv#6482) ref tikv#6142 Signed-off-by: Ryan Leung <rleungx@gmail.com> --------- Signed-off-by: Bin Shi <binshi.bing@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> Signed-off-by: Ryan Leung <rleungx@gmail.com> Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Co-authored-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Bin Shi <39923490+binshi-bing@users.noreply.github.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Co-authored-by: zzm <zhouzemin@pingcap.com> Co-authored-by: buffer <1045931706@qq.com>
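The watch-revision rule mentioned in the commit messages above (tikv#6346 and tikv#6353: resume from `Header.Revision + 1`) can be illustrated with a small in-memory simulation. This sketch does not use the etcd client. `event`, `store`, `watchFrom`, and `nextWatchRevision` are hypothetical names, and the header revision is modeled simply as the revision of the newest event in the log.

```go
package main

// event is a simplified stand-in for a watch event. Hypothetical type.
type event struct {
	Revision int64
	Key      string
}

// store is an in-memory, append-only event log ordered by revision.
type store struct{ events []event }

// watchFrom returns every event whose revision is >= startRev, plus the
// header revision, modeled here as the revision of the newest event.
func (s *store) watchFrom(startRev int64) (evs []event, headerRev int64) {
	for _, e := range s.events {
		if e.Revision >= startRev {
			evs = append(evs, e)
		}
		headerRev = e.Revision
	}
	return evs, headerRev
}

// nextWatchRevision is the revision to resume from after a response:
// one past the header revision, so already-seen events are not replayed.
func nextWatchRevision(headerRev int64) int64 {
	return headerRev + 1
}
```

In this model, resuming at the header revision itself delivers the last event again on every re-watch, which is the endless reprocessing described in tikv#6346. Resuming at `nextWatchRevision` delivers nothing until a new event is appended.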
* Fixed bugs in tso service registry watching loop. (tikv#6346) ref tikv#6343 Fixed the following two bugs: 1. When re-watch a range, to continue from what left by the last watch, the revision is wresp.Header.Revision + 1 instead of wresp.Header.Revision, where wresp.Header.Revision is the revision indicated in the response of the last watch. Because of this bug, it was processing the same event endless as you can see from the log below. 2. In tso service watch loop in /Users/binshi/code/pingcap/my-pd/pkg/keyspace/tso_keyspace_group.go, If this is delete event, the json.Unmarshal(event.Kv.Value, s) will fail with the error "unexpected end of JSON input", so there is no way to get s.serviceAddr from the result of json.Unmarshal. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs: fix double compression of prom handler (tikv#6339) ref prometheus/client_golang#622, ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> * tests, tso: add more TSO split tests (tikv#6338) ref tikv#6232 Add more TSO split tests. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> * keyspace, tso: fix next revision to watch after watch/Get/RangeScan (tikv#6353) ref tikv#6232 The next revision to watch should always be Header.Revision + 1 where header is response header of watch/Get/RangeScan Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tests: use TSO cluster to do the failover test (tikv#6356) ref tikv#5895 Use TSO cluster to do the failover test. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> * fix startWatchLoop leak (tikv#6352) Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: update client when meet transport is closing (tikv#6341) * mcs: update client when meet transport is closing Signed-off-by: lhy1024 <admin@liudos.us> * address comments Signed-off-by: lhy1024 <admin@liudos.us> * add retry Signed-off-by: lhy1024 <admin@liudos.us> --------- Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * add bootstrap test (tikv#6347) Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs, tso: fix ts fallback caused by multi-primary of the same keyspace group (tikv#6362) * Change participant election-prifix from listen-addr to advertise-listen-addr to gurantee uniqueness. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Add TestPariticipantStartWithAdvertiseListenAddr Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Add comments to fix go fmt errors Signed-off-by: Bin Shi <binshi.bing@gmail.com> --------- Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: Ryan Leung <rleungx@gmail.com> * fix log output (tikv#6364) Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * fix data race issue in the store limit test (tikv#6370) * fix data race issue in th estore limit test Signed-off-by: Bin Shi <binshi.bing@gmail.com> * fix gmt error Signed-off-by: Bin Shi <binshi.bing@gmail.com> --------- Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs: fix duplicate start of RaftCluster. (tikv#6358) * Using double-checked locking to avoid duplicate start of RaftCluster. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Handle feedback Signed-off-by: Bin Shi <binshi.bing@gmail.com> * improve locking Signed-off-by: Bin Shi <binshi.bing@gmail.com> * handle feedback Signed-off-by: Bin Shi <binshi.bing@gmail.com> --------- Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: Ryan Leung <rleungx@gmail.com> * Add retry mechanism for updating keyspace group (tikv#6372) Signed-off-by: JmPotato <ghzpotato@gmail.com> * etcdutil: revert etcd client without multi endpoint (tikv#6374) ref tikv#6124 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: add set handler for balancer and alloc node for default keyspace group (tikv#6342) ref tikv#6233 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs, tso: fix Nil pointer deference when (*AllocatorManager).GetMember (tikv#6383) close tikv#6381 If the desired keyspace group fall back to the default keyspace group and the AM isn't initialized, return not served error. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: support multi-keyspace-group and its service discovery in E2E path (tikv#6321) ref tikv#6232 Support multi-keyspace-group in PD(TSO) client Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * client: add `NewClientWithKeyspaceName` for client (tikv#6380) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> * keyspace, tso: check the replica count before the split (tikv#6382) ref tikv#6233 Check the replica count before the split. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tso: fix bugs to make split test case to pass (tikv#6389) ref tikv#6232 fix bugs to make split test case to pass Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * keyspace: patrol keyspace assignment before the first split (tikv#6388) ref tikv#6232 Patrol the keyspace assignment before the first split to make sure every keyspace has its group assignment. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tests: fix Flaky TestMicroserviceTSOServer/TestConcurrentlyReset (tikv#6396) close tikv#6385 Get a copy of now then call base.add, because now is shared by all goroutines and now.add() will add to itself which isn't atomic and multi-goroutine safe. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace, slice: improve code efficiency in membership ops (tikv#6392) ref tikv#6231 Improve code efficiency in membership ops Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tests: enable TestTSOKeyspaceGroupSplitClient (tikv#6398) ref tikv#6232 Enable `TestTSOKeyspaceGroupSplitClient`. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tests: add more tests for multiple keyspace groups (tikv#6395) ref tikv#5895 Add CheckMultiKeyspacesTSO() and WaitForMultiKeyspacesTSOAvailable in test utility. Add TestTSOKeyspaceGroupManager/TestKeyspacesServedByNonDefaultKeyspaceGroup. Cover TestGetTS, TestGetTSAsync, TestUpdateAfterResetTSO in TestMicroserviceTSOClient for multiple keyspace groups. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tests: fix failpoint disable (tikv#6401) ref tikv#4399 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * client: retry load keyspace meta when creating a new client (tikv#6402) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Fix test issue in TestRandomResignLeader. (tikv#6410) close tikv#6404 We need to make sure the selected keyspaces are from different keyspace groups, otherwise multiple goroutines below could try to resign the primary of the same keyspace group and cause race condition. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace, api2: fix the keyspace assignment patrol consistency (tikv#6397) ref tikv#6232 Fix the keyspace assignment patrol consistency. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ryan Leung <rleungx@gmail.com> * election, tso: fix data race in lease.go (tikv#6379) close tikv#6378 fix data race in lease.go Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: fix forward test with pd mode client (tikv#6290) ref tikv#5895, ref tikv#6279, close tikv#6289 Signed-off-by: lhy1024 <admin@liudos.us> * keyspace: patrol the keyspace assignment in batch (tikv#6411) ref tikv#6232 Patrol the keyspace assignment in batch. Signed-off-by: JmPotato <ghzpotato@gmail.com> * etcdutil: add watch loop (tikv#6390) close tikv#6391 Signed-off-by: lhy1024 <admin@liudos.us> * mcs, tso: add API interface to obtain the TSO keyspace group member info (tikv#6373) ref tikv#6232 Add API interface to obtain the TSO keyspace group member info. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * keysapce: wait region split when creating keyspace (tikv#6414) ref tikv#6231 Signed-off-by: zeminzhou <zhouzemin@pingcap.com> Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: zzm <zhouzemin@pingcap.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: use getClusterInfo to check whether api service is ready (tikv#6422) ref tikv#5836 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * pd-ctl, tests: add the keyspace group commands (tikv#6423) ref tikv#6232 Add the keyspace group commands to show and split keyspace groups. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Handle compatibility issue in GetClusterInfo RPC (tikv#6434) ref tikv#5895, close tikv#6448 Handle the compatibility issue in the GetClusterInfo RPC Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Provide GetMinTS API to solve the compatibility issue brought by multi-timeline tso (tikv#6421) ref tikv#6142 1. Import kvproto change to introduce GetMinTS rpc in the TSO service. 6. Add server side implementation for GetMinTS rpc. 7. Add client side implementation for GetMinTS rpc. 8. 
Add unit test Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Disable TestGetMinTS since it's not stable (tikv#6459) ref tikv#6453 Disable TestGetMinTS due to tikv#6453 Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tso: use less interval when waiting api service (tikv#6451) close tikv#6449 Signed-off-by: lhy1024 <admin@liudos.us> * etcdutil: fix ctx in watch loop (tikv#6445) close tikv#6439 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * *: fix disable failpoint (tikv#6412) ref tikv#4399 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Fix "non-default keyspace groups use the same timestamp path by mistake" (tikv#6457) close tikv#6453, close tikv#6465 The tso servers are loading keyspace groups asynchronously. Make sure all keyspace groups are available for serving tso requests from corresponding keyspaces by querying IsKeyspaceServing(keyspaceID, the Desired KeyspaceGroupID). if use default keyspace group id in the query, it will always return true as the keyspace will be served by default keyspace group before the keyspace groups are loaded. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * TSO microservice discovery fallback path shouldn't call FindGroupByKeyspaceID (tikv#6473) close tikv#6472 TSO microservice discovery fallback path shouldn't call FindGroupByKeyspaceID Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: handle null keyspace (tikv#6476) ref tikv#5895 For API V1 and legacy path (NewClientWithContext w/o keyspace id/name), using Null Keypsace ID (uint32max) instead of default keyspace id and make sure it can be served by the default keyspace group's timeline. Modifying test accordingly. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: print TSO service discovery fallback log just once (tikv#6478) ref tikv#5895 Print TSO service discovery fallback log just once Signed-off-by: Bin Shi <binshi.bing@gmail.com> * client: return error if the keyspace meta cannot be found (tikv#6479) ref tikv#6142 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * client: support use API context to create client (tikv#6482) ref tikv#6142 Signed-off-by: Ryan Leung <rleungx@gmail.com> * client: fix the leak of the keyspace watch channel (tikv#6494) ref tikv#4399 Signed-off-by: Ryan Leung <rleungx@gmail.com> * mcs, tso: implement GetMinTS gPRC on both API leader and PD client (tikv#6488) ref tikv#5895 implement GetMinTS gPRC on both API leader and the PD client. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace: add benchmarks for keyspace assignment patrol (tikv#6507) ref tikv#5895 Add benchmarks for keyspace assignment patrol. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Print keyspace-group-id zap field in the tso log conditionally (tikv#6514) ref tikv#5895 To keep the logging info in on-premises clean, we only print keyspace-group-id zap field for the non-default keyspace group id. Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Improve logging when tso keyspace group meta is updated. (tikv#6513) close tikv#6512 Improve logging when tso keyspace group meta is updated. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tso: log tso service discovery info on the client side only when the primary is changed (tikv#6511) close tikv#6508 Skip logging of the tso service discovery info when the secondar list is changed, because tso servers currently don't have consistent view of the member list due to remote etcd being used by tso service, which results in changing member list when the client queries the tso servers in round-robin. We need to improve the server side so that all tso servers can return the global consistent view of the keyspace groups' serving or membership info. Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: fix the members field is null (tikv#6518) close tikv#6519 Signed-off-by: Ryan Leung <rleungx@gmail.com> * mcs, tso: remove unnecessary "create tso forwarding stream" log on the common happy path (tikv#6524) close tikv#6517 Remove unnecessary "create tso forwarding stream" on the common happy path Signed-off-by: Bin Shi <binshi.bing@gmail.com> * fix watch keyspace (tikv#6528) close tikv#6527 Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> * Fix tso server close stuck issue (tikv#6529) ref tikv#5895, close tikv#6304 Rewrite TSO gPRC/HTTP server Close(). Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: change keyspace group primary path. (tikv#6526) ref tikv#5895 mcs, tso: change keyspace group primary path. The path for non-default keyspace group primary election changes from "/ms/{cluster_id}/tso/{group}/primary" to "/ms/{cluster_id}/tso/keyspace_groups/election/{group}/primary". Default keyspace group keeps /ms/{cluster_id}/tso/00000/primary. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tso: fix flaky test TestHandleTSORequestWithWrongMembership (tikv#6533) close tikv#6532 Add loop to wait for primary election to complete so that the tso service availability is deterministic. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Add TestUpgradingAPIandTSOClusters (tikv#6534) ref tikv#5895 Add TestUpgradingAPIandTSOClusters to test the scenario that after we restart the API cluster then restart the TSO cluster, the TSO service can still serve TSO requests normally. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs: add a test for starting tso server first (tikv#6535) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: support pprof for microservices (tikv#6541) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> * mcs, tso: delete the tso ms discovery when switching away from API mode. (tikv#6544) close tikv#6543 When switching from the PD mode to the API mode, the old tso microservice discovery isn't needed anymore, and all resources can be released to avoid noisy error logs when the component trying to discover the non-existent tso microservice. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace: adjust `keyspacePatrolBatchSize` to avoid too many operations in txn request (tikv#6562) close tikv#6561 Signed-off-by: lhy1024 <admin@liudos.us> * *: fix some log issues (tikv#6568) ref tikv#5895, ref tikv#6390 Signed-off-by: Ryan Leung <rleungx@gmail.com> * Simplify TSO Proxy implementation by using one forward stream for one gPRC stream (tikv#6572) close tikv#6549, ref tikv#6565 Simplify tso proxy implementation by using one forward stream for one grpc.ServerStream. tikv#6565 is a longer term solution for both follower batching and tso microservice. It's well implemented, but just need more time to bake, and we need a short term workable solution for now. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> * Improve tso proxy reliability (tikv#6585) ref tikv#5895 Improve tso proxy reliability. 1. Add protection mechanisms to TSO Proxy. a. Throttle the concurrency of TSO Proxy streamings. Default 5000. b. If TSO Proxy didn't receive the TSO request from the client for 1 hour, close the stream. 2. Optimize forceLoad lock with RW lock. 3. Enable stress test. 4. Add deadline for API leader forwarding request to TSO service. 5. Make tso response channel more safely. 6. Move tso proxy stress test away from the test suite as it has impact on other test cases. 7. Fix grpc client connection pool (server side) resource leak problem. 8. Make MaxConcurrentTSOProxyStreamings (5000 as default) and TSOProxyClientRecvTimeout (1 hour as default) configurable. 9. Add metrics tsoProxyHandleDuration, tsoProxyBatchSize and tsoProxyForwardTimeoutCounter. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tests: make TestSplitKeyspaceGroup stable (tikv#6584) close tikv#6571 Signed-off-by: lhy1024 <admin@liudos.us> * Add keyspace and keyspace group info to the time fallback log. (tikv#6581) ref tikv#5895 Add keyspace and keyspace group info to the time fallback log to help debugging time fallback issue in multi-timeline scenario. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * config, server: fault injection in TSO Proxy (tikv#6588) ref tikv#5895 Add failure test cases. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: fix stream Send() and CloseSend() data race issue in TSO proxy (tikv#6591) close tikv#6590 No need to call SendClose(), because TSO proxy will cancel the stream context which will cause the corresponding grpc stream on the server side to exit. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs, tso: fix potential inconsistency caused by non-atomic applying keyspace movement state change in the persistent store (tikv#6596) ref tikv#5895 fix potential inconsistency caused by non-atomic applying the state change in the persistent in the following cases: 1. Keyspace group split/merge 2. Keyspace movement across keyspace groups. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace, apiv2: implement the keyspace group merging API (tikv#6594) ref tikv#6589 Implement the keyspace group merging API. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * keyspace: prohibit merging the default keyspace group (tikv#6606) ref tikv#6589 Prohibit merging the default keyspace group. Signed-off-by: JmPotato <ghzpotato@gmail.com> * client: fix keyspace update in `tsoSvcDiscovery` (tikv#6612) close tikv#6611 Signed-off-by: lhy1024 <admin@liudos.us> * keyspace: enhance LockGroup with RemoveEntryOnUnlock LockOption (tikv#6629) close tikv#6628 Enhance LockGroup with RemoveEntryOnUnlock. Remove the lock of the given key from the lock group when unlock to keep minimal working set, which is suited for low qps, non-time-critical and non-consecutive large key space scenarios. One example of the last use case is that keyspace group split loads non-consecutive keyspace meta in batches and lock all loaded keyspace meta within a batch at the same time. 
Signed-off-by: Bin Shi <binshi.bing@gmail.com> * keyspace: add priority of tso node for the keyspace group (tikv#6602) ref tikv#6599 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tests: reduce unnecessary time.sleep in keyspace group (tikv#6632) ref tikv#6599 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Add keyspace group info in the timestamp fallback log in the client. (tikv#6654) ref tikv#5895 Add keyspace group info in the timestamp fallback log in the client. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tso: fix checkTSOSplit to finish split correctly (tikv#6652) ref tikv#6232 Fix `checkTSOSplit` to finish split correctly. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tests: fix TestTSOKeyspaceGroupSplitClient to avoid unexpected panic (tikv#6655) close tikv#6634 Fix `TestTSOKeyspaceGroupSplitClient` to avoid unexpected panic Signed-off-by: JmPotato <ghzpotato@gmail.com> * mcs: add log for finishing split (tikv#6656) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * client: fix the keyspace ID RW race inside tsoServiceDiscovery (tikv#6657) ref tikv#5895 Fix the keyspace ID RW race inside `tsoServiceDiscovery`. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tso, tests: implement the keyspace group merge checker (tikv#6625) ref tikv#6589 Implement the keyspace group merge checker. Signed-off-by: JmPotato <ghzpotato@gmail.com> * keyspace, apiv2: support to split keyspace group with the keyspace ID range (tikv#6646) ref tikv#6232 Support to split keyspace group with the keyspace ID range. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs, tso: fix expensive async forwardTSORequest() and its timeout mechanism. (tikv#6664) ref tikv#6659 Fix expensive async forwardTSORequest() and its timeout mechanism. In order to handle the timeout case for forwardStream send/recv, the existing logic is to create context.withTimeout(forwardCtx,...) for every request, then start a new goroutine "forwardTSORequest", which is very expensive as shown by the profiling in tikv#6659. This change create a watchDeadline routine per forward stream and reuse it for all the forward requests in which forwardTSORequest is called synchronously. Compared to the existing logic, the new change is much cheaper and the latency is much stable. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs, tso: support weighted-election for TSO keyspace group primary election (tikv#6617) close tikv#6616 Add the tso server registry watch loop in tso's keyspace group manager. re-distribute TSO keyspace group primaries according to their replica priorities Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tools: add merge commands for pd-ctl (tikv#6675) ref tikv#6589 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Add split-range cmd and fix duplicate keyspaces (tikv#6689) close tikv#6687, close tikv#6688 Add split-range cmd and fix duplicate keyspaces 1. Add split-range cmd to support StartKeyspaceID and EndKeyspaceID parameters. 2. 
Fix "split 0 2 2 2" generate duplicate keyspaces in the keyspace list of the group" Signed-off-by: Bin Shi <binshi.bing@gmail.com> * *: add group id to error logs (tikv#6695) close tikv#6685 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tools: add keepalive for pd-tso-bench (tikv#6699) close tikv#6681 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Add more debugging info to time fallback log. (tikv#6700) ref tikv#5895 Add more debugging info to time fallback log. [2023/06/27 10:50:54.196 -07:00] [PANIC] [tso_dispatcher.go:764] ["[tso] timestamp fallback"] [dc-location=global] [keyspace=4294967295] [last-ts="(1687888254152, 1)"] [cur-ts="(1687888254052, 2)"] [last-tso-server=127.0.0.1:3380] [cur-tso-server=127.0.0.1:3380] [last-keyspace-group-in-request=0] [cur-keyspace-group-in-request=0] [last-keyspace-group-in-response=0] [cur-keyspace-group-in-response=0] [last-response-received-at=2023/06/27 10:50:54.195 -07:00] [cur-response-received-at=2023/06/27 10:50:54.196 -07:00] Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * keyspace: refine the split and merge details (tikv#6707) close tikv#6686, close tikv#6698 - Adjust the merge operation order. - Add some logs. - Refine the code. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> * tools: support querying keyspace group by state (tikv#6706) ref tikv#5895 Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tools: support get all groups (tikv#6714) ref tikv#5895, ref tikv#6706 Signed-off-by: Ryan Leung <rleungx@gmail.com> * Fix data race between read APIs and finshiSplit/finishMerge in keyspace group manager (tikv#6723) close tikv#6721 checkTSOMerge and checkTSOSplit will read from kgm.getKeyspaceGroupMeta finishMergeKeyspaceGroup and finishSplitKeyspaceGroup will update kgm so just return a copy to avoid data race Signed-off-by: Bin Shi <binshi.bing@gmail.com> * tso: fix memory leak introduced by timer.After (tikv#6730) close tikv#6719, ref tikv#6720 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs, tso: fix split and split-range command bugs. (tikv#6732) close tikv#6687, close tikv#6731 Fix split and split-range command bugs. Signed-off-by: Bin Shi <binshi.bing@gmail.com> * mcs: use patch method in keyspace group (tikv#6713) ref tikv#6233 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * *: fix memory leak introduced by timer.After (tikv#6720) close tikv#6719 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tso: implement groupSplitPatroller to speed up the split process (tikv#6736) ref tikv#5895, close tikv#6696 Implement `groupSplitPatroller` to speed up the split process. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * client: fix tso service discovery at the first time for NewClientWithAPIContext (tikv#6749) close tikv#6748 After NewClientWithAPIContextV2 returns, the keyspace group should be discovered by the passed keyspace name immediately Signed-off-by: Bin Shi <binshi.bing@gmail.com> * *: add test for misusing keyspace ID when creating the client (tikv#6754) ref tikv#6747, ref tikv#6748, ref tikv#6749 Signed-off-by: Ryan Leung <rleungx@gmail.com> * tso: support multi-keyspace, fault injection and keyspace-name in pd-tso-bench (tikv#6608) ref tikv#5895 support multi-keyspace, fault injection and keyspace-name in pd-tso-bench Signed-off-by: Bin Shi <binshi.bing@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * *: make test great again (tikv#6767) close tikv#6761 Signed-off-by: Ryan Leung <rleungx@gmail.com> * tso: implement deletedGroupCleaner to clean up the legacy TSO key (tikv#6745) close tikv#6589 - Implement `deletedGroupCleaner` to clean up the legacy TSO key. - Extract the timestamp key path constructor. Signed-off-by: JmPotato <ghzpotato@gmail.com> * client, tests: allow TSO fallback happens in TestMixedTSODeployment (tikv#6740) close tikv#6634 Introduce `WithAllowTSOFallback` client option to bypass the panic in `TestMixedTSODeployment`. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * tso: allow mergedTS to be zero in mergingChecker (tikv#6758) ref tikv#6589 Since it's possible that a keyspace group is to be deleted and merged before its TSO is initialized, we should allow `mergedTS` to be zero in `mergingChecker`. This PR allows this case and only block the merging when loading the TSO meets the error. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * *: move keyspace group primary path code to key_path.go (tikv#6755) ref tikv#5895 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: add log flags (tikv#6777) ref tikv#5766 Signed-off-by: Ryan Leung <rleungx@gmail.com> * keyspace, tso, apiv2: impl the interface to merge all keyspace groups into the default (tikv#6757) ref tikv#6756 Impl the interface to merge all keyspace groups into the default. Signed-off-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * pdctl: support show keyspace group primary (tikv#6747) close tikv#6746 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * pd-ctl, tests: impl the merge all keyspace groups command (tikv#6782) close tikv#6756 - Impl the merge all keyspace groups command. - Further reuse of TSO cluster-related test code. 
Signed-off-by: JmPotato <ghzpotato@gmail.com> * *: fix test suite race (tikv#6784) close tikv#6772 Signed-off-by: Ryan Leung <rleungx@gmail.com> * keyspace: fix data race (tikv#6797) Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * pdctl: support show keyspace meta with refresh group id (tikv#6751) close tikv#6746 Signed-off-by: lhy1024 <admin@liudos.us> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * mcs: Refactor ServicePath to make caller's life easier (tikv#6799) close tikv#6800 Signed-off-by: Xiaoguang Sun <sunxiaoguang@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * Remove the lastPhysical check in dispatchClient (tikv#6812) * *: fix `TestGetTSOImmediately` (tikv#6811) close tikv#6795 Signed-off-by: Ryan Leung <rleungx@gmail.com> * etcdutil, leadership: make more high availability (tikv#6577) close tikv#6554 Signed-off-by: lhy1024 <admin@liudos.us> * keyspace: some cherry-pick from pd-cse (tikv#6477) ref tikv#4399 Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * *: cherry pick some keyspace related things (tikv#6840) ref tikv#4399 Signed-off-by: disksing <i@disksing.com> Signed-off-by: Evan Zhou <coocood@gmail.com> Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> Signed-off-by: Ryan Leung <rleungx@gmail.com> Co-authored-by: disksing <i@disksing.com> Co-authored-by: Evan Zhou <coocood@gmail.com> Co-authored-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> * *: fix the split problem caused by no enough replicas (tikv#6555) close tikv#6550 Signed-off-by: Ryan Leung <rleungx@gmail.com> * resolve conflicts Signed-off-by: Ryan Leung <rleungx@gmail.com> --------- Signed-off-by: Bin Shi <binshi.bing@gmail.com> Signed-off-by: Ryan Leung 
<rleungx@gmail.com> Signed-off-by: JmPotato <ghzpotato@gmail.com> Signed-off-by: lhy1024 <admin@liudos.us> Signed-off-by: AmoebaProtozoa <8039876+AmoebaProtozoa@users.noreply.github.com> Co-authored-by: Bin Shi <39923490+binshi-bing@users.noreply.github.com> Co-authored-by: JmPotato <ghzpotato@gmail.com> Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Co-authored-by: lhy1024 <admin@liudos.us> Co-authored-by: zzm <zhouzemin@pingcap.com> Co-authored-by: David <8039876+AmoebaProtozoa@users.noreply.github.com> Co-authored-by: Xiaoguang Sun <sunxiaoguang@users.noreply.github.com> Co-authored-by: disksing <i@disksing.com> Co-authored-by: Evan Zhou <coocood@gmail.com>
What problem does this PR solve?
Issue Number: Ref #6142
What is changed and how does it work?
Add the GetMinTS API to the PD(TSO) client to get the minimal timestamp across all keyspace groups. It follows the steps below:
a. The minimal timestamp among all keyspace groups for which this TSO server is the primary.
b. The count of keyspace groups for which this TSO server is the primary.
c. The total count of keyspace groups in the entire TSO service.
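A minimal sketch of how a client could combine values (a) to (c), assuming each TSO server reports all three and the client treats the result as valid only when the primary counts add up to the total. The type and function names (`Timestamp`, `ServerMinTS`, `AggregateMinTS`) are hypothetical and are not the PD client's actual API; the completeness check is an assumption, not something stated in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Timestamp is a simplified, hypothetical stand-in for a TSO timestamp.
type Timestamp struct {
	Physical int64
	Logical  int64
}

// less reports whether a is strictly earlier than b.
func less(a, b Timestamp) bool {
	if a.Physical != b.Physical {
		return a.Physical < b.Physical
	}
	return a.Logical < b.Logical
}

// ServerMinTS models the three values (a, b, c) reported by one TSO server.
type ServerMinTS struct {
	MinTS        Timestamp // (a) min timestamp over groups this server is primary for
	PrimaryCount uint32    // (b) number of groups this server is primary for
	TotalGroups  uint32    // (c) total number of groups in the TSO service
}

// AggregateMinTS returns the smallest MinTS across all responses.
// Assumption: the result is only trusted when the primary counts sum to the
// reported total, i.e. every keyspace group was covered by some server.
func AggregateMinTS(resps []ServerMinTS) (Timestamp, error) {
	if len(resps) == 0 {
		return Timestamp{}, errors.New("no responses from TSO servers")
	}
	var (
		minTS     Timestamp
		found     bool
		primaries uint32
		total     uint32
	)
	for _, r := range resps {
		// Assumption: use the largest total any server reports.
		if r.TotalGroups > total {
			total = r.TotalGroups
		}
		// A server that is primary for no group contributes no timestamp.
		if r.PrimaryCount == 0 {
			continue
		}
		primaries += r.PrimaryCount
		if !found || less(r.MinTS, minTS) {
			minTS = r.MinTS
			found = true
		}
	}
	if !found || primaries != total {
		return Timestamp{}, fmt.Errorf("incomplete view: %d of %d keyspace groups covered", primaries, total)
	}
	return minTS, nil
}

func main() {
	resps := []ServerMinTS{
		{MinTS: Timestamp{Physical: 100, Logical: 5}, PrimaryCount: 2, TotalGroups: 3},
		{MinTS: Timestamp{Physical: 100, Logical: 2}, PrimaryCount: 1, TotalGroups: 3},
	}
	ts, err := AggregateMinTS(resps)
	fmt.Println(ts, err)
}
```

Under this assumption, a mismatch between the summed primary counts and the total signals that at least one keyspace group was not covered, so the caller can retry rather than use a minimum that may be too large.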
We also discussed two more options for further optimization, which might be overkill for now. The two options are:
Check List
Tests
Release note