Add follower read support to TiDB #11347
Conversation
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
kv/kv.go
Outdated
const (
	// ReplicaReadLeader stands for 'read from leader'.
	ReplicaReadLeader ReplicaReadType = iota
I prefer using `1 << iota` here, so we can use bit operations to easily check whether a type is set.
I thought about this initially, but I doubt it is rational to read from all kinds of replicas. Reads from different types of replicas may have different latency characteristics and put different burdens on the leader; we can discuss this further.
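For reference, the `1 << iota` flag scheme suggested above could look like the following sketch. The `ReplicaReadFollower` constant and the `IsFollowerRead` helper are illustrative assumptions, not the merged code:

```go
package main

import "fmt"

// ReplicaReadType defined with 1 << iota so each kind of replica gets
// its own bit; combinations can then be tested with bit operations.
type ReplicaReadType uint32

const (
	ReplicaReadLeader   ReplicaReadType = 1 << iota // read from leader
	ReplicaReadFollower                             // read from follower (illustrative)
)

// IsFollowerRead reports whether the follower bit is set.
func (r ReplicaReadType) IsFollowerRead() bool {
	return r&ReplicaReadFollower != 0
}

func main() {
	fmt.Println(ReplicaReadLeader.IsFollowerRead())                       // false
	fmt.Println((ReplicaReadLeader | ReplicaReadFollower).IsFollowerRead()) // true
}
```

With plain `iota` the values are sequential and mutually exclusive; the bit-flag form trades that for cheap membership checks, which is exactly the trade-off debated here.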
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Codecov Report
@@            Coverage Diff            @@
##            master     #11347   +/-   ##
==========================================
  Coverage   81.727%    81.727%
==========================================
  Files          434        434
  Lines        94878      94878
==========================================
  Hits         77541      77541
  Misses       11879      11879
  Partials      5458       5458
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
…_read Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for automatic testing.
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
I have made some changes according to what we agreed on in the last discussion. Please take a look, thanks. @overvenus @5kbpers
store/tikv/region_cache.go
Outdated
@@ -973,6 +1016,7 @@ func (c *RegionCache) switchNextPeer(r *Region, currentPeerIdx int, err error) {
 	nextIdx := (currentPeerIdx + 1) % len(rs.stores)
 	newRegionStore := rs.clone()
 	newRegionStore.workStoreIdx = int32(nextIdx)
+	newRegionStore.initFollowers()
When `switchNextPeer` is called, the workStoreIdx may point to a follower, so `initFollowers` does not do what it is supposed to do.
Isn't it trying to predict that the next peer is going to be the new leader? If that's the case, workStoreIdx will actually become the new leader, which keeps non-replica reads working.
When a request fails, we change the `workStoreIdx` and then access the follower to wake up the hibernated region. It's not always the leader.
If the workStoreIdx points to a follower, it may be the only valid follower; since we avoid accessing it, all requests will be sent to the real leader.
It works, but the code is misleading.
OK, I see. It looks like simply removing initFollowers here will work, am I right?
Removed
store/tikv/region_cache.go
Outdated
@@ -70,6 +71,7 @@ type RegionStore struct {
 	workStoreIdx int32    // points to the current work peer in meta.Peers and work store in stores (same idx)
 	stores       []*Store // stores in this region
 	storeFails   []uint32 // snapshots of store's fail, need reload when `storeFails[curr] != stores[curr].fail`
+	followers    []int32  // followers' indexes in this region
We can just use a leader index instead. When we try to find a follower, we can just skip the leader.
It will have to check the current store index every single time that way. Wouldn't that be even more cumbersome?
The `followers` array doesn't carry extra information, so we can remove it to save memory. And the computation cost of avoiding `workStoreIdx` is very small.
We are using the seed passed in here. Which follower are we going to use when the seed points to the leader? Whichever follower it falls back to, how do we make sure it really balances the load? In addition, arrays for a million regions only cost dozens of MB; compared to other places, the footprint of this array is actually negligible.
numPeers := 5
followerIdx := seed % (numPeers-1)
if followerIdx >= workStoreIdx {
followerIdx++
if followerIdx == numPeers {
followerIdx = 0
}
}
Brilliant, let's do it this way.
Actually, `followerIdx == numPeers` will always be false. This will do:
numPeers := 5
followerIdx := seed % (numPeers-1)
if followerIdx >= workStoreIdx {
followerIdx++
}
Done
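The skip-the-leader indexing agreed on above can be sketched as a small runnable function. The function name and signature are illustrative; only the index arithmetic comes from the snippet in the review:

```go
package main

import "fmt"

// followerIdx maps a seed to a follower's index among numPeers peers,
// skipping workStoreIdx (the leader). Indices at or above the leader's
// shift up by one, so the result can never equal workStoreIdx and, since
// idx starts at most at numPeers-2, it can never reach numPeers either.
func followerIdx(seed uint32, numPeers, workStoreIdx int) int {
	idx := int(seed) % (numPeers - 1)
	if idx >= workStoreIdx {
		idx++ // skip the leader
	}
	return idx
}

func main() {
	// With 5 peers and the leader at index 2, seeds 0..3 cover
	// exactly the four follower indices.
	for seed := uint32(0); seed < 4; seed++ {
		fmt.Println(followerIdx(seed, 5, 2)) // prints 0, 1, 3, 4
	}
}
```

This is why the `followerIdx == numPeers` check in the first version was dead code: `seed % (numPeers-1)` is at most `numPeers-2`, so even after the increment the index tops out at `numPeers-1`.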
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
If a follower store fails, the region info will still contain that failed store; it will not be removed unless we remove it manually or wait tens of minutes for PD to schedule another follower into the region and remove the failed one.
So we will keep sending follower reads to the failed store.
Another problem is that the region cache assumes the failed store is the leader: if a request to a follower fails, the region cache will switch the workStoreIdx and drop the region cache, which makes leader reads fail.
For the first issue, it seems the old way of storing a followers array makes more sense: when a follower fails, it can be removed from the valid followers array. Without such an array, every time we try to pick a follower we need to check storeFails to skip the failed ones. As for the second issue, we can skip switching peers when the current failure was caused by a follower read, if I understand correctly.
@sunxiaoguang If storeFail doesn't match for a follower read, we should choose another follower and avoid invalidating the cache.
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
I made some changes to try a different follower if the selected one has failed. Please take a look, thanks.
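The retry behaviour discussed above can be sketched roughly as follows. The type and field names here are illustrative stand-ins for the PR's `RegionStore` (`storeFails` snapshots vs. the stores' live fail counters), not the actual patch:

```go
package main

import "fmt"

// regionStore is a hypothetical stand-in for the PR's RegionStore.
type regionStore struct {
	workStoreIdx int      // leader index
	storeFails   []uint32 // snapshot of each store's fail counter
	currentFails []uint32 // live fail counters (stores[i].fail in the PR)
}

// pickFollower probes followers starting from the seeded one and returns
// the first whose fail-counter snapshot still matches, instead of
// invalidating the region cache. Returns -1 when every follower looks
// failed, so the caller can fall back to the leader.
func (rs *regionStore) pickFollower(seed uint32) int {
	numPeers := len(rs.storeFails)
	for i := 0; i < numPeers-1; i++ {
		idx := (int(seed) + i) % (numPeers - 1)
		if idx >= rs.workStoreIdx {
			idx++ // skip the leader
		}
		if rs.storeFails[idx] == rs.currentFails[idx] {
			return idx
		}
	}
	return -1
}

func main() {
	rs := &regionStore{
		workStoreIdx: 0,
		storeFails:   []uint32{0, 0, 0},
		currentFails: []uint32{0, 1, 0}, // follower 1 failed since the snapshot
	}
	fmt.Println(rs.pickFollower(0)) // prints 2: follower 1 is skipped
}
```

Note how this keeps follower-read failures local: a mismatched counter only moves the probe to the next follower, so the leader's `workStoreIdx` is never switched by a follower-read failure.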
@sunxiaoguang
@@ -94,8 +94,14 @@ func (s *RegionRequestSender) SendReqCtx(bo *Backoffer, req *tikvrpc.Request, re
 		}
 	})

+	var replicaRead kv.ReplicaReadType
How about adding a metric for `ReplicaReadType`, so follower read time is observable in Grafana?
Sure, let me add it.
I was about to add a new label `replica_type` to those duration metrics, but I'm not sure it is the right way to do it. Any suggestions? Thanks.
It looks like many metrics are populated statically in code with a specific label. Adding a new label would make that harder.
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
@@ -268,7 +268,7 @@ func (s *tikvStore) Begin() (kv.Transaction, error) {

 // BeginWithStartTS begins a transaction with startTS.
 func (s *tikvStore) BeginWithStartTS(startTS uint64) (kv.Transaction, error) {
-	txn, err := newTikvTxnWithStartTS(s, startTS)
+	txn, err := newTikvTxnWithStartTS(s, startTS, s.nextReplicaReadSeed())
This will make `ReadSeed` change for each txn in the same session, which seems to conflict with #11347 (comment).
Since it seems complicated to make the seed consistent across txn and coprocessor, and the discussion is more efficient over instant messaging, we discussed this in a WeChat group and agreed that we can use different policies for coprocessor and txn. Sorry for not giving the context here.
@@ -277,7 +277,7 @@ func (s *tikvStore) BeginWithStartTS(startTS uint64) (kv.Transaction, error) {
 }

 func (s *tikvStore) GetSnapshot(ver kv.Version) (kv.Snapshot, error) {
-	snapshot := newTiKVSnapshot(s, ver)
+	snapshot := newTiKVSnapshot(s, ver, s.nextReplicaReadSeed())
ditto
LGTM
/run-all-tests
cherry pick pingcap#11347 to 3.1. Fixes code conflicts from 4.0 to 3.1. Users can use the tidb_replica_read session variable to choose reading from the leader or a follower. To keep consistency with existing behavior, the leader is used by default unless follower is explicitly specified. Adds a session scope variable tidb_replica_read to specify whether TiDB should read data from the leader or a follower.
Signed-off-by: Xiaoguang Sun sunxiaoguang@zhihu.com
What problem does this PR solve?
Add replica read support to TiDB. Users can use the tidb_replica_read session variable to choose reading from the leader or a follower. To keep consistency with existing behavior, the leader is used by default unless follower is explicitly specified.
What is changed and how it works?
Add a session scope variable tidb_replica_read to specify whether TiDB should read data from the leader or a follower.
Check List
Tests
Will add later
Code changes
Side effects
NA
Related changes