Add follower read support to TiDB #11347
Conversation
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
kv/kv.go
Outdated
const (
	// ReplicaReadLeader stands for 'read from leader'.
	ReplicaReadLeader ReplicaReadType = iota
I prefer using `1 << iota` here, so we can use bit operations to easily check whether a type is set.
I thought about this initially, but I doubt it is rational to read from all kinds of replicas. Reads from different types of replicas may have different latency characteristics and put different burdens on the leader; we can discuss this further.
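For reference, the `1 << iota` flag scheme suggested above could look like the following sketch. The `ReplicaReadFollower` constant and the `IsFollowerRead` helper are illustrative assumptions, not the merged code:

```go
package main

import "fmt"

// ReplicaReadType defined with 1 << iota so each kind of replica gets
// its own bit; combinations can then be tested with bit operations.
type ReplicaReadType uint32

const (
	ReplicaReadLeader   ReplicaReadType = 1 << iota // read from leader
	ReplicaReadFollower                             // read from follower (illustrative)
)

// IsFollowerRead reports whether the follower bit is set.
func (r ReplicaReadType) IsFollowerRead() bool {
	return r&ReplicaReadFollower != 0
}

func main() {
	fmt.Println(ReplicaReadLeader.IsFollowerRead())                       // false
	fmt.Println((ReplicaReadLeader | ReplicaReadFollower).IsFollowerRead()) // true
}
```

With plain `iota` the values are sequential and mutually exclusive; the bit-flag form trades that for cheap membership checks, which is exactly the trade-off debated here.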
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Codecov Report
@@            Coverage Diff            @@
##            master     #11347   +/-   ##
==========================================
  Coverage   81.727%    81.727%
==========================================
  Files          434        434
  Lines        94878      94878
==========================================
  Hits         77541      77541
  Misses       11879      11879
  Partials      5458       5458
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
…_read Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Hi contributor, thanks for your PR. This patch needs to be approved by one of the admins. They should reply with "/ok-to-test" to accept this PR for automatic testing.
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
I have made some changes according to what we agreed on in the last discussion. Please take a look, thanks. @overvenus @5kbpers
store/tikv/region_cache.go
Outdated
@@ -973,6 +1016,7 @@ func (c *RegionCache) switchNextPeer(r *Region, currentPeerIdx int, err error) {
 	nextIdx := (currentPeerIdx + 1) % len(rs.stores)
 	newRegionStore := rs.clone()
 	newRegionStore.workStoreIdx = int32(nextIdx)
+	newRegionStore.initFollowers()
When `switchNextPeer` is called, the workStoreIdx may point to a follower, so `initFollowers` does not do what it is supposed to do.
Isn't it trying to predict that the next peer is going to be the new leader? If that's the case, workStoreIdx will actually become the new leader, which keeps non-replica reads working.
When a request fails, we change the `workStoreIdx` and then access the follower to wake up the hibernated region. It's not always the leader.
If the workStoreIdx points to a follower, it may be the only valid follower; since we avoid accessing it, all requests will be sent to the real leader.
It works, but the code is misleading.
OK, I see. It looks like simply removing initFollowers here will work, am I right?
Removed
store/tikv/region_cache.go
Outdated
@@ -70,6 +71,7 @@ type RegionStore struct {
 	workStoreIdx int32    // points to the current work peer in meta.Peers and work store in stores (same idx)
 	stores       []*Store // stores in this region
 	storeFails   []uint32 // snapshots of store's fail, need reload when `storeFails[curr] != stores[curr].fail`
+	followers    []int32  // followers' indexes in this region
We can just use a leader index instead. When we try to find a follower, we can just skip the leader.
It will have to check the current store index every single time that way. Wouldn't that be even more cumbersome?
The `followers` array doesn't carry extra information, so we can remove it to save memory. And the computation cost of avoiding `workStoreIdx` is very small.
We are using the seed passed in here. Which follower are we going to use when the seed points to the leader? Whichever follower it falls back to, how do we make sure it really balances the load? In addition, arrays for a million regions only cost dozens of MB; compared to other places, the footprint of this array is actually negligible.
numPeers := 5
followerIdx := seed % (numPeers-1)
if followerIdx >= workStoreIdx {
followerIdx++
if followerIdx == numPeers {
followerIdx = 0
}
}
Brilliant, let's do it this way.
Actually, `followerIdx == numPeers` will always be false. This will do:
numPeers := 5
followerIdx := seed % (numPeers-1)
if followerIdx >= workStoreIdx {
followerIdx++
}
Done
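The skip-the-leader indexing agreed on above can be sketched as a small runnable function. The function name and signature are illustrative; only the index arithmetic comes from the snippet in the review:

```go
package main

import "fmt"

// followerIdx maps a seed to a follower's index among numPeers peers,
// skipping workStoreIdx (the leader). Indices at or above the leader's
// shift up by one, so the result can never equal workStoreIdx and, since
// idx starts at most at numPeers-2, it can never reach numPeers either.
func followerIdx(seed uint32, numPeers, workStoreIdx int) int {
	idx := int(seed) % (numPeers - 1)
	if idx >= workStoreIdx {
		idx++ // skip the leader
	}
	return idx
}

func main() {
	// With 5 peers and the leader at index 2, seeds 0..3 cover
	// exactly the four follower indices.
	for seed := uint32(0); seed < 4; seed++ {
		fmt.Println(followerIdx(seed, 5, 2)) // prints 0, 1, 3, 4
	}
}
```

This is why the `followerIdx == numPeers` check in the first version was dead code: `seed % (numPeers-1)` is at most `numPeers-2`, so even after the increment the index tops out at `numPeers-1`.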
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
If a follower store fails, the region info will still contain that failed store; it will not be removed unless we remove it manually or wait tens of minutes for PD to schedule another follower into the region and remove the failed one.
So we will keep sending follower reads to the failed store.
Another problem is that the region cache assumes the failed store is the leader: if a request to a follower fails, the region cache will switch the workStoreIdx and drop the region cache, which makes leader reads fail.
For the first issue, it seems the old way of storing a followers array makes more sense: when a follower fails, it can be removed from the valid followers array. Without such an array, every time we try to pick a follower we need to check storeFails to skip the failed ones. As for the second issue, we can skip switching peers when the current failure was caused by a follower read, if I understand correctly.
@sunxiaoguang If storeFail doesn't match for a follower read, we should choose another follower and avoid invalidating the cache.
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
I made some changes to try a different follower if the selected one has failed. Please take a look, thanks.
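The retry behaviour discussed above can be sketched roughly as follows. The type and field names here are illustrative stand-ins for the PR's `RegionStore` (`storeFails` snapshots vs. the stores' live fail counters), not the actual patch:

```go
package main

import "fmt"

// regionStore is a hypothetical stand-in for the PR's RegionStore.
type regionStore struct {
	workStoreIdx int      // leader index
	storeFails   []uint32 // snapshot of each store's fail counter
	currentFails []uint32 // live fail counters (stores[i].fail in the PR)
}

// pickFollower probes followers starting from the seeded one and returns
// the first whose fail-counter snapshot still matches, instead of
// invalidating the region cache. Returns -1 when every follower looks
// failed, so the caller can fall back to the leader.
func (rs *regionStore) pickFollower(seed uint32) int {
	numPeers := len(rs.storeFails)
	for i := 0; i < numPeers-1; i++ {
		idx := (int(seed) + i) % (numPeers - 1)
		if idx >= rs.workStoreIdx {
			idx++ // skip the leader
		}
		if rs.storeFails[idx] == rs.currentFails[idx] {
			return idx
		}
	}
	return -1
}

func main() {
	rs := &regionStore{
		workStoreIdx: 0,
		storeFails:   []uint32{0, 0, 0},
		currentFails: []uint32{0, 1, 0}, // follower 1 failed since the snapshot
	}
	fmt.Println(rs.pickFollower(0)) // prints 2: follower 1 is skipped
}
```

Note how this keeps follower-read failures local: a mismatched counter only moves the probe to the next follower, so the leader's `workStoreIdx` is never switched by a follower-read failure.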
@sunxiaoguang
@@ -94,8 +94,14 @@ func (s *RegionRequestSender) SendReqCtx(bo *Backoffer, req *tikvrpc.Request, re
 		}
 	})

+	var replicaRead kv.ReplicaReadType
How about adding a metric for `ReplicaReadType`, so follower read time is observable in Grafana?
Sure, let me add it.
I was about to add a new label `replica_type` to those duration metrics, but I'm not sure it is the right way to do it. Any suggestions? Thanks.
It looks like many metrics are populated statically in code with a specific label. Adding a new label would make that harder.
Signed-off-by: Xiaoguang Sun <sunxiaoguang@zhihu.com>
@@ -268,7 +268,7 @@ func (s *tikvStore) Begin() (kv.Transaction, error) {

 // BeginWithStartTS begins a transaction with startTS.
 func (s *tikvStore) BeginWithStartTS(startTS uint64) (kv.Transaction, error) {
-	txn, err := newTikvTxnWithStartTS(s, startTS)
+	txn, err := newTikvTxnWithStartTS(s, startTS, s.nextReplicaReadSeed())
This will make `ReadSeed` change for each txn in the same session, which seems to conflict with #11347 (comment).
Since it seems complicated to make the seed consistent across txn and coprocessor, and the discussion is more efficient over instant messaging, we discussed this in a WeChat group and agreed that we can use different policies for coprocessor and txn. Sorry for not giving the context here.
@@ -277,7 +277,7 @@ func (s *tikvStore) BeginWithStartTS(startTS uint64) (kv.Transaction, error) {
 }

 func (s *tikvStore) GetSnapshot(ver kv.Version) (kv.Snapshot, error) {
-	snapshot := newTiKVSnapshot(s, ver)
+	snapshot := newTiKVSnapshot(s, ver, s.nextReplicaReadSeed())
ditto
LGTM
/run-all-tests
cherry pick pingcap#11347 to 3.1. Fixes code conflicts from 4.0 to 3.1. Users can use the tidb_replica_read session variable to choose reading from the leader or a follower. To keep consistency with existing behavior, the leader is used by default unless follower is explicitly specified. Adds a session scope variable tidb_replica_read to specify whether TiDB should read data from the leader or a follower.
Signed-off-by: Xiaoguang Sun sunxiaoguang@zhihu.com
What problem does this PR solve?
Add replica read support to TiDB. Users can use the tidb_replica_read session variable to choose reading from the leader or a follower. To keep consistency with existing behavior, the leader is used by default unless follower is explicitly specified.
What is changed and how it works?
Add a session scope variable tidb_replica_read to specify whether TiDB should read data from the leader or a follower.
Check List
Tests
Will add later
Code changes
Side effects
NA
Related changes