Bug fix: fix the Round-Robin algorithm when "ReadMode == ReplicaReadMixed". #663
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Bug report
While optimizing ReadMode, I found a bug in the Round-Robin strategy when `ReadMode == ReplicaReadMixed`.
Suppose we have a TiKV cluster with 3 nodes: [node 1, node 2, node 3].
At first, while the cluster is in a normal state, the read flows sent to the nodes follow a uniform distribution.
If we then make one node abnormal, say Node 2, all flows originally sent to Node 2 are redirected to Node 3.
With the current Round-Robin strategy, this happens because of the following steps:
1. `state.lastIdx` is randomly generated, so it could be any one of [node 1, node 2, node 3].
   client-go/internal/locate/region_request.go, lines 526 to 528 in f313ddf
2. If `state.lastIdx` == [Node 2], it will be filtered out by `isCandidate`.
   client-go/internal/locate/region_request.go, lines 550 to 552 in f313ddf
3. The loop starts with `i == 0` and tries `state.lastIdx + i`, so the first choice, [Node 2], is filtered out. The loop then continues with `i == 1`, gets the next choice, [Node 3], finds that it is a normal node, and exits.

We ran a test, and the following CPU metrics of the TiKV cluster confirmed it:
What we expect
We expect that, if Node 2 is abnormal, all flows originally sent to it are redirected uniformly to the other nodes.
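To make the gap between the current behavior and this expectation concrete, here is a minimal, simplified model of the selection loop described in the bug report. It is not the code from `region_request.go`: the names `pickRoundRobin` and `simulateRoundRobin`, the `healthy` slice, and the wrap-around with `%` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickRoundRobin is a hypothetical stand-in for the selection loop: start at
// lastIdx and return the first index whose node is healthy, or -1 if none is.
func pickRoundRobin(healthy []bool, lastIdx int) int {
	n := len(healthy)
	for i := 0; i < n; i++ {
		idx := (lastIdx + i) % n // assumed wrap-around
		if healthy[idx] {
			return idx
		}
	}
	return -1
}

// simulateRoundRobin draws a random starting index for each request and
// counts how often each node is picked.
func simulateRoundRobin(healthy []bool, requests int, rng *rand.Rand) []int {
	counts := make([]int, len(healthy))
	for r := 0; r < requests; r++ {
		if idx := pickRoundRobin(healthy, rng.Intn(len(healthy))); idx >= 0 {
			counts[idx]++
		}
	}
	return counts
}

func main() {
	rng := rand.New(rand.NewSource(1))
	healthy := []bool{true, false, true} // node 1 ok, node 2 abnormal, node 3 ok
	counts := simulateRoundRobin(healthy, 30000, rng)
	fmt.Println("picks per node [node 1, node 2, node 3]:", counts)
}
```

In this model, every request whose random start lands on node 2 falls through to node 3, so node 3 should receive roughly twice as many picks as node 1 instead of an even split.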
Solution
I made a minor change to the original Round-Robin strategy: after filtering out the abnormal nodes, we randomly choose one of the remaining candidates that meets the requirements.
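One possible reading of this idea, sketched under assumptions rather than taken from the actual patch: first collect every node that passes the filter, then pick uniformly at random among them. The helpers `candidateIndexes` and `pickRandomCandidate` and the `healthy` slice are hypothetical names, not identifiers from client-go.

```go
package main

import (
	"fmt"
	"math/rand"
)

// candidateIndexes returns the indexes of all nodes that pass the filter
// (modelled here as a simple healthy flag).
func candidateIndexes(healthy []bool) []int {
	var out []int
	for i, ok := range healthy {
		if ok {
			out = append(out, i)
		}
	}
	return out
}

// pickRandomCandidate picks uniformly among the healthy nodes,
// or returns -1 when there is no candidate at all.
func pickRandomCandidate(healthy []bool, rng *rand.Rand) int {
	candidates := candidateIndexes(healthy)
	if len(candidates) == 0 {
		return -1
	}
	return candidates[rng.Intn(len(candidates))]
}

func main() {
	rng := rand.New(rand.NewSource(1))
	healthy := []bool{true, false, true} // node 1 ok, node 2 abnormal, node 3 ok
	counts := make([]int, len(healthy))
	for r := 0; r < 30000; r++ {
		if idx := pickRandomCandidate(healthy, rng); idx >= 0 {
			counts[idx]++
		}
	}
	fmt.Println("picks per node [node 1, node 2, node 3]:", counts)
}
```

Because the choice no longer depends on which node sits next to the abnormal one, the picks in this sketch should split about evenly between node 1 and node 3.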
After introducing this supplementary strategy, we get the expected results: