stats: do not split excluded lower value ranges #12009
Codecov Report
@@              Coverage Diff              @@
##             master     #12009     +/-   ##
=============================================
  Coverage   81.5929%   81.5929%
=============================================
  Files           452        452
  Lines         98060      98060
=============================================
  Hits          80010      80010
  Misses        12401      12401
  Partials       5649       5649
Can we append `MaxValueDatum`/`MinValueDatum` when we find that the upper and lower bounds have different lengths?
@winoros The main reason I did not choose this solution is that old histograms already exist, and it is not easy to determine their original number of columns, because it may already have been …
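The padding the reviewer suggests can be sketched as follows, under clearly hypothetical assumptions: bounds are modeled as `[]int` tuples, `minSentinel`/`maxSentinel` merely stand in for the `MinValueDatum`/`MaxValueDatum` mentioned in the question, and `padBound` is an illustrative name, not a TiDB function. The `numCols` parameter is the crux of the author's objection: padding is only well-defined when the original column count is known.

```go
package main

import "fmt"

// Hypothetical sentinels standing in for MinValueDatum/MaxValueDatum:
// they sort below / above every real column value in this toy model.
const (
	minSentinel = -1 << 31
	maxSentinel = 1<<31 - 1
)

// padBound extends a prefix bound to numCols columns. An upper bound is
// padded with the maximum sentinel and a lower bound with the minimum one,
// so the padded tuple still covers every key that starts with the prefix.
func padBound(bound []int, numCols int, isUpper bool) []int {
	padded := append([]int(nil), bound...) // copy so the caller's slice is untouched
	fill := minSentinel
	if isUpper {
		fill = maxSentinel
	}
	for len(padded) < numCols {
		padded = append(padded, fill)
	}
	return padded
}

func main() {
	// numCols = 2 corresponds to idx(a,b); for an old histogram this value
	// is exactly what cannot be recovered reliably.
	fmt.Println(padBound([]int{10}, 2, true))  // [10 2147483647]
	fmt.Println(padBound([]int{10}, 2, false)) // [10 -2147483648]
}
```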
LGTM.
@@ -766,11 +766,11 @@ func formatBuckets(hg *statistics.Histogram, lowBkt, highBkt, idxCols int) strin
 		return hg.BucketToString(lowBkt, idxCols)
 	}
 	if lowBkt+1 == highBkt {
-		return fmt.Sprintf("%s, %s", hg.BucketToString(lowBkt, 0), hg.BucketToString(highBkt, 0))
+		return fmt.Sprintf("%s, %s", hg.BucketToString(lowBkt, idxCols), hg.BucketToString(highBkt, idxCols))
Why do we need this change?
Without it, the bucket output for index histograms is unreadable.
Co-Authored-By: Kenan Yao <cauchy1992@gmail.com>
LGTM
/run-all-tests
cherry pick to release-2.1 failed
cherry pick to release-3.0 failed
What problem does this PR solve?
Fix #11907
This bug occurs on indexes that have multiple columns, such as `index idx(a,b)`. It happens as follows:

1. Queries that only request ranges on the prefix, such as `where a >= 10`, are issued and collected by feedback. As a result, one bucket's upper bound becomes the single encoded value `(10)`, and the next bucket's upper bound may be `(10, 11)`.
2. Another query such as `where a >= 9` arrives, so `SplitRange` is called with `[9, +inf)` and splits it at `(10)` and `(10, 11)`. This produces the ranges `[9, 10]`, `(10, (10, 11)]`, ..., and everything still looks fine.
3. The caller, `IndexRangesToKVRanges`, then processes the split range `(10, (10, 11)]`. Because its lower bound is excluded, it is transformed into `[11, (10, 12))` by applying `PrefixNext` to both the excluded lower bound and the included upper bound. This range is invalid: its lower bound `11` sorts after its upper bound `(10, 12)`.

What is changed and how it works?
When splitting ranges, never generate a range whose lower bound is excluded. This PR does so by splitting the ranges at the lower bounds, so that every generated range has an included lower bound and an excluded upper bound.
Check List
Tests
Code changes
Side effects
Related changes
Release note