-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-22.2: kvstreamer: improve avg response estimation for partial responses #104137
release-22.2: kvstreamer: improve avg response estimation for partial responses #104137
Conversation
This commit fixes the avg response estimator when it comes to handling partial responses. "Partial responses" mean that a single ScanRequest was evaluated across multiple BatchRequests because the response happened to be large enough to exceed TargetBytes budget we used. For example, if a ScanRequest needs to fetch 100KiB of data, but the Streamer only gives 1KiB TargetBytes budget, then this ScanRequest will be evaluated across 100 BatchRequests, and each BatchRequest would contain a single "partial" response of 1KiB in size. Previously, the avg response estimator would treat these partial responses independently, so it would come up with the estimate of 1KiB for that example. However, this is very suboptimal - the goal of the estimator is to guess the footprint of the whole response to a single-range request. As a result of the previous behavior, the streamer could keep on paginating the ScanResponse in very small batches (with the estimate never increasing) which would then lead to long latency for query evaluation. This commit fixes this problem by adjusting what the estimator includes in the denominator for the average computation. In particular, this commit makes it so that only "non-resume" responses are included into that number. The idea is that when we receive the first paginated response, we increase the counter of responses, but on all consequent "resume" responses - we don't. This allows us to "update" the footprint of the big ScanResponse that is being paginated across multiple BatchRequests. In the example above, our estimate will be growing exponentially, and as a result, instead of performing 100 BatchRequests, we will now do only 7. This requires tracking whether we have received a response to a particular ScanRequest, and in order to optimize the space usage of such tracking, a simple utility bitmap package is introduced. Impact on TPCH (average over 50 runs of `tpchvec/perf`): ``` Q1: before: 3.21s after: 3.23s 0.79% Q2: before: 3.38s after: 3.16s -6.42% Q3: before: 2.56s after: 2.55s -0.11% Q4: before: 1.76s after: 1.61s -8.41% Q5: before: 2.55s after: 2.47s -3.37% Q6: before: 4.64s after: 4.65s 0.21% Q7: before: 5.89s after: 5.56s -5.57% Q8: before: 1.09s after: 1.07s -1.33% Q9: before: 5.61s after: 5.55s -1.05% Q10: before: 2.21s after: 2.09s -5.47% Q11: before: 0.97s after: 0.94s -2.64% Q12: before: 4.88s after: 4.43s -9.31% Q13: before: 1.15s after: 1.01s -11.92% Q14: before: 0.45s after: 0.45s 1.16% Q15: before: 2.53s after: 2.51s -0.58% Q16: before: 0.92s after: 0.90s -2.38% Q17: before: 0.24s after: 0.24s -0.58% Q18: before: 2.03s after: 2.02s -0.46% Q19: before: 0.48s after: 0.48s -0.99% Q20: before: 9.74s after: 9.71s -0.38% Q21: before: 5.05s after: 5.01s -0.81% Q22: before: 0.58s after: 0.57s -0.38% ``` Release note: None
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)
Backport 1/1 commits from #103602.
/cc @cockroachdb/release
This commit fixes the avg response estimator when it comes to handling
partial responses. "Partial responses" mean that a single ScanRequest
was evaluated across multiple BatchRequests because the response
happened to be large enough to exceed TargetBytes budget we used. For
example, if a ScanRequest needs to fetch 100KiB of data, but the
Streamer only gives 1KiB TargetBytes budget, then this ScanRequest will
be evaluated across 100 BatchRequests, and each BatchRequest would
contain a single "partial" response of 1KiB in size.
Previously, the avg response estimator would treat these partial
responses independently, so it would come up with the estimate of 1KiB
for that example. However, this is very suboptimal - the goal of the
estimator is to guess the footprint of the whole response to a
single-range request. As a result of the previous behavior, the streamer
could keep on paginating the ScanResponse in very small batches (with
the estimate never increasing) which would then lead to long latency for
query evaluation.
This commit fixes this problem by adjusting what the estimator includes
in the denominator for the average computation. In particular, this
commit makes it so that only "non-resume" responses are included into
that number. The idea is that when we receive the first paginated
response, we increase the counter of responses, but on all consequent
"resume" responses - we don't. This allows us to "update" the footprint
of the big ScanResponse that is being paginated across multiple
BatchRequests. In the example above, our estimate will be growing
exponentially, and as a result, instead of performing 100 BatchRequests,
we will now do only 7.
This requires tracking whether we have received a response to
a particular ScanRequest, and in order to optimize the space usage of
such tracking, a simple utility bitmap package is introduced.
Impact on TPCH (average over 50 runs of
tpchvec/perf
):Addresses: #82164.
Fixes: #103586.
Release note: None
Release justification: bug fix.