Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.2: kvstreamer: improve avg response estimation for partial responses #104137

Merged

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented May 31, 2023

Backport 1/1 commits from #103602.

/cc @cockroachdb/release


This commit fixes the avg response estimator when it comes to handling
partial responses. "Partial responses" mean that a single ScanRequest
was evaluated across multiple BatchRequests because the response
happened to be large enough to exceed TargetBytes budget we used. For
example, if a ScanRequest needs to fetch 100KiB of data, but the
Streamer only gives 1KiB TargetBytes budget, then this ScanRequest will
be evaluated across 100 BatchRequests, and each BatchRequest would
contain a single "partial" response of 1KiB in size.

Previously, the avg response estimator would treat these partial
responses independently, so it would come up with the estimate of 1KiB
for that example. However, this is very suboptimal - the goal of the
estimator is to guess the footprint of the whole response to a
single-range request. As a result of the previous behavior, the streamer
could keep on paginating the ScanResponse in very small batches (with
the estimate never increasing) which would then lead to long latency for
query evaluation.

This commit fixes this problem by adjusting what the estimator includes
in the denominator for the average computation. In particular, this
commit makes it so that only "non-resume" responses are included into
that number. The idea is that when we receive the first paginated
response, we increase the counter of responses, but on all consequent
"resume" responses - we don't. This allows us to "update" the footprint
of the big ScanResponse that is being paginated across multiple
BatchRequests. In the example above, our estimate will be growing
exponentially, and as a result, instead of performing 100 BatchRequests,
we will now do only 7.

This requires tracking whether we have received a response to
a particular ScanRequest, and in order to optimize the space usage of
such tracking, a simple utility bitmap package is introduced.

Impact on TPCH (average over 50 runs of tpchvec/perf):

Q1:	before: 3.21s	after: 3.23s	 0.79%
Q2:	before: 3.38s	after: 3.16s	 -6.42%
Q3:	before: 2.56s	after: 2.55s	 -0.11%
Q4:	before: 1.76s	after: 1.61s	 -8.41%
Q5:	before: 2.55s	after: 2.47s	 -3.37%
Q6:	before: 4.64s	after: 4.65s	 0.21%
Q7:	before: 5.89s	after: 5.56s	 -5.57%
Q8:	before: 1.09s	after: 1.07s	 -1.33%
Q9:	before: 5.61s	after: 5.55s	 -1.05%
Q10:	before: 2.21s	after: 2.09s	 -5.47%
Q11:	before: 0.97s	after: 0.94s	 -2.64%
Q12:	before: 4.88s	after: 4.43s	 -9.31%
Q13:	before: 1.15s	after: 1.01s	 -11.92%
Q14:	before: 0.45s	after: 0.45s	 1.16%
Q15:	before: 2.53s	after: 2.51s	 -0.58%
Q16:	before: 0.92s	after: 0.90s	 -2.38%
Q17:	before: 0.24s	after: 0.24s	 -0.58%
Q18:	before: 2.03s	after: 2.02s	 -0.46%
Q19:	before: 0.48s	after: 0.48s	 -0.99%
Q20:	before: 9.74s	after: 9.71s	 -0.38%
Q21:	before: 5.05s	after: 5.01s	 -0.81%
Q22:	before: 0.58s	after: 0.57s	 -0.38%

Addresses: #82164.
Fixes: #103586.

Release note: None

Release justification: bug fix.

This commit fixes the avg response estimator when it comes to handling
partial responses. "Partial responses" mean that a single ScanRequest
was evaluated across multiple BatchRequests because the response
happened to be large enough to exceed TargetBytes budget we used. For
example, if a ScanRequest needs to fetch 100KiB of data, but the
Streamer only gives 1KiB TargetBytes budget, then this ScanRequest will
be evaluated across 100 BatchRequests, and each BatchRequest would
contain a single "partial" response of 1KiB in size.

Previously, the avg response estimator would treat these partial
responses independently, so it would come up with the estimate of 1KiB
for that example. However, this is very suboptimal - the goal of the
estimator is to guess the footprint of the whole response to a
single-range request. As a result of the previous behavior, the streamer
could keep on paginating the ScanResponse in very small batches (with
the estimate never increasing) which would then lead to long latency for
query evaluation.

This commit fixes this problem by adjusting what the estimator includes
in the denominator for the average computation. In particular, this
commit makes it so that only "non-resume" responses are included into
that number. The idea is that when we receive the first paginated
response, we increase the counter of responses, but on all consequent
"resume" responses - we don't. This allows us to "update" the footprint
of the big ScanResponse that is being paginated across multiple
BatchRequests. In the example above, our estimate will be growing
exponentially, and as a result, instead of performing 100 BatchRequests,
we will now do only 7.

This requires tracking whether we have received a response to
a particular ScanRequest, and in order to optimize the space usage of
such tracking, a simple utility bitmap package is introduced.

Impact on TPCH (average over 50 runs of `tpchvec/perf`):
```
Q1:	before: 3.21s	after: 3.23s	 0.79%
Q2:	before: 3.38s	after: 3.16s	 -6.42%
Q3:	before: 2.56s	after: 2.55s	 -0.11%
Q4:	before: 1.76s	after: 1.61s	 -8.41%
Q5:	before: 2.55s	after: 2.47s	 -3.37%
Q6:	before: 4.64s	after: 4.65s	 0.21%
Q7:	before: 5.89s	after: 5.56s	 -5.57%
Q8:	before: 1.09s	after: 1.07s	 -1.33%
Q9:	before: 5.61s	after: 5.55s	 -1.05%
Q10:	before: 2.21s	after: 2.09s	 -5.47%
Q11:	before: 0.97s	after: 0.94s	 -2.64%
Q12:	before: 4.88s	after: 4.43s	 -9.31%
Q13:	before: 1.15s	after: 1.01s	 -11.92%
Q14:	before: 0.45s	after: 0.45s	 1.16%
Q15:	before: 2.53s	after: 2.51s	 -0.58%
Q16:	before: 0.92s	after: 0.90s	 -2.38%
Q17:	before: 0.24s	after: 0.24s	 -0.58%
Q18:	before: 2.03s	after: 2.02s	 -0.46%
Q19:	before: 0.48s	after: 0.48s	 -0.99%
Q20:	before: 9.74s	after: 9.71s	 -0.38%
Q21:	before: 5.05s	after: 5.01s	 -0.81%
Q22:	before: 0.58s	after: 0.57s	 -0.38%
```

Release note: None
@yuzefovich yuzefovich requested a review from a team as a code owner May 31, 2023 11:34
@blathers-crl
Copy link

blathers-crl bot commented May 31, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)

@yuzefovich yuzefovich merged commit 32b9b72 into cockroachdb:release-22.2 Jun 2, 2023
@yuzefovich yuzefovich deleted the backport22.2-103602 branch June 2, 2023 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants