Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP sql: use the streamer for scans #103659

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented May 19, 2023

To be filled.

Impact on single node TPCH (avg of 50 runs of single-tenant of multitenant/tpch) of last 3 commits (updated on 11/8):

Q1:	before: 5.20s	after: 4.03s	 -22.55%
Q2:	before: 3.42s	after: 3.18s	 -7.21%
Q3:	before: 3.71s	after: 3.21s	 -13.31%
Q4:	before: 1.35s	after: 1.31s	 -2.80%
Q5:	before: 2.33s	after: 2.25s	 -3.21%
Q6:	before: 6.52s	after: 6.36s	 -2.57%
Q7:	before: 6.39s	after: 6.27s	 -1.91%
Q8:	before: 0.94s	after: 0.93s	 -1.47%
Q9:	before: 7.82s	after: 8.04s	 2.75%
Q10:	before: 1.84s	after: 1.81s	 -1.61%
Q11:	before: 0.60s	after: 0.59s	 -0.27%
Q12:	before: 6.93s	after: 6.74s	 -2.80%
Q13:	before: 2.16s	after: 1.80s	 -16.47%
Q14:	before: 0.67s	after: 0.62s	 -7.10%
Q15:	before: 3.55s	after: 3.47s	 -2.27%
Q16:	before: 0.75s	after: 0.72s	 -3.85%
Q17:	before: 0.23s	after: 0.20s	 -10.53%
Q18:	before: 3.42s	after: 3.75s	 9.70%
Q19:	before: 0.44s	after: 0.37s	 -16.93%
Q20:	before: 14.92s	after: 14.23s	 -4.65%
Q21:	before: 7.17s	after: 6.95s	 -3.19%
Q22:	before: 0.56s	after: 0.53s	 -5.36%

Fixes: #82164.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

TODO: benchmark numbers are old.

This commit improves the behavior of the Streamer when a single-range
request's footprint happens barely exceed its reservation and the
additional budget allocation is denied. In particular, previously if
non-head-of-the-line request already received a response but it slightly
exceeded its reservation, we would drop that response and would reissue
it from scratch again. This commit, instead, allows for this request to
"go in debt" if the debt happens to be small enough (in absolute terms
or in relative terms in comparison to the response's footprint) (both of
these things are controlled by newly introduced cluster settings). Note
that this "going in debt" means that the Streamer might use more memory
than its workmem allows for, but it'll never put the SQL root monitor
above the limit, so this adjustment seems reasonable.

It's worth calling out the most likely case when this "going in debt"
case occurs. I believe this is the case when we made the reservation
large enough for TargetBytes limit but not for (at least not for the
whole) "responses overhead" (introduced in 40d28a5).
In some cases I observed manually we could have a scenario where we
received 3MiB worth of data in the response yet the "responses overhead"
of 150B couldn't be "consumed", so we dropped the whole thing.

`tpchvec/perf`:
```
Q1:	before: 3.17s	after: 3.23s	 1.75%
Q2:	before: 3.40s	after: 3.32s	 -2.41%
Q3:	before: 2.64s	after: 2.65s	 0.49%
Q4:	before: 1.69s	after: 1.72s	 1.95%
Q5:	before: 2.36s	after: 2.42s	 2.50%
Q6:	before: 4.64s	after: 4.78s	 2.97%
Q7:	before: 5.68s	after: 5.68s	 0.06%
Q8:	before: 1.08s	after: 1.14s	 5.36%
Q9:	before: 5.33s	after: 6.22s	 16.70%
Q10:	before: 2.06s	after: 2.18s	 5.50%
Q11:	before: 0.95s	after: 0.96s	 1.52%
Q12:	before: 4.68s	after: 4.63s	 -1.00%
Q13:	before: 1.05s	after: 1.13s	 7.26%
Q14:	before: 0.45s	after: 0.48s	 6.89%
Q15:	before: 2.55s	after: 2.59s	 1.60%
Q16:	before: 0.94s	after: 0.91s	 -3.25%
Q17:	before: 0.24s	after: 0.25s	 2.49%
Q18:	before: 2.06s	after: 2.10s	 2.06%
Q19:	before: 0.49s	after: 0.50s	 2.57%
Q20:	before: 10.05s	after: 10.18s	 1.26%
Q21:	before: 5.06s	after: 5.11s	 1.06%
Q22:	before: 0.58s	after: 0.58s	 0.48%
```

`multitenant/tpch` (single-tenant):
```
Q1:	before: 5.91s	after: 5.87s	 -0.62%
Q2:	before: 3.90s	after: 3.90s	 -0.12%
Q3:	before: 4.13s	after: 4.10s	 -0.89%
Q4:	before: 1.34s	after: 1.35s	 0.22%
Q5:	before: 2.61s	after: 2.62s	 0.32%
Q6:	before: 7.13s	after: 7.18s	 0.70%
Q7:	before: 6.69s	after: 6.83s	 2.08%
Q8:	before: 1.03s	after: 1.04s	 0.66%
Q9:	before: 8.47s	after: 8.51s	 0.48%
Q10:	before: 1.87s	after: 1.87s	 0.39%
Q11:	before: 0.99s	after: 0.99s	 0.00%
Q12:	before: 7.51s	after: 7.54s	 0.43%
Q13:	before: 2.21s	after: 2.25s	 1.73%
Q14:	before: 0.69s	after: 0.69s	 0.87%
Q15:	before: 3.79s	after: 3.80s	 0.29%
Q16:	before: 0.89s	after: 0.89s	 0.61%
Q17:	before: 0.26s	after: 0.26s	 0.54%
Q18:	before: 3.86s	after: 3.83s	 -0.81%
Q19:	before: 0.49s	after: 0.49s	 1.11%
Q20:	before: 16.55s	after: 16.63s	 0.47%
Q21:	before: 7.15s	after: 7.15s	 0.02%
Q22:	before: 0.65s	after: 0.64s	 -1.21%
```

`multitenant/tpch` (multi-tenant):
```
Q1:	before: 11.75s	after: 11.69s	 -0.52%
Q2:	before: 4.69s	after: 4.72s	 0.64%
Q3:	before: 8.25s	after: 8.77s	 6.30%
Q4:	before: 4.55s	after: 4.56s	 0.13%
Q5:	before: 10.40s	after: 10.39s	 -0.01%
Q6:	before: 33.70s	after: 33.74s	 0.11%
Q7:	before: 24.19s	after: 24.18s	 -0.02%
Q8:	before: 3.79s	after: 3.79s	 0.09%
Q9:	before: 28.29s	after: 28.29s	 0.01%
Q10:	before: 5.01s	after: 5.01s	 0.06%
Q11:	before: 2.42s	after: 2.41s	 -0.16%
Q12:	before: 34.93s	after: 34.96s	 0.07%
Q13:	before: 3.85s	after: 3.86s	 0.33%
Q14:	before: 2.18s	after: 2.24s	 2.91%
Q15:	before: 16.90s	after: 16.93s	 0.20%
Q16:	before: 1.58s	after: 1.60s	 1.15%
Q17:	before: 1.07s	after: 1.08s	 0.41%
Q18:	before: 16.40s	after: 16.40s	 0.02%
Q19:	before: 1.95s	after: 1.95s	 -0.06%
Q20:	before: 55.45s	after: 55.45s	 0.00%
Q21:	before: 24.77s	after: 24.76s	 -0.01%
Q22:	before: 1.25s	after: 1.27s	 1.02%
```

Release note: None
TODO: this might not longer be needed - fixed by 113809.

This commit fixes very suboptimal behavior that can occur in the
streamer in the InOrder mode when a single ScanRequest needs to fetch
a lot of data and it spans multiple ranges. In such a setup, we can
buffer some results in the results buffer that would consume most of the
budget, but we're not able to return those results to the client until
the first range's chunk is fully processed. However, since buffered
results would consume all of the memory, we could reduce ourselves to
processing the first range one key at a time (because we'd set
TargetBytes limit to 1 since the budget can already be in debt). This is
now fixed by trying to spill the results buffer before issuing
head-of-the-line request with the spilling priority such that the
results for consequent ranges could be spilled.

`tpchvec/perf`:
```
Q1:	before: 3.23s	after: 3.23s	 0.02%
Q2:	before: 3.32s	after: 3.47s	 4.68%
Q3:	before: 2.65s	after: 2.57s	 -3.19%
Q4:	before: 1.72s	after: 1.78s	 3.57%
Q5:	before: 2.42s	after: 2.53s	 4.53%
Q6:	before: 4.78s	after: 4.80s	 0.48%
Q7:	before: 5.68s	after: 5.96s	 5.00%
Q8:	before: 1.14s	after: 1.13s	 -0.33%
Q9:	before: 6.22s	after: 5.43s	 -12.65%
Q10:	before: 2.18s	after: 2.24s	 2.94%
Q11:	before: 0.96s	after: 0.98s	 1.66%
Q12:	before: 4.63s	after: 4.69s	 1.23%
Q13:	before: 1.13s	after: 1.17s	 3.23%
Q14:	before: 0.48s	after: 0.47s	 -1.21%
Q15:	before: 2.59s	after: 2.56s	 -0.93%
Q16:	before: 0.91s	after: 0.98s	 8.10%
Q17:	before: 0.25s	after: 0.25s	 0.73%
Q18:	before: 2.10s	after: 2.11s	 0.19%
Q19:	before: 0.50s	after: 0.50s	 -0.20%
Q20:	before: 10.18s	after: 9.61s	 -5.55%
Q21:	before: 5.11s	after: 5.22s	 2.08%
Q22:	before: 0.58s	after: 0.60s	 2.78%
```

`multitenant/tpch` (single-tenant):
```
Q1:	before: 5.87s	after: 5.77s	 -1.77%
Q2:	before: 3.90s	after: 3.81s	 -2.32%
Q3:	before: 4.10s	after: 4.04s	 -1.47%
Q4:	before: 1.35s	after: 1.31s	 -2.95%
Q5:	before: 2.62s	after: 2.53s	 -3.64%
Q6:	before: 7.18s	after: 6.94s	 -3.37%
Q7:	before: 6.83s	after: 6.63s	 -2.94%
Q8:	before: 1.04s	after: 1.00s	 -3.89%
Q9:	before: 8.51s	after: 8.21s	 -3.50%
Q10:	before: 1.87s	after: 1.81s	 -3.27%
Q11:	before: 0.99s	after: 0.96s	 -3.57%
Q12:	before: 7.54s	after: 7.29s	 -3.25%
Q13:	before: 2.25s	after: 2.15s	 -4.41%
Q14:	before: 0.69s	after: 0.68s	 -2.37%
Q15:	before: 3.80s	after: 3.68s	 -3.10%
Q16:	before: 0.89s	after: 0.87s	 -2.64%
Q17:	before: 0.26s	after: 0.25s	 -2.84%
Q18:	before: 3.83s	after: 3.83s	 -0.09%
Q19:	before: 0.49s	after: 0.48s	 -3.36%
Q20:	before: 16.63s	after: 16.15s	 -2.87%
Q21:	before: 7.15s	after: 6.97s	 -2.61%
Q22:	before: 0.64s	after: 0.64s	 -0.22%
```

`multitenant/tpch` (multi-tenant):
```
Q1:	before: 11.69s	after: 11.36s	 -2.80%
Q2:	before: 4.72s	after: 4.54s	 -3.85%
Q3:	before: 8.77s	after: 9.28s	 5.84%
Q4:	before: 4.56s	after: 4.55s	 -0.29%
Q5:	before: 10.39s	after: 10.39s	 -0.00%
Q6:	before: 33.74s	after: 33.69s	 -0.14%
Q7:	before: 24.18s	after: 23.85s	 -1.37%
Q8:	before: 3.79s	after: 3.79s	 -0.06%
Q9:	before: 28.29s	after: 28.28s	 -0.03%
Q10:	before: 5.01s	after: 5.00s	 -0.18%
Q11:	before: 2.41s	after: 2.42s	 0.41%
Q12:	before: 34.96s	after: 34.92s	 -0.10%
Q13:	before: 3.86s	after: 3.71s	 -3.99%
Q14:	before: 2.24s	after: 2.60s	 16.26%
Q15:	before: 16.93s	after: 16.96s	 0.16%
Q16:	before: 1.60s	after: 1.65s	 3.31%
Q17:	before: 1.08s	after: 1.11s	 2.88%
Q18:	before: 16.40s	after: 16.40s	 -0.01%
Q19:	before: 1.95s	after: 1.98s	 1.39%
Q20:	before: 55.45s	after: 55.45s	 0.01%
Q21:	before: 24.76s	after: 24.76s	 -0.00%
Q22:	before: 1.27s	after: 1.24s	 -1.82%
```

Release note: None
TODO: benchmark numbers are old.

To be filled.

`tpchvec/perf`:
```
Q1:	before: 3.23s	after: 3.24s	 0.46%
Q2:	before: 3.47s	after: 3.38s	 -2.72%
Q3:	before: 2.57s	after: 2.62s	 1.99%
Q4:	before: 1.78s	after: 1.73s	 -2.96%
Q5:	before: 2.53s	after: 2.56s	 1.03%
Q6:	before: 4.80s	after: 4.86s	 1.22%
Q7:	before: 5.96s	after: 5.96s	 -0.01%
Q8:	before: 1.13s	after: 1.14s	 0.41%
Q9:	before: 5.43s	after: 5.82s	 7.16%
Q10:	before: 2.24s	after: 2.20s	 -1.78%
Q11:	before: 0.98s	after: 0.99s	 0.67%
Q12:	before: 4.69s	after: 4.88s	 3.99%
Q13:	before: 1.17s	after: 1.11s	 -4.73%
Q14:	before: 0.47s	after: 0.48s	 0.93%
Q15:	before: 2.56s	after: 2.64s	 2.90%
Q16:	before: 0.98s	after: 0.94s	 -3.63%
Q17:	before: 0.25s	after: 0.24s	 -2.01%
Q18:	before: 2.11s	after: 2.03s	 -3.69%
Q19:	before: 0.50s	after: 0.48s	 -3.58%
Q20:	before: 9.61s	after: 10.35s	 7.61%
Q21:	before: 5.22s	after: 5.28s	 1.27%
Q22:	before: 0.60s	after: 0.59s	 -0.87%
```

`multitenant/tpch` (single-tenant):
```
Q1:	before: 5.77s	after: 4.85s	 -15.87%
Q2:	before: 3.81s	after: 3.59s	 -5.62%
Q3:	before: 4.04s	after: 3.59s	 -11.18%
Q4:	before: 1.31s	after: 1.35s	 3.15%
Q5:	before: 2.53s	after: 2.63s	 4.13%
Q6:	before: 6.94s	after: 7.06s	 1.74%
Q7:	before: 6.63s	after: 6.96s	 4.93%
Q8:	before: 1.00s	after: 1.02s	 2.53%
Q9:	before: 8.21s	after: 8.40s	 2.30%
Q10:	before: 1.81s	after: 1.85s	 1.86%
Q11:	before: 0.96s	after: 1.00s	 4.81%
Q12:	before: 7.29s	after: 7.51s	 2.97%
Q13:	before: 2.15s	after: 1.93s	 -10.21%
Q14:	before: 0.68s	after: 0.67s	 -0.68%
Q15:	before: 3.68s	after: 3.76s	 2.02%
Q16:	before: 0.87s	after: 0.87s	 0.39%
Q17:	before: 0.25s	after: 0.24s	 -6.78%
Q18:	before: 3.83s	after: 2.92s	 -23.72%
Q19:	before: 0.48s	after: 0.43s	 -10.44%
Q20:	before: 16.15s	after: 16.55s	 2.44%
Q21:	before: 6.97s	after: 7.25s	 4.01%
Q22:	before: 0.64s	after: 0.62s	 -2.42%
```

`multitenant/tpch` (multi-tenant):
```
Q1:	before: 11.36s	after: 10.20s	 -10.23%
Q2:	before: 4.54s	after: 5.23s	 15.27%
Q3:	before: 9.28s	after: 9.85s	 6.07%
Q4:	before: 4.55s	after: 4.55s	 0.11%
Q5:	before: 10.39s	after: 10.41s	 0.12%
Q6:	before: 33.69s	after: 33.76s	 0.18%
Q7:	before: 23.85s	after: 24.21s	 1.49%
Q8:	before: 3.79s	after: 3.81s	 0.58%
Q9:	before: 28.28s	after: 28.31s	 0.11%
Q10:	before: 5.00s	after: 5.01s	 0.12%
Q11:	before: 2.42s	after: 2.41s	 -0.57%
Q12:	before: 34.92s	after: 35.04s	 0.34%
Q13:	before: 3.71s	after: 3.16s	 -14.90%
Q14:	before: 2.60s	after: 3.24s	 24.41%
Q15:	before: 16.96s	after: 16.92s	 -0.23%
Q16:	before: 1.65s	after: 1.78s	 8.05%
Q17:	before: 1.11s	after: 1.12s	 0.65%
Q18:	before: 16.40s	after: 17.17s	 4.69%
Q19:	before: 1.98s	after: 1.70s	 -14.20%
Q20:	before: 55.45s	after: 55.46s	 0.01%
Q21:	before: 24.76s	after: 24.77s	 0.03%
Q22:	before: 1.24s	after: 1.29s	 3.67%
```

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kvstreamer: consider using the streamer for the TableReader
2 participants