kvstreamer: consider setting MaxSpanRequestKeys on parallel batches issued by the Streamer #67885

yuzefovich · 2021-07-21T23:05:01Z

Once #67040 is implemented, we will have a library that performs parallel scans while adhering to memory limits. To simplify the discussion on the RFC, we have consciously set aside queries with LIMIT. As a follow-up to improving the implementation and usage of the Streamer library, we should revisit the cases where we have a hard or a soft limit and set MaxSpanRequestKeys on the batches whenever appropriate.

Quoting Nathan from the RFC review:

At a minimum, the Streamer can use MaxSpanRequestKeys to place upper bounds on each
individual batch of ScanRequests. Even if we assume that all other concurrent batches will
return 0 rows, this can still be useful to place an upper bound on how far we can overshoot
the limit. Without the use of MaxSpanRequestKeys, there is no limit to how far we can
overshoot. With a large enough TargetBytes and with small keys, we can pull back
thousands of unnecessary keys and scan hundreds of unnecessary ranges (especially with
many MVCC tombstones in the way). With the most conservative use of
MaxSpanRequestKeys, we can bound the amount we can overshoot to (P - 1) * limit keys.
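The bound in the quote can be illustrated with a small Go sketch. It assumes that P is the number of batches the Streamer has in flight at once and that each batch is capped at the number of keys still needed to satisfy the limit. The function names are hypothetical and are not part of the kvstreamer or kvpb APIs.

```go
package main

import "fmt"

// nextBatchMaxKeys (hypothetical) returns the cap for the next batch: the
// number of keys still needed to satisfy the limit. It reports false when the
// limit is already satisfied and no further batch should be issued.
func nextBatchMaxKeys(limit, received int64) (int64, bool) {
	remaining := limit - received
	if remaining <= 0 {
		return 0, false
	}
	return remaining, true
}

// worstCaseOvershoot (hypothetical) returns the maximum number of unnecessary
// keys fetched when p batches run concurrently, each capped at perBatch keys,
// while only `needed` keys are required. With perBatch == needed == limit this
// is the (P - 1) * limit bound from the quote.
func worstCaseOvershoot(p, perBatch, needed int64) int64 {
	total := p * perBatch
	if total <= needed {
		return 0
	}
	return total - needed
}

func main() {
	fmt.Println(nextBatchMaxKeys(100, 30))        // prints "70 true"
	fmt.Println(worstCaseOvershoot(8, 100, 100)) // prints "700", i.e. (8 - 1) * 100
}
```

Without any per-batch cap, `perBatch` is effectively unbounded (only TargetBytes applies), which is why the overshoot has no key-count bound in that case.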

This discussion is applicable to lookup (not index) joins, regardless of whether the lookup columns form a key and whether there is an ON expression. Quoting Becca:

When lookup columns form a key, we don't know that all input rows will have matches. We
may select a larger number of input rows, but only want the top k that have matches.

When lookup columns don't form a key:
- empty ON expression - we could set a hard limit on each lookup.
- non-empty ON expression - the optimizer can estimate the selectivity of the ON
  expression and determine a soft limit (or "limit hint") based on that.
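The cases above could map to a per-lookup limit roughly as follows. This is a hypothetical Go sketch: `lookupLimit` and `chooseLookupLimit` are illustrative names, not existing CockroachDB code, and deriving the soft limit as k divided by the estimated ON-expression selectivity is an assumption about how such a hint might be computed.

```go
package main

import (
	"fmt"
	"math"
)

// lookupLimit (hypothetical) describes the limit a lookup join could request
// for each lookup issued through the Streamer.
type lookupLimit struct {
	keys int64 // 0 means no per-lookup limit
	hard bool  // true: hard limit; false: soft limit ("limit hint")
}

// chooseLookupLimit (hypothetical) picks a per-lookup limit for a query that
// needs k output rows. selectivity is an optimizer estimate in (0, 1] of the
// fraction of looked-up rows that pass the ON expression.
func chooseLookupLimit(k int64, colsFormKey, hasOnExpr bool, selectivity float64) lookupLimit {
	switch {
	case colsFormKey:
		// Each lookup returns at most one row, and we don't know which input
		// rows have matches, so there is no useful per-lookup cap.
		return lookupLimit{}
	case !hasOnExpr:
		// Every looked-up row is an output row, so no lookup needs more than k.
		return lookupLimit{keys: k, hard: true}
	default:
		if selectivity <= 0 || selectivity > 1 {
			return lookupLimit{} // no usable estimate
		}
		// Assumption: expect to read about k/selectivity rows to get k matches.
		return lookupLimit{keys: int64(math.Ceil(float64(k) / selectivity))}
	}
}

func main() {
	fmt.Println(chooseLookupLimit(10, false, true, 0.25)) // prints "{40 false}"
}
```

Keeping the soft limit distinct from the hard one matters because a limit hint may be exceeded when the selectivity estimate is off, whereas a hard limit can be passed directly as MaxSpanRequestKeys.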

Jira issue: CRDB-8760


github-actions · 2024-01-15T11:05:21Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

yuzefovich added the C-cleanup label (Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.) Jul 21, 2021

yuzefovich self-assigned this Jul 21, 2021

blathers-crl bot added the T-sql-queries (SQL Queries Team) label Jul 21, 2021

yuzefovich mentioned this issue May 31, 2022: kvcoord: permit parallelization of scans with limits #54680 (Open, 15 tasks)

yuzefovich removed their assignment Jul 7, 2022

yuzefovich changed the title from "sql: consider setting MaxSpanRequestKeys on parallel batches issued by the Streamer" to "kvstreamer: consider setting MaxSpanRequestKeys on parallel batches issued by the Streamer" Jul 21, 2022

yuzefovich mentioned this issue Apr 4, 2023: sql: parallelize multi-region scans in more cases #100496 (Closed)

yuzefovich mentioned this issue Jun 10, 2023: kvstreamer: consider using the streamer for the TableReader #82164 (Open, 4 tasks)

github-actions bot added the no-issue-activity label Jan 15, 2024

yuzefovich removed the no-issue-activity label Jan 15, 2024

michae2 mentioned this issue Aug 15, 2024: sql: generic query plans do not push limit into lookup join #128704 (Open)
