Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distsqlrun: bump index joiner batch size from 100 to 10k #38622

Merged
merged 1 commit into from
Jul 9, 2019
Merged

distsqlrun: bump index joiner batch size from 100 to 10k #38622

merged 1 commit into from
Jul 9, 2019

Conversation

asubiotto
Copy link
Contributor

To amortize the cost of looking up rows, we create a batch of 100 rows
to use in one lookup request. Since the relationship is 1:1 in the index
joiner (we're doing a lookup on the primary index), the result size is
the same size as the request batch.

This commit increases the size of this batch to 10k, increasing the
result size for each lookup to 10k. This results in some significant
performance gains: e.g. tpch query 6 drops to a fifth of its original
runtime on a scalefactor 10 dataset due to amortizing lookups. Note that
this comes with increased memory usage per request. However, the
tableReader limits results for its scans to 10k as well, and there is no
good reason to allow normal scans to use more memory than index joiner
lookups. In the absence of proper accounting for KV responses, the
strategy of allowing index lookups to use the same resources and have
the same limitations as normal scans makes sense.

Release note: None

To amortize the cost of looking up rows, we create a batch of 100 rows
to use in one lookup request. Since the relationship is 1:1 in the index
joiner (we're doing a lookup on the primary index), the result size is
the same size as the request batch.

This commit increases the size of this batch to 10k, increasing the
result size for each lookup to 10k. This results in some significant
performance gains: e.g. tpch query 6 drops to a fifth of its original
runtime on a scalefactor 10 dataset due to amortizing lookups. Note that
this comes with increased memory usage per request. However, the
tableReader limits results for its scans to 10k as well, and there is no
good reason to allow normal scans to use more memory than index joiner
lookups. In the absence of proper accounting for KV responses, the
strategy of allowing index lookups to use the same resources and have
the same limitations as normal scans makes sense.

Release note: None
@asubiotto asubiotto requested review from jordanlewis and a team July 2, 2019 20:11
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@asubiotto
Copy link
Contributor Author

friendly ping

@jordanlewis
Copy link
Member

LGTM. It makes me kinda nervous that we have no way to test this. Make sure to pay attention to the nightlies and so on once this goes in.

@asubiotto
Copy link
Contributor Author

bors r=jordanlewis

@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build failed (retrying...)

craig bot pushed a commit that referenced this pull request Jul 9, 2019
38622: distsqlrun: bump index joiner batch size from 100 to 10k r=jordanlewis a=asubiotto

To amortize the cost of looking up rows, we create a batch of 100 rows
to use in one lookup request. Since the relationship is 1:1 in the index
joiner (we're doing a lookup on the primary index), the result size is
the same size as the request batch.

This commit increases the size of this batch to 10k, increasing the
result size for each lookup to 10k. This results in some significant
performance gains: e.g. tpch query 6 drops to a fifth of its original
runtime on a scalefactor 10 dataset due to amortizing lookups. Note that
this comes with increased memory usage per request. However, the
tableReader limits results for its scans to 10k as well, and there is no
good reason to allow normal scans to use more memory than index joiner
lookups. In the absence of proper accounting for KV responses, the
strategy of allowing index lookups to use the same resources and have
the same limitations as normal scans makes sense.

Release note: None

Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Jul 9, 2019

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants