sql: use histograms for joins #41204
Comments
Note that we use histograms to estimate the size of join inputs (assuming the inputs are scans/selects), but we don't yet use histograms to estimate the output cardinality or output data distribution of joins.
We have marked this issue as stale because it has been inactive for |
@rytaft Can you post a draft PR of the work you've done on this so it's not lost? Thanks!
This commit uses histograms to estimate the output cardinality of joins, and enables joins to pass histograms up the query plan tree. It introduces two new operations on histograms: Intersect and InnerJoin, which are used to update the histograms for equivalent column groups. These in turn support the estimation of join statistics in the statisticsBuilder.

Fixes cockroachdb#41204

Release note (performance improvement): The optimizer now uses histograms to estimate the output cardinality of joins, which may lead to better statistics estimates and better query plans.
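To make the idea in the commit message concrete, here is a minimal Go sketch of how two histograms on an equality join column could be combined into an output row estimate. It is not the code from the PR and does not use CockroachDB's histogram types: the `Bucket` struct, its integer ranges, and the `EstimateJoinRows` function are all hypothetical names introduced for illustration. It assumes values are uniformly distributed within each bucket and applies the textbook per-range estimate `rowsL * rowsR / max(distinctL, distinctR)` to every overlapping range.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is a hypothetical, simplified histogram bucket covering the
// inclusive integer range [Lo, Hi]. It is NOT CockroachDB's histogram type.
type Bucket struct {
	Lo, Hi   int
	Rows     float64 // estimated rows whose value falls in [Lo, Hi]
	Distinct float64 // estimated distinct values in [Lo, Hi]
}

// fraction returns the share of the bucket's range covered by [lo, hi],
// assuming values are spread uniformly across the bucket.
func (b Bucket) fraction(lo, hi int) float64 {
	return float64(hi-lo+1) / float64(b.Hi-b.Lo+1)
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// EstimateJoinRows estimates the output cardinality of an inner equi-join
// from the histograms of the two join columns. Both slices must be sorted
// by range and contain non-overlapping buckets.
func EstimateJoinRows(left, right []Bucket) float64 {
	total := 0.0
	i, j := 0, 0
	for i < len(left) && j < len(right) {
		l, r := left[i], right[j]

		// Intersect the two bucket ranges.
		lo, hi := maxInt(l.Lo, r.Lo), minInt(l.Hi, r.Hi)
		if lo <= hi {
			fl, fr := l.fraction(lo, hi), r.fraction(lo, hi)
			rowsL, rowsR := l.Rows*fl, r.Rows*fr
			distinct := math.Max(l.Distinct*fl, r.Distinct*fr)
			if distinct > 0 {
				// Each distinct value on the larger side is assumed to
				// match rowsL/distinct * rowsR/distinct row pairs.
				total += rowsL * rowsR / distinct
			}
		}

		// Advance whichever bucket ends first (both if they end together).
		switch {
		case l.Hi < r.Hi:
			i++
		case r.Hi < l.Hi:
			j++
		default:
			i++
			j++
		}
	}
	return total
}

func main() {
	left := []Bucket{{1, 10, 100, 10}, {11, 20, 200, 10}}
	right := []Bucket{{1, 10, 50, 5}, {11, 30, 400, 20}}
	fmt.Printf("estimated join rows: %.0f\n", EstimateJoinRows(left, right))
}
```

Ranges that appear in only one histogram contribute nothing to the estimate, which is the effect an intersection of the two histograms would have before the per-range join estimate is applied. A real implementation would also need to produce an output histogram, not just a row count, so that the join can pass statistics further up the plan tree.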
I've (finally) posted #138094 with my changes, rebased on master. I don't remember if I benchmarked TPC-H when I did this, so I suppose the next step would be to do that benchmarking to see if it's actually an improvement. If so, maybe I (or someone) can try to get this merged behind a cluster setting.
Using histograms for joins should improve the ability of the cost-based optimizer (CBO) to pick the best plan, that is, the plan with the lowest latency and best performance for users.
We need to run some experiments to understand both the impact and the level of effort. This may also be needed for all operators.
Epic CRDB-16930
Jira issue: CRDB-5481
Jira issue: CRDB-13894