Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: use histograms for joins #41204

Open
awoods187 opened this issue Sep 30, 2019 · 4 comments · May be fixed by #138094
Open

sql: use histograms for joins #41204

awoods187 opened this issue Sep 30, 2019 · 4 comments · May be fixed by #138094
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@awoods187
Copy link
Contributor

awoods187 commented Sep 30, 2019

Using histograms for joins will improve the ability of the CBO to pick the correct plan and therefore pick the best query which has the lowest latency and best performance for users.

We need to perform some experiments to understand the impact as well as level of effort. This may also be needed for all operators.

Epic CRDB-16930

Jira issue: CRDB-5481

Jira issue: CRDB-13894

@awoods187 awoods187 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Sep 30, 2019
@rytaft
Copy link
Collaborator

rytaft commented Mar 18, 2021

Note that we use histograms to estimate the size of join inputs (assuming the inputs are scans/selects), but we don't yet use histograms to estimate the output cardinality or output data distribution of joins.

Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@mgartner mgartner moved this from Backlog (DO NOT ADD NEW ISSUES) to New Backlog in SQL Queries Feb 1, 2024
@mgartner
Copy link
Collaborator

mgartner commented Feb 1, 2024

@rytaft Can you post a draft PR of the work you've done on this so it's not lost? Thanks!

rytaft added a commit to rytaft/cockroach that referenced this issue Dec 30, 2024
This commit uses histograms to estimate the output cardinality
of joins, and enables joins to pass histograms up the query plan
tree. It introduces two new operations on histograms: Intersect
and InnerJoin, which are used to update the histograms for equivalent
column groups. These in turn support the estimation of join statistics
in the statisticsBuilder.

Fixes cockroachdb#41204

Release note (performance improvement): The optimizer now uses
histograms to estimate the output cardinality of joins, which
may lead to better statistics estimates and better query plans.
@rytaft rytaft linked a pull request Dec 30, 2024 that will close this issue
rytaft added a commit to rytaft/cockroach that referenced this issue Dec 30, 2024
This commit uses histograms to estimate the output cardinality
of joins, and enables joins to pass histograms up the query plan
tree. It introduces two new operations on histograms: Intersect
and InnerJoin, which are used to update the histograms for equivalent
column groups. These in turn support the estimation of join statistics
in the statisticsBuilder.

Fixes cockroachdb#41204

Release note (performance improvement): The optimizer now uses
histograms to estimate the output cardinality of joins, which
may lead to better statistics estimates and better query plans.
@rytaft
Copy link
Collaborator

rytaft commented Dec 30, 2024

I've (finally) posted #138094 with my changes, rebased on master. I don't remember if I benchmarked TPC-H when I did this, so I suppose the next step would be to do that benchmarking to see if it's actually an improvement. If so, maybe I (or someone) can try to get this merged behind a cluster setting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

Successfully merging a pull request may close this issue.

4 participants