planner should support skew distinct aggregate rewrite #36169

Closed
fixdb opened this issue Jul 13, 2022 · 1 comment · Fixed by #36181
Assignees
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@fixdb
Contributor

fixdb commented Jul 13, 2022

Enhancement

For the following kind of query:

select S_NATIONKEY as s, 
  count(S_SUPPKEY), 
  count(distinct S_NAME) 
from supplier
group by s;

If the group key is highly skewed and the distinct key has a large number of distinct values (i.e. high cardinality), query execution is slow.
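
To make the skew concrete, here is a minimal sketch. It assumes rows are distributed to parallel workers by a partitioning key, which this issue does not state about the planner. The data, the worker count, and the `max_load` helper are all hypothetical. It compares the busiest worker's load when partitioning by the group key alone against partitioning by the group key together with the distinct key:

```python
from collections import Counter

WORKERS = 4  # hypothetical degree of parallelism

# Hypothetical skewed data: 900 of 1000 rows share nation key 0,
# and every row carries a different name id (high cardinality).
rows = [(0 if i < 900 else i % 10 + 1, i) for i in range(1000)]


def max_load(partition_key):
    """Return the number of rows handled by the busiest worker."""
    load = Counter(partition_key(nation, name) % WORKERS for nation, name in rows)
    return max(load.values())


# Partition by the group key only: the hot key lands on a single worker.
by_group_key = max_load(lambda nation, name: nation)

# Partition by (group key, distinct key): the hot key is spread out.
by_group_and_distinct_key = max_load(lambda nation, name: nation * 31 + name)

print(by_group_key, by_group_and_distinct_key)
```

Under these assumptions, one worker receives at least the 900 rows of the hot key in the first case, while the load is close to even in the second.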

The planner should be able to rewrite the query above into the following equivalent query to avoid the skew:

select S_NATIONKEY as s, 
  sum(cnt_suppkey), 
  count(S_NAME) 
from (
    select S_NATIONKEY, S_NAME, count(S_SUPPKEY) as cnt_suppkey
    from supplier
    group by S_NATIONKEY, S_NAME
) as T
group by s;
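
The rewrite works because the inner query produces one row per (S_NATIONKEY, S_NAME) pair, so the outer count(S_NAME) counts each non-NULL name once and sum(cnt_suppkey) adds the partial counts back up. The following sketch checks that the two queries agree on a small table. It uses SQLite through Python's `sqlite3` module as a stand-in engine, which is an assumption made for illustration, and the sample rows are made up, including NULL values to exercise the edge cases:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table supplier (S_SUPPKEY integer, S_NAME text, S_NATIONKEY integer)"
)
# Made-up sample rows, including a NULL S_SUPPKEY and a NULL S_NAME.
conn.executemany(
    "insert into supplier values (?, ?, ?)",
    [(1, "a", 1), (2, "a", 1), (3, "b", 1), (None, "c", 1), (5, None, 1), (6, "a", 2)],
)

original = """
select S_NATIONKEY as s, count(S_SUPPKEY), count(distinct S_NAME)
from supplier
group by s
order by s
"""

rewritten = """
select S_NATIONKEY as s, sum(cnt_suppkey), count(S_NAME)
from (
    select S_NATIONKEY, S_NAME, count(S_SUPPKEY) as cnt_suppkey
    from supplier
    group by S_NATIONKEY, S_NAME
) as T
group by s
order by s
"""

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows == rewritten_rows)
```

The `order by s` clause is added only so the two result sets can be compared row by row. Result types of sum() can differ between engines, so a comparison on another database may need a cast.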
@fixdb fixdb added the type/enhancement The issue or PR belongs to an enhancement. label Jul 13, 2022
@fixdb
Contributor Author

fixdb commented Jul 13, 2022

/assign
