Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Full support of set operations: add INTERSECT and EXCEPT set operators #2834

Closed
yangzhg opened this issue Feb 5, 2020 · 0 comments · Fixed by #2882
Closed

[Proposal] Full support of set operations: add INTERSECT and EXCEPT set operators #2834

yangzhg opened this issue Feb 5, 2020 · 0 comments · Fixed by #2882
Assignees
Labels
area/sql/compatibility Issues or PRs related to the SQL compatibililty proposal Categorizes an issue is a proposal

Comments

@yangzhg
Copy link
Member

yangzhg commented Feb 5, 2020

Description

Generaly set functionality includes the below. but now doris has just support UNION & UNION ALL.

UNION [DISTINCT | ALL]

INTERSECT [DISTINCT]

EXCEPT [DISTINCT]

EXCEPT in some systems is an alias for MINUS

INTERSECT operator is used to return the results of 2 or more SELECT statements. However, it only returns the rows selected by all queries or data sets. If a record exists in one query and not in the other, it will be omitted from the INTERSECT results.
image
The EXCEPT operator is used to return all rows in the first SELECT statement that are not returned by the second SELECT statement. Each SELECT statement will define a dataset. The EXCEPT operator will retrieve all records from the first dataset and then remove from the results all records from the second dataset.
image

Syntax

INTERSECT

SELECT expression1, expression2, ... expression_n
FROM tables
[WHERE conditions]
INTERSECT [DISTINCT]
SELECT expression1, expression2, ... expression_n
FROM tables
[WHERE conditions];

EXCEPT

SELECT expression1, expression2, ... expression_n
FROM tables
[WHERE conditions]
EXCEPT|MINUS [DISTINCT]
SELECT expression1, expression2, ... expression_n
FROM tables
[WHERE conditions];

Design

All data is needed to implement the intersect and except operations. An optimized method is to use hash shuffle, so that the data processed by each node is hashed, and then it can be directly merged.
For the intersect, first create a hash table of the left table data, then match the right table data by rows, and finally filter out the matching rows to be the output of the current partition.
For the except, you will also first create a hash table of the left table data, then match the right table data by rows, and finally filter out the unmatched rows as the output of the current partition.
EXCEPT ALL and INERSECT ALL are not supported in ms sqlServer or oracle, only in postgres, but the supported semantics are also ambiguous, so we do not support these two operations for the current version.

Sub Tasks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sql/compatibility Issues or PRs related to the SQL compatibililty proposal Categorizes an issue is a proposal
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants