[Proposal] Full support of set operations: add INTERSECT and EXCEPT set operators #2834
Labels
area/sql/compatibility
Issues or PRs related to SQL compatibility
proposal
Categorizes an issue as a proposal
Description
Generally, set functionality includes the operators below, but Doris currently supports only UNION and UNION ALL.
EXCEPT is called MINUS in some systems.
The INTERSECT operator combines the results of two or more SELECT statements, but returns only the rows selected by all of the queries or data sets. If a record exists in one query and not in the other, it is omitted from the INTERSECT results.
The EXCEPT operator returns all rows of the first SELECT statement that are not returned by the second SELECT statement. Each SELECT statement defines a dataset. The EXCEPT operator retrieves all records from the first dataset and then removes from the results all records that appear in the second dataset.
Syntax
INTERSECT
EXCEPT
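As a rough illustration of how the two operators are written and what they return, the sketch below runs them through Python's built-in sqlite3 module. SQLite is only a stand-in here, not Doris, and the tables t1 and t2 are hypothetical; the exact syntax Doris accepts may differ.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in; t1 and t2 are hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (k INTEGER);
    CREATE TABLE t2 (k INTEGER);
    INSERT INTO t1 VALUES (1), (2), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")

# Rows returned by both SELECT statements, with duplicates removed.
intersect_rows = conn.execute(
    "SELECT k FROM t1 INTERSECT SELECT k FROM t2 ORDER BY k"
).fetchall()

# Rows returned by the first SELECT but not by the second, with duplicates removed.
except_rows = conn.execute(
    "SELECT k FROM t1 EXCEPT SELECT k FROM t2 ORDER BY k"
).fetchall()

print(intersect_rows)  # [(2,), (3,)]
print(except_rows)     # [(1,)]
conn.close()
```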
Design
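The intersect and except operations need to see all of the input data. An optimized method is to use a hash shuffle: rows are distributed to nodes by hash, so identical rows from either side land on the same node, each node processes its own partition, and the per-node results can be merged directly.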
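A minimal sketch of the hash shuffle idea, assuming rows are plain tuples and that the whole row is the hash key. The names partition_of and hash_shuffle are hypothetical, and crc32 merely stands in for whatever hash function the engine would use.

```python
import zlib
from collections import defaultdict

def partition_of(row, num_partitions):
    """Pick a partition from a deterministic hash of the whole row."""
    # crc32 over repr(row) is a stand-in for the engine's real hash function.
    return zlib.crc32(repr(row).encode("utf-8")) % num_partitions

def hash_shuffle(rows, num_partitions):
    """Group rows by partition; identical rows always share a partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_of(row, num_partitions)].append(row)
    return partitions

left = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
right = [(2, "b"), (3, "c"), (4, "d")]

left_parts = hash_shuffle(left, 3)
right_parts = hash_shuffle(right, 3)

# Each partition can now be processed on its own without looking at the others.
for p in range(3):
    print(p, left_parts.get(p, []), right_parts.get(p, []))
```

Because a row's partition depends only on its content, a left row and an equal right row can never end up in different partitions, which is what makes the per-partition results safe to concatenate.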
For intersect, first build a hash table from the left table data, then probe it row by row with the right table data, and finally keep only the matched rows as the output of the current partition.
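The intersect steps for a single partition can be sketched as follows. This is an illustration under the assumption that rows are hashable tuples; hash_intersect is a hypothetical name, not a Doris function.

```python
def hash_intersect(left_rows, right_rows):
    """Distinct rows of left_rows that also appear in right_rows."""
    # Build: one hash-table entry per distinct left row, initially unmatched.
    matched = {row: False for row in left_rows}
    # Probe: flag every left row that some right row equals.
    for row in right_rows:
        if row in matched:
            matched[row] = True
    # Output: keep only the flagged rows.
    return [row for row, hit in matched.items() if hit]

left = [(1,), (2,), (2,), (3,)]
right = [(2,), (3,), (4,)]
print(hash_intersect(left, right))  # [(2,), (3,)]
```

Keying the table by the whole row also removes duplicates from the left side, which matches the distinct semantics of plain INTERSECT.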
For except, likewise first build a hash table from the left table data, then probe it row by row with the right table data, and finally keep only the unmatched rows as the output of the current partition.
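The except case differs only in which rows survive the final filter. A sketch for a single partition, again assuming hashable tuple rows and using the hypothetical name hash_except:

```python
def hash_except(left_rows, right_rows):
    """Distinct rows of left_rows that do not appear in right_rows."""
    # Build: one hash-table entry per distinct left row, initially unmatched.
    matched = {row: False for row in left_rows}
    # Probe: flag every left row that some right row equals.
    for row in right_rows:
        if row in matched:
            matched[row] = True
    # Output: keep only the rows that were never flagged.
    return [row for row, hit in matched.items() if not hit]

left = [(1,), (2,), (2,), (3,)]
right = [(2,), (3,), (4,)]
print(hash_except(left, right))  # [(1,)]
```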
EXCEPT ALL and INTERSECT ALL are not supported in Microsoft SQL Server or Oracle, only in PostgreSQL, and the semantics there are also ambiguous, so we do not support these two operations in the current version.
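For reference only, one reading of the ALL variants is the multiset rule documented by PostgreSQL: a row with m copies on the left and n copies on the right appears min(m, n) times under INTERSECT ALL and max(m - n, 0) times under EXCEPT ALL. The sketch below illustrates that rule with Python's Counter; intersect_all and except_all are hypothetical names, and none of this is part of the proposal.

```python
from collections import Counter

def intersect_all(left_rows, right_rows):
    # Counter & Counter keeps min(m, n) copies of each row.
    return sorted((Counter(left_rows) & Counter(right_rows)).elements())

def except_all(left_rows, right_rows):
    # Counter - Counter keeps max(m - n, 0) copies of each row.
    return sorted((Counter(left_rows) - Counter(right_rows)).elements())

left = [(1,), (2,), (2,), (2,), (3,)]
right = [(2,), (2,), (4,)]
print(intersect_all(left, right))  # [(2,), (2,)]
print(except_all(left, right))     # [(1,), (2,), (3,)]
```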
Sub Tasks