-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various changes and fixes to UNNEST. #13892
Conversation
Native changes: 1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn". This enables pushing expressions into the datasource. This in turn allows us to do the next thing... 2) UnnestStorageAdapter: Logically apply query-level filters and virtual columns after the unnest operation. (Physically, filters are pulled up, when possible.) This is beneficial because it allows filters and virtual columns to reference the unnested column, and because it is consistent with how the join datasource works. 3) Various documentation updates, including declaring "unnest" as an experimental feature for now. SQL changes: 1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel is simplified: it only handles the UNNEST part of a correlated join. Constant UNNESTs are handled with regular inline rels. 2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from the left side up above the Correlate. New test testUnnestTwice verifies that this works even when two UNNESTs are stacked on the same table. 3) Include ProjectCorrelateTransposeRule from Calcite to encourage pushing mappings down below the left-hand side of the Correlate. 4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule to handle pulling Filters up above the Correlate. New tests testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify this behavior. 5) Require a context feature flag for SQL UNNEST, since it's undocumented. As part of this, also cleaned up how we handle feature flags in SQL. They're now hooked into EngineFeatures, which is useful because not all engines support all features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems nice to me, and fixes the problem I was having using unnest with array columns in #13803
* thereby reducing the logical plan to: | ||
* {@link DruidUnnestRule} takes care of the Uncollect(last 3 lines) to generate a {@link DruidUnnestRel} | ||
* thereby reducing the logical plan to: | ||
* <pre> | ||
* LogicalCorrelate | ||
* / \ | ||
* DruidRel DruidUnnestDataSourceRel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this javadoc needs updated i suppose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Updated it.
@@ -76,6 +76,8 @@ public class CastOperatorConversion implements SqlOperatorConversion | |||
builder.put(type, ExprType.LONG); | |||
} | |||
|
|||
builder.put(SqlTypeName.ARRAY, ExprType.ARRAY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't probably that cool because it loses the array element type. I think maybe #13890 has a better solution for this by switching cast to use Calcites.getColumnTypeForRelDataType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, yours is better. Merged it in. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a general Q. These 2 queries have the same plan. Both are unnest over a TableDataSource.
SELECT d3 FROM numFoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where dim2 IN ('a','ab')
SELECT d3 FROM (select * from numFoo where dim2 IN ('a','ab')), UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3)
But these 2 queries have different plans, the selector filter inside the table for the first query below plans as an unnest over a QueryDataSource while the second one is an Unnest over a TableDataSource
SELECT d3 FROM (select * from "numFoo" where dim2='a'), UNNEST(MV_TO_ARRAY(dim3)) as unnested(d3)
SELECT d3 FROM numFoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where dim2='a'
outputColumnName, | ||
allowSet | ||
); | ||
} | ||
return retVal; | ||
return PostJoinCursor.wrap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice, there's no need to pass in the allowFilter inside the data source now and the PostJoinCursor will apply the filters on unnested column which are the post join filters
I think the reason is that there is an additional level of LogicalProject between Correlate and LogicalFilter in the query where
which does not match the CorrelateFilterTranspose rule (L or R). We might need to address that but can be done with a later PR as well. While the 2nd query
For that extra level of LogicalProject in between #13799 had an additional rule https://github.com/apache/druid/pull/13799/files#diff-048f881d1285568ceba4eb4e798527937950063d472d4eb2623bb95cd8b86758R87. Since the Rels have changed that might not be directly applicable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM ! We can merge this and then I can fix up the boundary conditions
new Object[]{"10.1", ImmutableList.of("b", "c"), ImmutableList.of("10", "1"), "1", "bxx"}, | ||
new Object[]{"10.1", ImmutableList.of("b", "c"), ImmutableList.of("10", "1"), "1", "cxx"}, | ||
new Object[]{"2", ImmutableList.of("d"), ImmutableList.of("2"), "2", "dxx"}, | ||
new Object[]{"1", null, ImmutableList.of("1"), "1", "xx"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace with new Object[]{"1", useDefault ? null: ImmutableList.of(""), ImmutableList.of("1"), "1", "xx"}
Looks like you're right about the extra Project being the reason. It prevents the filter from being pulled out since it's in the way. One approach we could take here is make a rule that recognizes the situation, and pulls the Project and Filter both out. Thanks for taking a look. I pushed up fixes to the merge conflicts and the one test you pointed out. I also simplified DruidUnnestRel— in the latest patch it only contains a single RexNode. Finally I also expanded the filter-rewriting logic in UnnestStorageAdapter so it rewrites certain filters to use the input column of the unnest rather than the output column. It's limited to string types and non-expression filters, but it's s start. This piece could be improved in the future. |
* input table, this rel resolves the unnest part and delegates the rel to be consumed by other | ||
* rule ({@link org.apache.druid.sql.calcite.rule.DruidCorrelateUnnestRule} | ||
*/ | ||
public class DruidUnnestRel extends DruidRel<DruidUnnestRel> |
Check failure
Code scanning / CodeQL
No clone method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't realize RelNode implements Cloneable. This hasn't been an issue for any of the other DruidRels; none of them implement clone
. Calcite's builtin rels don't either, except for MutableRel, which isn't involved here. I think it's fine to not implement this. I'll add an impl to DruidRel
that throws an exception and write a comment remarking that it isn't needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, 👍 after CI
// Try to move filter pre-correlate if possible. | ||
final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites); | ||
if (newFilter != null) { | ||
preFilters.add(newFilter); | ||
} else { | ||
postFilters.add(filter); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice 👍
Thanks, there is also an unused import in |
// Try to move filter pre-correlate if possible. | ||
final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites); | ||
if (newFilter != null) { | ||
preFilters.add(newFilter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I think we want to always add the filter as a post filter even when we push down since the pushed down filter returns the whole row if any values match, so we want the value matcher to clean it up to exact matches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this once CI passes, I'll fix them up. I understand the case where it returns the full row even for a match in MVDs. I'll also need to remove the allowSet and minor other changes. Will followup as soon as this gets merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks for pointing this out. I've written a patch to fix this. In the interests of having this PR pass CI so @somu-imply can keep working on top of it, I'm ok to commit this PR first, and then do another PR fixing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the patch https://gist.github.com/gianm/86d55fb6e5ee58c935152761a9df4a4d. Feel free to roll this into what you're doing @somu-imply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR with the fix is at: #13919. If you want to incorporate this into your own work then feel free and I will close my PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll add this up
protected Object clone() throws CloneNotSupportedException | ||
{ | ||
// RelNode implements Cloneable, but our class of rels is not cloned, so does not need to implement clone(). | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntelliJ inspections is flagging it as it does not throw a CloneNotSupportedException
Additionally there is a CI failure for this test The problem seems to trigger from running with
with
Although I am unsure what caused this to change |
Ah. That's because ExpressionVirtualColumn now becomes a pass-through in cases where the expression doesn't do anything. So the behavior of |
Native changes:
UnnestDataSource: Replace "column" and "outputName" with "virtualColumn".
This enables pushing expressions into the datasource. This in turn
allows us to do the next thing...
UnnestStorageAdapter: Logically apply query-level filters and virtual
columns after the unnest operation. (Physically, filters are pushed down,
when possible.) This is beneficial because it allows filters and
virtual columns to reference the unnested column, and because it is
consistent with how the join datasource works.
Various documentation updates, including declaring "unnest" as an
experimental feature for now.
SQL changes:
Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel
is simplified: it only handles the UNNEST part of a correlated join.
Constant UNNESTs are handled with regular inline rels.
Rework DruidCorrelateUnnestRule to focus on pulling Projects from
the left side up above the Correlate. Additionally, fix issues related to
tracking correlation variables.New test testUnnestTwice verifies
that this works even when two UNNESTs are stacked on the same table.
Include ProjectCorrelateTransposeRule from Calcite to encourage
pushing mappings down below the left-hand side of the Correlate.
Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule
to handle pulling Filters up above the Correlate. New tests
testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify
this behavior.
Require a context feature flag for SQL UNNEST, since it's undocumented.
As part of this, also cleaned up how we handle feature flags in SQL.
They're now hooked into EngineFeatures, which is useful because not
all engines support all features.