Various changes and fixes to UNNEST. #13892

gianm · 2023-03-07T12:23:22Z

Native changes:

UnnestDataSource: Replace "column" and "outputName" with "virtualColumn".
This enables pushing expressions into the datasource. This in turn
allows us to do the next thing...
UnnestStorageAdapter: Logically apply query-level filters and virtual
columns after the unnest operation. (Physically, filters are pushed down,
when possible.) This is beneficial because it allows filters and
virtual columns to reference the unnested column, and because it is
consistent with how the join datasource works.
Various documentation updates, including declaring "unnest" as an
experimental feature for now.

SQL changes:

Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel
is simplified: it only handles the UNNEST part of a correlated join.
Constant UNNESTs are handled with regular inline rels.
Rework DruidCorrelateUnnestRule to focus on pulling Projects from
the left side up above the Correlate. Additionally, fix issues related to
tracking correlation variables.New test testUnnestTwice verifies
that this works even when two UNNESTs are stacked on the same table.
Include ProjectCorrelateTransposeRule from Calcite to encourage
pushing mappings down below the left-hand side of the Correlate.
Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule
to handle pulling Filters up above the Correlate. New tests
testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify
this behavior.
Require a context feature flag for SQL UNNEST, since it's undocumented.
As part of this, also cleaned up how we handle feature flags in SQL.
They're now hooked into EngineFeatures, which is useful because not
all engines support all features.

Native changes: 1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn". This enables pushing expressions into the datasource. This in turn allows us to do the next thing... 2) UnnestStorageAdapter: Logically apply query-level filters and virtual columns after the unnest operation. (Physically, filters are pulled up, when possible.) This is beneficial because it allows filters and virtual columns to reference the unnested column, and because it is consistent with how the join datasource works. 3) Various documentation updates, including declaring "unnest" as an experimental feature for now. SQL changes: 1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel is simplified: it only handles the UNNEST part of a correlated join. Constant UNNESTs are handled with regular inline rels. 2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from the left side up above the Correlate. New test testUnnestTwice verifies that this works even when two UNNESTs are stacked on the same table. 3) Include ProjectCorrelateTransposeRule from Calcite to encourage pushing mappings down below the left-hand side of the Correlate. 4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule to handle pulling Filters up above the Correlate. New tests testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify this behavior. 5) Require a context feature flag for SQL UNNEST, since it's undocumented. As part of this, also cleaned up how we handle feature flags in SQL. They're now hooked into EngineFeatures, which is useful because not all engines support all features.

clintropolis

this seems nice to me, and fixes the problem I was having using unnest with array columns in #13803

clintropolis · 2023-03-07T13:04:10Z

sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidCorrelateUnnestRule.java

- *  thereby reducing the logical plan to:
+ * {@link DruidUnnestRule} takes care of the Uncollect(last 3 lines) to generate a {@link DruidUnnestRel}
+ * thereby reducing the logical plan to:
+ * <pre>
 *        LogicalCorrelate
 *           /       \
 *      DruidRel    DruidUnnestDataSourceRel


nit: this javadoc needs updated i suppose

Yep. Updated it.

clintropolis · 2023-03-07T13:17:59Z

sql/src/main/java/org/apache/druid/sql/calcite/expression/builtin/CastOperatorConversion.java

@@ -76,6 +76,8 @@ public class CastOperatorConversion implements SqlOperatorConversion
      builder.put(type, ExprType.LONG);
    }

+    builder.put(SqlTypeName.ARRAY, ExprType.ARRAY);


this isn't probably that cool because it loses the array element type. I think maybe #13890 has a better solution for this by switching cast to use Calcites.getColumnTypeForRelDataType

Yeah, yours is better. Merged it in. Thanks.

somu-imply

I have a general Q. These 2 queries have the same plan. Both are unnest over a TableDataSource.

SELECT d3 FROM numFoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where dim2 IN ('a','ab')
SELECT d3 FROM (select * from numFoo where dim2 IN ('a','ab')), UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3)

But these 2 queries have different plans, the selector filter inside the table for the first query below plans as an unnest over a QueryDataSource while the second one is an Unnest over a TableDataSource

SELECT d3 FROM (select * from "numFoo" where dim2='a'), UNNEST(MV_TO_ARRAY(dim3)) as unnested(d3) 
SELECT d3 FROM numFoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where dim2='a'

somu-imply · 2023-03-07T20:39:38Z

processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java

                outputColumnName,
                allowSet
            );
          }
-          return retVal;
+          return PostJoinCursor.wrap(


This is nice, there's no need to pass in the allowFilter inside the data source now and the PostJoinCursor will apply the filters on unnested column which are the post join filters

somu-imply · 2023-03-07T23:04:45Z

I think the reason is that there is an additional level of LogicalProject between Correlate and LogicalFilter in the query where SELECT d3 FROM (select * from druid.numfoo where dim2='a'), UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) it plans to

136:LogicalProject(d3=[$17])
  134:LogicalCorrelate(subset=[rel#135:Subset#7.NONE.[]], correlation=[$cor0], joinType=[inner], requiredColumns=[{3}])
    125:LogicalProject(subset=[rel#126:Subset#2.NONE.[]], __time=[$0], dim1=[$1], dim2=[CAST('a':VARCHAR):VARCHAR], dim3=[$3], dim4=[$4], dim5=[$5], dim6=[$6], d1=[$7], d2=[$8], f1=[$9], f2=[$10], l1=[$11], l2=[$12], cnt=[$13], m1=[$14], m2=[$15], unique_dim1=[$16])
      123:LogicalFilter(subset=[rel#124:Subset#1.NONE.[]], condition=[=($2, 'a')])
        9:LogicalTableScan(subset=[rel#122:Subset#0.NONE.[]], table=[[druid, numfoo]])
    130:Uncollect(subset=[rel#131:Subset#5.NONE.[]])
      128:LogicalProject(subset=[rel#129:Subset#4.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
        12:LogicalValues(subset=[rel#127:Subset#3.NONE.[0]], tuples=[[{ 0 }]])

which does not match the CorrelateFilterTranspose rule (L or R). We might need to address that but can be done with a later PR as well. While the 2nd query SELECT d3 FROM numFoo, UNNEST(MV_TO_ARRAY(dim3)) as unnested (d3) where dim2='a' plans to

121:LogicalProject(d3=[$17])
  119:LogicalCorrelate(subset=[rel#120:Subset#6.NONE.[]], correlation=[$cor0], joinType=[inner], requiredColumns=[{3}])
    110:LogicalFilter(subset=[rel#111:Subset#1.NONE.[]], condition=[=($2, 'a')])
      8:LogicalTableScan(subset=[rel#109:Subset#0.NONE.[]], table=[[druid, numfoo]])
    115:Uncollect(subset=[rel#116:Subset#4.NONE.[]])
      113:LogicalProject(subset=[rel#114:Subset#3.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
        9:LogicalValues(subset=[rel#112:Subset#2.NONE.[0]], tuples=[[{ 0 }]])

For that extra level of LogicalProject in between #13799 had an additional rule https://github.com/apache/druid/pull/13799/files#diff-048f881d1285568ceba4eb4e798527937950063d472d4eb2623bb95cd8b86758R87. Since the Rels have changed that might not be directly applicable

somu-imply

Overall LGTM ! We can merge this and then I can fix up the boundary conditions

somu-imply · 2023-03-08T20:20:15Z

sql/src/test/java/org/apache/druid/sql/calcite/CalciteArraysQueryTest.java

+            new Object[]{"10.1", ImmutableList.of("b", "c"), ImmutableList.of("10", "1"), "1", "bxx"},
+            new Object[]{"10.1", ImmutableList.of("b", "c"), ImmutableList.of("10", "1"), "1", "cxx"},
+            new Object[]{"2", ImmutableList.of("d"), ImmutableList.of("2"), "2", "dxx"},
+            new Object[]{"1", null, ImmutableList.of("1"), "1", "xx"}


Replace with new Object[]{"1", useDefault ? null: ImmutableList.of(""), ImmutableList.of("1"), "1", "xx"}

gianm · 2023-03-09T11:13:11Z

But these 2 queries have different plans, the selector filter inside the table for the first query below plans as an unnest over a QueryDataSource while the second one is an Unnest over a TableDataSource

Looks like you're right about the extra Project being the reason. It prevents the filter from being pulled out since it's in the way. One approach we could take here is make a rule that recognizes the situation, and pulls the Project and Filter both out.

Thanks for taking a look. I pushed up fixes to the merge conflicts and the one test you pointed out. I also simplified DruidUnnestRel— in the latest patch it only contains a single RexNode. Finally I also expanded the filter-rewriting logic in UnnestStorageAdapter so it rewrites certain filters to use the input column of the unnest rather than the output column. It's limited to string types and non-expression filters, but it's s start. This piece could be improved in the future.

gianm · 2023-03-09T17:39:54Z

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidUnnestRel.java

+ * input table, this rel resolves the unnest part and delegates the rel to be consumed by other
+ * rule ({@link org.apache.druid.sql.calcite.rule.DruidCorrelateUnnestRule}
+ */
+public class DruidUnnestRel extends DruidRel<DruidUnnestRel>


Oh, I didn't realize RelNode implements Cloneable. This hasn't been an issue for any of the other DruidRels; none of them implement clone. Calcite's builtin rels don't either, except for MutableRel, which isn't involved here. I think it's fine to not implement this. I'll add an impl to DruidRel that throws an exception and write a comment remarking that it isn't needed.

clintropolis

lgtm, 👍 after CI

clintropolis · 2023-03-09T12:21:50Z

processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java

+          // Try to move filter pre-correlate if possible.
+          final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
+          if (newFilter != null) {
+            preFilters.add(newFilter);
+          } else {
+            postFilters.add(filter);
+          }


somu-imply · 2023-03-09T19:19:53Z

Thanks, there is also an unused import in DruidQueryRel which would also come in the checkstyle fix.

clintropolis · 2023-03-09T20:30:37Z

processing/src/main/java/org/apache/druid/segment/UnnestStorageAdapter.java

+          // Try to move filter pre-correlate if possible.
+          final Filter newFilter = rewriteFilterOnUnnestColumnIfPossible(filter, inputColumn, inputColumnCapabilites);
+          if (newFilter != null) {
+            preFilters.add(newFilter);


actually I think we want to always add the filter as a post filter even when we push down since the pushed down filter returns the whole row if any values match, so we want the value matcher to clean it up to exact matches

Let's merge this once CI passes, I'll fix them up. I understand the case where it returns the full row even for a match in MVDs. I'll also need to remove the allowSet and minor other changes. Will followup as soon as this gets merged

Oops, thanks for pointing this out. I've written a patch to fix this. In the interests of having this PR pass CI so @somu-imply can keep working on top of it, I'm ok to commit this PR first, and then do another PR fixing it.

Here's the patch https://gist.github.com/gianm/86d55fb6e5ee58c935152761a9df4a4d. Feel free to roll this into what you're doing @somu-imply.

PR with the fix is at: #13919. If you want to incorporate this into your own work then feel free and I will close my PR.

Thanks, I'll add this up

somu-imply · 2023-03-09T23:36:03Z

sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidRel.java

+  protected Object clone() throws CloneNotSupportedException
+  {
+    // RelNode implements Cloneable, but our class of rels is not cloned, so does not need to implement clone().
+    throw new UnsupportedOperationException();


IntelliJ inspections is flagging it as it does not throw a CloneNotSupportedException

somu-imply · 2023-03-09T23:39:04Z

Additionally there is a CI failure for this test testGroupByFloatMinExpressionVsVirtualColumnWithExplicitStringVirtualColumnTypedInput.

The problem seems to trigger from running with -Ddruid.generic.useDefaultValueForNull=false. The fix for this is to change the lines

"minVc",
NullHandling.replaceWithDefault() ? Float.POSITIVE_INFINITY : null

with

"minVc",
Float.POSITIVE_INFINITY

Although I am unsure what caused this to change

gianm · 2023-03-10T01:57:25Z

Although I am unsure what caused this to change

Ah. That's because ExpressionVirtualColumn now becomes a pass-through in cases where the expression doesn't do anything. So the behavior of min and minVc are now the same. I updated the test case to reflect the new expected behavior.

github-actions bot added the Area - Documentation label Mar 7, 2023

gianm mentioned this pull request Mar 7, 2023

Now unnest allows bound, in and selector filters on the unnested column #13799

Closed

10 tasks

clintropolis reviewed Mar 7, 2023

View reviewed changes

somu-imply reviewed Mar 7, 2023

View reviewed changes

somu-imply approved these changes Mar 8, 2023

View reviewed changes

somu-imply reviewed Mar 8, 2023

View reviewed changes

gianm added 2 commits March 9, 2023 01:11

Merge branch 'master' into unnest-updates

915d7a6

Simplify DruidUnnestRel.

9e7abfc

Fixes, simplification, additional filter pushdown.

cab501f

github-advanced-security bot found potential problems Mar 9, 2023

View reviewed changes

clintropolis approved these changes Mar 9, 2023

View reviewed changes

gianm added 2 commits March 9, 2023 09:31

Updates from review.

3f1b226

Style adjustment.

b32a20b

Fix unused import.

d576978

clintropolis reviewed Mar 9, 2023

View reviewed changes

somu-imply reviewed Mar 9, 2023

View reviewed changes

gianm added 2 commits March 9, 2023 17:48

Merge branch 'master' into unnest-updates

81fd59c

Update test.

4d5908f

gianm added 2 commits March 9, 2023 20:36

Fix test.

5fce9cc

Fix exception.

066e611

abhishekagarwal87 merged commit 4b1ffbc into apache:master Mar 10, 2023

gianm deleted the unnest-updates branch March 10, 2023 16:04

gianm mentioned this pull request Mar 10, 2023

Fix filter pushdown on unnest column: need to filter twice. #13919

Closed

clintropolis added this to the 26.0 milestone Apr 10, 2023

techdocsmith mentioned this pull request Apr 17, 2023

[DRAFT] 26.0.0 release notes #14064

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Various changes and fixes to UNNEST. #13892

Various changes and fixes to UNNEST. #13892

gianm commented Mar 7, 2023 •

edited

Loading

clintropolis left a comment

clintropolis Mar 7, 2023

gianm Mar 8, 2023

clintropolis Mar 7, 2023

gianm Mar 8, 2023

somu-imply left a comment •

edited

Loading

somu-imply Mar 7, 2023

somu-imply commented Mar 7, 2023 •

edited

Loading

somu-imply left a comment

somu-imply Mar 8, 2023

gianm commented Mar 9, 2023

gianm Mar 9, 2023

clintropolis left a comment

clintropolis Mar 9, 2023

somu-imply commented Mar 9, 2023

clintropolis Mar 9, 2023

somu-imply Mar 9, 2023

gianm Mar 10, 2023

gianm Mar 10, 2023

gianm Mar 10, 2023

somu-imply Mar 10, 2023

somu-imply Mar 9, 2023

somu-imply commented Mar 9, 2023

gianm commented Mar 10, 2023

Various changes and fixes to UNNEST. #13892

Various changes and fixes to UNNEST. #13892

Conversation

gianm commented Mar 7, 2023 • edited Loading

clintropolis left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

somu-imply left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

somu-imply commented Mar 7, 2023 • edited Loading

somu-imply left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gianm commented Mar 9, 2023

Choose a reason for hiding this comment

clintropolis left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

somu-imply commented Mar 9, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

somu-imply commented Mar 9, 2023

gianm commented Mar 10, 2023

gianm commented Mar 7, 2023 •

edited

Loading

somu-imply left a comment •

edited

Loading

somu-imply commented Mar 7, 2023 •

edited

Loading