Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL version of unnest native druid function #13576

Merged
merged 30 commits into from
Jan 23, 2023
Merged
Show file tree
Hide file tree
Changes from 15 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
2e72fd3
Some changes for sql unnest
somu-imply Dec 12, 2022
812133f
Updating partial query
somu-imply Dec 13, 2022
80846d3
Updating a test case
somu-imply Dec 14, 2022
828bf95
Adding select project of left to top of correlate
somu-imply Dec 14, 2022
2a308be
Handling the correlate data type check to not blow up
somu-imply Dec 15, 2022
cce6aa8
temp
somu-imply Dec 15, 2022
f0ac1a3
Working version of sql unnest
somu-imply Dec 15, 2022
6385f16
temp changes to debug virtual column creation
somu-imply Dec 17, 2022
92e161a
Adding sql support for virtual columns
somu-imply Dec 21, 2022
6767913
Fixing some test cases
somu-imply Dec 21, 2022
de943e9
More test cases and some changes to support virtual columns and null …
somu-imply Jan 4, 2023
ff561dd
fixing group by on unnested virtual columns
somu-imply Jan 9, 2023
fbb5d5b
Making constructor for correlate slimmer
somu-imply Jan 11, 2023
f820307
Fixed for grouping by on virtual columns for unnest
somu-imply Jan 11, 2023
94f47a5
Refactoring code to use abstract class correlate instead of LogicalCo…
somu-imply Jan 12, 2023
3afd027
adding explain terms to fix the limit issue
somu-imply Jan 12, 2023
6356ba3
test case for unnest on top of a join datasource
somu-imply Jan 12, 2023
8a854dc
Adding javadocs and fixing one test
somu-imply Jan 13, 2023
4f99571
Removing two unused variables and one speel check
somu-imply Jan 13, 2023
0376170
Test with grouping on virtual column
somu-imply Jan 17, 2023
a1e0ea8
Fixing a test case to generate more line coverage
somu-imply Jan 18, 2023
06b3811
Addressing comments part 1
somu-imply Jan 19, 2023
8c03d38
Merge remote-tracking branch 'upstream/master' into sql_unnnest_error…
somu-imply Jan 19, 2023
2bb22fa
Updating the misc.xml
somu-imply Jan 19, 2023
df05cf4
Addressing review comments part 2
somu-imply Jan 19, 2023
533be68
Fixing one code comment
somu-imply Jan 19, 2023
16c750c
spotbugs fix
somu-imply Jan 19, 2023
bf0e27c
Removing throws clause in the clone method
somu-imply Jan 20, 2023
301cbaa
One last fix to reuse a columnName
somu-imply Jan 20, 2023
5320349
Updating the field expression to catch missing index and not pursuing…
somu-imply Jan 23, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 10 additions & 4 deletions .idea/misc.xml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,7 @@ private static boolean rowsEqual(final Iterable<Object[]> rowsA, final Iterable<
final Object[] rowA = listA.get(i);
final Object[] rowB = listB.get(i);

if (!Arrays.equals(rowA, rowB)) {
if (!Arrays.deepEquals(rowA, rowB)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a hashCode method that claims that it is compatible with this method. I'm not sure that is actually true anymore because I don't believe it walks into Arrays-of-Arrays the same way that this one does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change got percolated here, I'll move to the equals here

return false;
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -207,6 +207,18 @@ public int hashCode()
{
return Objects.hash(base, column, outputName);
}

@Override
public String toString()
{
return "UnnestDataSource{" +
"base=" + base +
", column='" + column + '\'' +
", outputName='" + outputName + '\'' +
", allowList=" + allowList +
'}';
}

}


Original file line number Diff line number Diff line change
Expand Up @@ -100,7 +100,31 @@ public DimensionSelector makeDimensionSelector(DimensionSpec dimensionSpec)
if (!outputName.equals(dimensionSpec.getDimension())) {
return baseColumnSelectorFactory.makeDimensionSelector(dimensionSpec);
}
throw new UOE("Unsupported dimension selector while using column value selector for column [%s]", outputName);
// this is done to support virtual columns
// In future a developer should move towards making sure that
// for all dictionary encoded cases we only get the dimension selector
return new BaseSingleValueDimensionSelector()
{
final ColumnValueSelector colSelector = makeColumnValueSelector(dimensionSpec.getDimension());

@Nullable
@Override
protected String getValue()
{
final Object returnedObj = colSelector.getObject();
if (returnedObj == null) {
return null;
} else {
return String.valueOf(returnedObj);
}
}

@Override
public void inspectRuntimeShape(RuntimeShapeInspector inspector)
{
colSelector.inspectRuntimeShape(inspector);
}
};
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -181,7 +181,10 @@ public void inspectRuntimeShape(RuntimeShapeInspector inspector)
@Override
public Object getObject()
{
if (indexedIntsForCurrentRow == null) {
if (dimSelector.getObject() == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we switch away from using the stored reference? getObject() can do work, work we've already done, doing the same work multiple times is bad.

It looks to me like you just needed to switch it to be

if (indexedIntsForCurrentRow == null || indexedIntsForCurrentRow.size() == 0)

And then, you could perhaps make it even simpler if you check the size once when setting indexedIntsForCurrentRow and set to null when size is 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving to stored reference and setting the indexedIntsForCurrentRow to null at the time of assignment if there are no elements

return null;
}
if (indexedIntsForCurrentRow.size() == 0) {
return null;
}
if (allowedBitSet.isEmpty()) {
Expand Down Expand Up @@ -409,7 +412,12 @@ public int size()
@Override
public int get(int idx)
{
return indexedIntsForCurrentRow.get(index);
// to support rows which have only null values
// need to check if rhe value is not null and the size is greater than 0
if (indexedIntsForCurrentRow != null && indexedIntsForCurrentRow.size() > 0) {
return indexedIntsForCurrentRow.get(index);
}
return 0;
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
import org.apache.calcite.jdbc.JavaTypeFactoryImpl;
import org.apache.calcite.rel.core.Project;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.rex.RexFieldAccess;
import org.apache.calcite.rex.RexInputRef;
import org.apache.calcite.rex.RexLiteral;
import org.apache.calcite.rex.RexNode;
Expand All @@ -34,6 +35,7 @@
import org.apache.druid.common.config.NullHandling;
import org.apache.druid.java.util.common.DateTimes;
import org.apache.druid.java.util.common.ISE;
import org.apache.druid.java.util.common.StringUtils;
import org.apache.druid.java.util.common.granularity.Granularity;
import org.apache.druid.math.expr.Expr;
import org.apache.druid.math.expr.ExprMacroTable;
Expand All @@ -58,6 +60,7 @@
import org.apache.druid.sql.calcite.filtration.Filtration;
import org.apache.druid.sql.calcite.planner.Calcites;
import org.apache.druid.sql.calcite.planner.PlannerContext;
import org.apache.druid.sql.calcite.rel.CannotBuildQueryException;
import org.apache.druid.sql.calcite.rel.VirtualColumnRegistry;
import org.apache.druid.sql.calcite.table.RowSignatures;
import org.joda.time.Interval;
Expand Down Expand Up @@ -214,12 +217,43 @@ public static DruidExpression toDruidExpressionWithPostAggOperands(
return rexCallToDruidExpression(plannerContext, rowSignature, rexNode, postAggregatorVisitor);
} else if (kind == SqlKind.LITERAL) {
return literalToDruidExpression(plannerContext, rexNode);
} else if (kind == SqlKind.FIELD_ACCESS) {
return fieldAccessToDruidExpression(rowSignature, rexNode);
} else {
// Can't translate.
return null;
}
}

private static DruidExpression fieldAccessToDruidExpression(
final RowSignature rowSignature,
final RexNode rexNode
)
{
// Translate field references.
final RexFieldAccess ref = (RexFieldAccess) rexNode;
// This case arises in the case of a correlation where the rexNode points to a table from the left subtree
// while the underlying datasource is the scan stub created from LogicalValuesRule
// In such a case we throw a CannotBuildQueryException so that Calcite does not go ahead with this path
// This exception is caught while returning false from isValidDruidQuery() method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this comment has been separated from its home. Is it inside of the if down below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to inside of the if


//use this index to return
final int index = rowSignature.getColumnNames().indexOf(ref.getField().getName());
if (ref.getField().getIndex() > rowSignature.size()) {
throw new CannotBuildQueryException(StringUtils.format(
"Cannot build query as column name [%s] does not exist in row [%s]", ref.getField().getName(), rowSignature)
);
}

final String columnName = rowSignature.getColumnName(index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we just search the rowSignature for the name? Why will we get a different name back than the one that we searched for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used the rexNodes name to find the index before. Now we are just using the column name of the row signature at the particular index

final Optional<ColumnType> columnType = rowSignature.getColumnType(index);
if (columnName == null) {
throw new ISE("Expression referred to nonexistent index [%d] in row [%s]", index, rowSignature);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this if block can ever be run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True we are ensuring the index is always within the rowSignature length, will remove this


return DruidExpression.ofColumn(columnType.orElse(null), columnName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we ever not have the columnType? Why do we need to orElse it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the index in bounds this should be just columnType.get()

}

private static DruidExpression inputRefToDruidExpression(
final RowSignature rowSignature,
final RexNode rexNode
Expand Down
Loading