Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Now unnest allows bound, in and selector filters on the unnested column #13799

Closed
wants to merge 17 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
e8de6dc
Now unnest allows bound, in and selector filters on the unnested column
somu-imply Feb 14, 2023
9743911
Filters on unnested columns over virtual columns
somu-imply Feb 15, 2023
933ccbe
Removing allowList to make way for filters
somu-imply Feb 15, 2023
e77643a
Support for filters on unnested column in SQL part 1
somu-imply Feb 21, 2023
ce5e5a6
Adding a test on virtual column and bound filters
somu-imply Feb 21, 2023
a2ca449
Removing an older test that is now supported through sql
somu-imply Feb 21, 2023
ff4d6ea
Fixing an issue with initialization in unnestDim cursor and adding ne…
somu-imply Feb 22, 2023
663292a
Adding support and unit tests for all kinds of selector filters over …
somu-imply Feb 22, 2023
f138c6d
Updating a md file for spellcheck to pass, although it is behaving fl…
somu-imply Feb 22, 2023
bb643ac
Merge remote-tracking branch 'upstream/master' into filtersOnUnnest
somu-imply Feb 23, 2023
e969e6a
Simplifying the advance logic for cursors
somu-imply Feb 27, 2023
5031f9f
Changes for moving filter inside the unnest data source
somu-imply Mar 4, 2023
737771b
Adding some comments and updating some javadocs
somu-imply Mar 4, 2023
744fb7b
Do not need planner context in the new rules, using an instance metho…
somu-imply Mar 4, 2023
0bb44c5
Tmp changes for filter matching
somu-imply Mar 7, 2023
c9074b6
Updating dimension cursor to use value matcher once and populate a bi…
somu-imply Mar 9, 2023
c617443
Merge branch 'master' into filtersOnUnnest
somu-imply Mar 9, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions docs/tutorials/tutorial-jupyter-index.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ Make sure you meet the following requirements before starting the Jupyter-based
pip3 install requests
```


- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid
and Jupyter both try to use port `8888`, so start Jupyter on a different port.

Expand Down Expand Up @@ -99,10 +100,12 @@ The notebooks are located in the [apache/druid repo](https://github.com/apache/d

The links that follow are the raw GitHub URLs, so you can use them to download the notebook directly, such as with `wget`, or manually through your web browser. Note that if you save the file from your web browser, make sure to remove the `.txt` extension.


- [Introduction to the Druid REST API](
https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/api-tutorial.ipynb)
walks you through some of the basics related to the Druid REST API and several endpoints.
- [Introduction to the Druid Python API](
https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb)
walks you through some of the basics related to the Druid API using the Python wrapper API.

- [Introduction to Druid SQL](https://raw.githubusercontent.com/apache/druid/master/examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb) covers the basics of Druid SQL.
Original file line number Diff line number Diff line change
Expand Up @@ -23,13 +23,13 @@
import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.collect.ImmutableList;
import org.apache.druid.java.util.common.IAE;
import org.apache.druid.query.filter.DimFilter;
import org.apache.druid.query.planning.DataSourceAnalysis;
import org.apache.druid.segment.SegmentReference;
import org.apache.druid.segment.UnnestSegmentReference;
import org.apache.druid.utils.JvmUtils;

import javax.annotation.Nullable;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
Expand All @@ -38,7 +38,7 @@

/**
* The data source for representing an unnest operation.
*
* <p>
* An unnest data source has the following:
* a base data source which is to be unnested
* the column name of the MVD which will be unnested
Expand All @@ -50,30 +50,30 @@ public class UnnestDataSource implements DataSource
private final DataSource base;
private final String column;
private final String outputName;
private final LinkedHashSet<String> allowList;
private final DimFilter outputColumnFilter;

private UnnestDataSource(
DataSource dataSource,
String columnName,
String outputName,
LinkedHashSet<String> allowList
DimFilter outputColumnFilter
)
{
this.base = dataSource;
this.column = columnName;
this.outputName = outputName;
this.allowList = allowList;
this.outputColumnFilter = outputColumnFilter;
}

@JsonCreator
public static UnnestDataSource create(
@JsonProperty("base") DataSource base,
@JsonProperty("column") String columnName,
@JsonProperty("outputName") String outputName,
@Nullable @JsonProperty("allowList") LinkedHashSet<String> allowList
@Nullable @JsonProperty("outputColumnFilter") DimFilter outputColumnFilter
)
{
return new UnnestDataSource(base, columnName, outputName, allowList);
return new UnnestDataSource(base, columnName, outputName, outputColumnFilter);
}

@JsonProperty("base")
Expand All @@ -94,12 +94,11 @@ public String getOutputName()
return outputName;
}

@JsonProperty("allowList")
public LinkedHashSet<String> getAllowList()
@JsonProperty("outputColumnFilter")
public DimFilter getOutputColumnFilter()
{
return allowList;
return outputColumnFilter;
}

@Override
public Set<String> getTableNames()
{
Expand All @@ -118,7 +117,7 @@ public DataSource withChildren(List<DataSource> children)
if (children.size() != 1) {
throw new IAE("Expected [1] child, got [%d]", children.size());
}
return new UnnestDataSource(children.get(0), column, outputName, allowList);
return new UnnestDataSource(children.get(0), column, outputName, outputColumnFilter);
}

@Override
Expand Down Expand Up @@ -163,7 +162,7 @@ public Function<SegmentReference, SegmentReference> createSegmentMapFunction(
segmentMapFn.apply(baseSegment),
column,
outputName,
allowList
outputColumnFilter
);
}
}
Expand All @@ -174,7 +173,7 @@ public Function<SegmentReference, SegmentReference> createSegmentMapFunction(
@Override
public DataSource withUpdatedDataSource(DataSource newSource)
{
return new UnnestDataSource(newSource, column, outputName, allowList);
return new UnnestDataSource(newSource, column, outputName, outputColumnFilter);
}

@Override
Expand Down Expand Up @@ -213,7 +212,7 @@ public boolean equals(Object o)
@Override
public int hashCode()
{
return Objects.hash(base, column, outputName);
return Objects.hash(base, column, outputName, outputColumnFilter);
}

@Override
Expand All @@ -223,7 +222,7 @@ public String toString()
"base=" + base +
", column='" + column + '\'' +
", outputName='" + outputName + '\'' +
", allowList=" + allowList +
", outputFilter='" + outputColumnFilter + '\'' +
'}';
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,15 +22,17 @@
import org.apache.druid.java.util.common.UOE;
import org.apache.druid.query.BaseQuery;
import org.apache.druid.query.dimension.DimensionSpec;
import org.apache.druid.query.filter.Filter;
import org.apache.druid.query.filter.ValueMatcher;
import org.apache.druid.query.monomorphicprocessing.RuntimeShapeInspector;
import org.apache.druid.segment.column.ColumnCapabilities;
import org.apache.druid.segment.column.ColumnCapabilitiesImpl;
import org.apache.druid.segment.filter.BooleanValueMatcher;
import org.joda.time.DateTime;

import javax.annotation.Nullable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/**
Expand All @@ -50,7 +52,7 @@
* unnestCursor.advance() -> 'e'
* <p>
* <p>
* The allowSet if available helps skip over elements which are not in the allowList by moving the cursor to
* The filter if available helps skip over elements which are not in the allowList by moving the cursor to
* the next available match.
* <p>
* The index reference points to the index of each row that the unnest cursor is accessing through currentVal
Expand All @@ -65,7 +67,7 @@ public class UnnestColumnValueSelectorCursor implements Cursor
private final ColumnValueSelector columnValueSelector;
private final String columnName;
private final String outputName;
private final LinkedHashSet<String> allowSet;
private final ValueMatcher valueMatcher;
private int index;
private Object currentVal;
private List<Object> unnestListForCurrentRow;
Expand All @@ -76,7 +78,7 @@ public UnnestColumnValueSelectorCursor(
ColumnSelectorFactory baseColumSelectorFactory,
String columnName,
String outputColumnName,
LinkedHashSet<String> allowSet
@Nullable Filter filter
)
{
this.baseCursor = cursor;
Expand All @@ -86,7 +88,11 @@ public UnnestColumnValueSelectorCursor(
this.index = 0;
this.outputName = outputColumnName;
this.needInitialization = true;
this.allowSet = allowSet;
if (filter != null) {
this.valueMatcher = filter.makeMatcher(getColumnSelectorFactory());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is called in the constructor and then also returned as a public method, maybe switch it around to have the constructor create the ColumnSelectorFactory, store the reference and then use that here/return it from the getColumnSelectorFactory method.

} else {
this.valueMatcher = BooleanValueMatcher.of(true);
}
}

@Override
Expand Down Expand Up @@ -191,11 +197,7 @@ public boolean isNull()
public Object getObject()
{
if (!unnestListForCurrentRow.isEmpty()) {
if (allowSet == null || allowSet.isEmpty()) {
return unnestListForCurrentRow.get(index);
} else if (allowSet.contains((String) unnestListForCurrentRow.get(index))) {
return unnestListForCurrentRow.get(index);
}
return unnestListForCurrentRow.get(index);
}
return null;
}
Expand Down Expand Up @@ -243,9 +245,17 @@ public void advance()
@Override
public void advanceUninterruptibly()
{
do {
// the index currently points to an element at position i ($e_i)
// while $e_i does not match the filter
// keep advancing the pointer to the first valid match
// calling advanceAndUpdate increments the index position so needs a call to matches() again for new match status
while (true) {
advanceAndUpdate();
} while (matchAndProceed());
boolean match = valueMatcher.matches();
if (match || baseCursor.isDone()) {
return;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

  while (true) {
    boolean match = valueMatcher.matches();
    advanceAndUpdate();
    if (match || baseCursor.isDone()) {
        return;
    }
  } 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this should be

while (true) {
      advanceAndUpdate();
      boolean match = valueMatcher.matches();
      if (match || baseCursor.isDone()) {
        return;
      }
    }

The match changes after the update so the call should be below. I'll add it

}

@Override
Expand Down Expand Up @@ -311,12 +321,11 @@ private void initialize()
{
this.unnestListForCurrentRow = new ArrayList<>();
getNextRow(needInitialization);
if (allowSet != null) {
if (!allowSet.isEmpty()) {
if (!allowSet.contains((String) unnestListForCurrentRow.get(index))) {
advance();
}
}

// If the first value the index is pointing to does not match the filter
// advance the index to the first value which will match
if (!valueMatcher.matches() && !baseCursor.isDone()) {
advance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant. advanceUninterruptibly() already does the needed checks.

}
needInitialization = false;
}
Expand All @@ -339,22 +348,4 @@ private void advanceAndUpdate()
index++;
}
}

/**
* This advances the unnest cursor in cases where an allowList is specified
* and the current value at the unnest cursor is not in the allowList.
* The cursor in such cases is moved till the next match is found.
*
* @return a boolean to indicate whether to stay or move cursor
*/
private boolean matchAndProceed()
{
boolean matchStatus;
if (allowSet == null || allowSet.isEmpty()) {
matchStatus = true;
} else {
matchStatus = allowSet.contains((String) unnestListForCurrentRow.get(index));
}
return !baseCursor.isDone() && !matchStatus;
}
}
Loading