-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPI and optimizer rule for connectors that can support complete topN … #4249
Conversation
I have made the assumption that a connector that states it can support complete topN will guarantee N or less result, if we think that is too strict a requirement we can do something similar to limit pushdown. |
87f81a0
to
eb2f41c
Compare
9ec82c0
to
807e383
Compare
presto-spi/src/main/java/io/prestosql/spi/connector/TopNFunction.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/io/prestosql/spi/connector/TopNFunction.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
presto-spi/src/main/java/io/prestosql/spi/connector/ConnectorMetadata.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/io/prestosql/connector/MockConnectorFactory.java
Outdated
Show resolved
Hide resolved
presto-main/src/test/java/io/prestosql/connector/MockConnectorFactory.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PushTopNIntoTableScan.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PushTopNIntoTableScan.java
Outdated
Show resolved
Hide resolved
|
||
import static java.util.Objects.requireNonNull; | ||
|
||
public class TopN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a dedicated class for this seems overkill. Every place that uses this has immediate access to the count and SortItems from a TopNNode. The SPI method can just take both fields separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the class and added topNCount and sortItems as just arguments.
c845a4e
to
27a14f2
Compare
* If the connector can handle TopN Pushdown it should return a new table handle which will replace the existing | ||
* table handle in the TableScan. | ||
*/ | ||
default Optional<ConnectorTableHandle> applyTopN( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want to return a structure similar to applyLimit
, where it indicates whether the limit is guaranteed by the connector. In the case of a distributed connector implementation, it may not be able to perform a global top N, but it could still benefit from pushing down the operation, for instance, to each individual storage shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added TopNApplicationResult which is basically same as LimitApplicationResult for now and added a test case that tests that when limitGuaranteed is set to false we create a plan with TopNPartial stage with updated connector table handle reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martint can you review this again when you find some time?
tableScan.getOutputSymbols(), | ||
tableScan.getAssignments())); | ||
if (!result.isTopNGuaranteed()) { | ||
node = new TopNNode(topNNode.getId(), node, topNNode.getCount(), topNNode.getOrderingScheme(), TopNNode.Step.PARTIAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should be Step.FINAL
. The partial Top N is being handled by the connector at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed up here: #4249 (comment)
private static final Pattern<TopNNode> PATTERN = topN().with(source().matching( | ||
tableScan().capturedAs(TABLE_SCAN))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should trigger no SINGLE step only.
We should not push partial and final separately (and IMO -- we should not push them at all)
@@ -623,7 +624,8 @@ public PlanOptimizers( | |||
new CreatePartialTopN(), | |||
new PushTopNThroughProject(), | |||
new PushTopNThroughOuterJoin(), | |||
new PushTopNThroughUnion()))); | |||
new PushTopNThroughUnion(), | |||
new PushTopNIntoTableScan(metadata)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be added to pushIntoTableScanOptimizer
, when other PushXxxIntoTableScan
rules are registered.
This is important for two reasons:
- this will unlock other pushdowns once TopN is pushed into TableScan fully
- this will let the rule operate on TopNNode while it is SINGLE step (and consume it fully), making reasoning about engine-connector interactions simpler
tableScan.getAssignments()); | ||
|
||
if (!result.isTopNGuaranteed()) { | ||
node = new TopNNode(topNNode.getId(), node, topNNode.getCount(), topNNode.getOrderingScheme(), TopNNode.Step.FINAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relates to #4249 (comment)
Inserting FINAL step is not correct when the rule triggered on TopNNode with Step PARTIAL.
I suggest
- make the rule trigger for SINGLE step only (my preferred)
- i am aware sibling
PushLimitIntoTableScan
triggers for partial limits, but i am not convinced it is beneficial
- i am aware sibling
- keep same step as it used to be used (basically, use
topNNode.replaceChildren
)
cc @losipiuk |
…pushdown