Add support for TopN pushdown in JDBC connectors #6847

Merged 4 commits on Mar 2, 2021

wendigo
Contributor

@wendigo wendigo commented Feb 8, 2021

Partially based on #4784.

The main difference is to use JdbcQueryRelationHandle, instead of carrying List<SortItem> in the JdbcTableHandle.

@cla-bot cla-bot bot added the cla-signed label Feb 8, 2021
@wendigo wendigo requested a review from findepi February 8, 2021 12:04
@wendigo
Contributor Author

wendigo commented Feb 8, 2021

trino:public> set session postgresql."topn_pushdown_enabled" = false;
SET SESSION
trino:public> SELECT * FROM orders_big ORDER by totalprice DESC LIMIT 10;
 orderkey | custkey | orderstatus | totalprice | orderdate  |  orderpriority  |      clerk      | shippriority |                                    comment
----------+---------+-------------+------------+------------+-----------------+-----------------+--------------+-------------------------------------------------------------------------------
  1750466 |   21433 | F           |  555285.16 | 1992-11-30 | 4-NOT SPECIFIED | Clerk#000000040 |            0 |  ironic packages. even notornis integrate. fluffily regular foxes snooze furi
  4722021 |  128120 | F           |  544089.09 | 1994-04-07 | 1-URGENT        | Clerk#000000230 |            0 | al, express pinto beans are after the careful
  3043270 |  144617 | O           |  530604.44 | 1997-02-12 | 5-LOW           | Clerk#000000699 |            0 | riously final deposits?
  4576548 |  108931 | O           |  525590.57 | 1997-12-26 | 1-URGENT        | Clerk#000000336 |            0 | he slyly ironic requests. regular, bold deposits cajol
  2232932 |   13940 | O           |  522720.61 | 1997-04-13 | 2-HIGH          | Clerk#000000245 |            0 | lithely stealthy accounts are slyly against the pend
  3586919 |   24049 | F           |  522644.48 | 1992-11-07 | 1-URGENT        | Clerk#000000924 |            0 | are alongside of the pending deposits. quick
  2199712 |   66790 | O           |  515531.82 | 1996-09-30 | 2-HIGH          | Clerk#000000650 |            0 |  the final, ironic deposits inte
  2185667 |   51796 | F           |  511359.88 | 1992-10-08 | 1-URGENT        | Clerk#000000574 |            0 | . deposits wake quickly unusual deposits. express dependencies wake.
  4515876 |  100685 | F           |   510061.6 | 1993-11-02 | 4-NOT SPECIFIED | Clerk#000000105 |            0 | carefully accounts: slyly ironic pinto beans detect. brave somas a
   972901 |  141100 | F           |  508668.52 | 1992-07-18 | 3-MEDIUM        | Clerk#000000667 |            0 | pinto beans according to the blithely express dolphins cajo
(10 rows)

Query 20210208_123038_00029_3ip8n, FINISHED, 2 nodes
Splits: 6 total, 6 done (100.00%)
12.27 [1.5M rows, 0B] [122K rows/s, 0B/s]

trino:public> set session postgresql."topn_pushdown_enabled" = true;
SET SESSION
trino:public> SELECT * FROM orders_big ORDER by totalprice DESC LIMIT 10;
 orderkey | custkey | orderstatus | totalprice | orderdate  |  orderpriority  |      clerk      | shippriority |                                    comment
----------+---------+-------------+------------+------------+-----------------+-----------------+--------------+-------------------------------------------------------------------------------
  1750466 |   21433 | F           |  555285.16 | 1992-11-30 | 4-NOT SPECIFIED | Clerk#000000040 |            0 |  ironic packages. even notornis integrate. fluffily regular foxes snooze furi
  4722021 |  128120 | F           |  544089.09 | 1994-04-07 | 1-URGENT        | Clerk#000000230 |            0 | al, express pinto beans are after the careful
  3043270 |  144617 | O           |  530604.44 | 1997-02-12 | 5-LOW           | Clerk#000000699 |            0 | riously final deposits?
  4576548 |  108931 | O           |  525590.57 | 1997-12-26 | 1-URGENT        | Clerk#000000336 |            0 | he slyly ironic requests. regular, bold deposits cajol
  2232932 |   13940 | O           |  522720.61 | 1997-04-13 | 2-HIGH          | Clerk#000000245 |            0 | lithely stealthy accounts are slyly against the pend
  3586919 |   24049 | F           |  522644.48 | 1992-11-07 | 1-URGENT        | Clerk#000000924 |            0 | are alongside of the pending deposits. quick
  2199712 |   66790 | O           |  515531.82 | 1996-09-30 | 2-HIGH          | Clerk#000000650 |            0 |  the final, ironic deposits inte
  2185667 |   51796 | F           |  511359.88 | 1992-10-08 | 1-URGENT        | Clerk#000000574 |            0 | . deposits wake quickly unusual deposits. express dependencies wake.
  4515876 |  100685 | F           |   510061.6 | 1993-11-02 | 4-NOT SPECIFIED | Clerk#000000105 |            0 | carefully accounts: slyly ironic pinto beans detect. brave somas a
   972901 |  141100 | F           |  508668.52 | 1992-07-18 | 3-MEDIUM        | Clerk#000000667 |            0 | pinto beans according to the blithely express dolphins cajo
(10 rows)

Query 20210208_123059_00031_3ip8n, FINISHED, 2 nodes
Splits: 5 total, 5 done (100.00%)
0.25 [10 rows, 0B] [40 rows/s, 0B/s]
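
To illustrate why the second run reads 10 rows instead of 1.5M: with pushdown enabled, the ORDER BY and LIMIT become part of the query sent to the remote database. Below is a minimal sketch of that idea only. `TopNSqlSketch` and its `SortItem` are hypothetical stand-ins, not the connector's actual query builder, and the sketch ignores identifier quoting and null ordering.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not the connector's real query builder.
final class TopNSqlSketch
{
    // Hypothetical stand-in for a sort item: a column expression plus a direction.
    static final class SortItem
    {
        final String column;
        final boolean ascending;

        SortItem(String column, boolean ascending)
        {
            this.column = column;
            this.ascending = ascending;
        }
    }

    // Appends ORDER BY ... LIMIT n so the remote database returns only the top rows.
    static String applyTopN(String baseQuery, List<SortItem> sortItems, long limit)
    {
        String orderBy = sortItems.stream()
                .map(item -> item.column + (item.ascending ? " ASC" : " DESC"))
                .collect(Collectors.joining(", "));
        return baseQuery + " ORDER BY " + orderBy + " LIMIT " + limit;
    }

    public static void main(String[] args)
    {
        System.out.println(applyTopN(
                "SELECT * FROM orders_big",
                List.of(new SortItem("totalprice", false)),
                10));
    }
}
```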

@wendigo
Contributor Author

wendigo commented Feb 8, 2021

trino:public> set session postgresql."topn_pushdown_enabled" = false;
SET SESSION
trino:public> SELECT sum(totalprice) FROM orders_sf1 WHERE orderpriority = '1-URGENT' GROUP by custkey ORDER BY sum(totalprice) ASC, custkey DESC LIMIT 10;
  _col0
---------
  965.09
  982.94
 1041.55
 1094.89
 1099.96
 1101.75
 1106.45
 1119.87
 1120.89
 1121.88
(10 rows)

Query 20210208_125938_00033_4b5qn, FINISHED, 2 nodes
Splits: 6 total, 6 done (100.00%)
0.83 [92.3K rows, 0B] [111K rows/s, 0B/s]

trino:public> SELECT sum(totalprice) FROM orders_sf1 WHERE orderpriority = '1-URGENT' GROUP by custkey ORDER BY sum(totalprice) ASC, custkey DESC LIMIT 10;
  _col0
---------
  965.09
  982.94
 1041.55
 1094.89
 1099.96
 1101.75
 1106.45
 1119.87
 1120.89
 1121.88
(10 rows)

Query 20210208_125932_00031_4b5qn, FINISHED, 1 node
Splits: 5 total, 5 done (100.00%)
0.47 [10 rows, 0B] [21 rows/s, 0B/s]

@wendigo wendigo force-pushed the serafin/topn-pushdown branch from 3b84164 to 4f92c24 Compare February 8, 2021 13:01
@wendigo
Contributor Author

wendigo commented Feb 8, 2021

Failures are unrelated:

  • error-prone: failed to download maven dependencies
  • trino-kafka: failed to start container

@wendigo wendigo marked this pull request as ready for review February 8, 2021 14:31
@wendigo wendigo requested a review from losipiuk February 8, 2021 14:31
Member

@losipiuk losipiuk left a comment

Some whining about functional programming.

@wendigo wendigo force-pushed the serafin/topn-pushdown branch from 4f92c24 to 70ed668 Compare February 11, 2021 11:08
@wendigo wendigo force-pushed the serafin/topn-pushdown branch 3 times, most recently from e7756c9 to 069943a Compare February 16, 2021 12:06
@wendigo wendigo force-pushed the serafin/topn-pushdown branch from 069943a to 025fb25 Compare February 17, 2021 12:48
@wendigo
Contributor Author

wendigo commented Feb 17, 2021

AC @findepi

@wendigo wendigo requested a review from findepi February 17, 2021 12:52

if (handle.getSortOrder().isPresent() || handle.getLimit().isPresent()) {
    if (handle.getLimit().equals(OptionalLong.of(topNCount)) && handle.getSortOrder().equals(Optional.of(resultSortOrder))) {
        return Optional.of(new TopNApplicationResult<>(handle, jdbcClient.isTopNLimitGuaranteed(session)));
Member

I think we should return Optional.empty here, per

* <b>Note</b>: it's critical for connectors to return {@link Optional#empty()} if calling this method has no effect for that
* invocation, even if the connector generally supports topN pushdown. Doing otherwise can cause the optimizer
* to loop indefinitely.

Returning a TopNApplicationResult will cause the optimizer to loop when isTopNLimitGuaranteed = false, but IMO we should return empty regardless.
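
A minimal sketch of the contract quoted above, for illustration only. `ApplyTopNSketch` and its `Handle` are hypothetical, simplified stand-ins rather than the SPI types, and refusing to stack a second TopN on a handle that already has one is an assumption of this sketch.

```java
import java.util.List;
import java.util.Optional;
import java.util.OptionalLong;

// Hypothetical, simplified stand-ins; not the Trino SPI.
final class ApplyTopNSketch
{
    static final class Handle
    {
        final Optional<List<String>> sortOrder;
        final OptionalLong limit;

        Handle(Optional<List<String>> sortOrder, OptionalLong limit)
        {
            this.sortOrder = sortOrder;
            this.limit = limit;
        }
    }

    // Returns a new handle only when the call changes something.
    static Optional<Handle> applyTopN(Handle handle, long topNCount, List<String> sortOrder)
    {
        // Same TopN already applied: report "no effect" so the optimizer does not re-fire the rule forever.
        if (handle.limit.equals(OptionalLong.of(topNCount)) && handle.sortOrder.equals(Optional.of(sortOrder))) {
            return Optional.empty();
        }
        // Assumption of this sketch: a different sort or limit is already present, so do not stack another one.
        if (handle.sortOrder.isPresent() || handle.limit.isPresent()) {
            return Optional.empty();
        }
        return Optional.of(new Handle(Optional.of(sortOrder), OptionalLong.of(topNCount)));
    }
}
```

Applying the same TopN twice yields a handle the first time and `Optional.empty()` the second time, which is the property the Javadoc asks for.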

Contributor Author

That breaks the code so it doesn't work anymore

Member

What breaks? I didn't get this one.

Contributor Author

If I return Optional.empty() there, the isFullyPushedDown assertions will fail, because the top-level TopN and Exchange nodes will remain in the plan.

Member

We need to see whether we can defer the creation of partial TopN, so that we don't need to deal with partial/final TopN in the TopN pushdown.
Then the engine rule would be narrowed down to SINGLE TopNs only.
Do you feel like you could investigate this?
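
A minimal sketch of what narrowing the rule to SINGLE TopNs could look like. `SingleTopNRuleSketch`, its `Step` enum and its `TopN` class are hypothetical stand-ins for the engine's plan node, used here only to show the matching condition.

```java
// Hypothetical stand-ins for the engine's plan node and rule; illustration only.
final class SingleTopNRuleSketch
{
    // A TopN is SINGLE until the planner splits it into PARTIAL and FINAL stages.
    enum Step { SINGLE, PARTIAL, FINAL }

    static final class TopN
    {
        final Step step;
        final long count;

        TopN(Step step, long count)
        {
            this.step = step;
            this.count = count;
        }
    }

    // The pushdown rule considers only a TopN that has not been split yet,
    // so the connector never has to reason about partial/final pairs.
    static boolean isPushdownCandidate(TopN node)
    {
        return node.step == Step.SINGLE;
    }
}
```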

Contributor Author

I've applied changes to the optimizers and PushTopNIntoTableScan rule. PTAL now.

Member

@findepi findepi left a comment

Looks good overall!
A couple of editorial comments.

@@ -746,7 +746,32 @@ public void testSelectCaseInsensitive()
    @Test
    public void testTopN()
    {
        assertQuery("SELECT n.name, r.name FROM nation n LEFT JOIN region r ON n.regionkey = r.regionkey ORDER BY n.name LIMIT 1");
Member

Let's keep the original short test method.
It doesn't make sense to put the connector under test if it does not support TopN pushdown (and most do not today).

Ideally, we should move this test from here into engine-only queries (do we have those?)
and have a BCT test that verifies that a connector either does not support TopN pushdown or is thoroughly tested, like we do with e.g. inserts.
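
A minimal sketch of the "either unsupported or thoroughly tested" pattern described above. `ConnectorTopNTestSketch`, `supportsTopNPushdown()` and this `isFullyPushedDown(String)` are hypothetical names chosen for illustration, not existing test-framework methods.

```java
// Hypothetical base test; names and hooks are assumptions, not the real test framework.
abstract class ConnectorTopNTestSketch
{
    // Connectors that implement TopN pushdown would override this to return true.
    protected boolean supportsTopNPushdown()
    {
        return false;
    }

    // Hypothetical hook: reports whether the plan for the query has no TopN left on the engine side.
    protected abstract boolean isFullyPushedDown(String sql);

    final void testTopNPushdown()
    {
        String sql = "SELECT orderkey FROM orders ORDER BY orderkey LIMIT 10";
        boolean pushedDown = isFullyPushedDown(sql);
        if (!supportsTopNPushdown()) {
            // A connector that declares no support must not silently push TopN down.
            if (pushedDown) {
                throw new AssertionError("TopN was pushed down by a connector that declares no support");
            }
            return;
        }
        // A connector that declares support must actually push the TopN down.
        if (!pushedDown) {
            throw new AssertionError("TopN was not pushed down: " + sql);
        }
        // More thorough cases (multiple sort keys, null ordering, aggregations) would follow here.
    }
}
```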

@wendigo wendigo force-pushed the serafin/topn-pushdown branch from 0fa9634 to 1545f85 Compare February 25, 2021 10:00
@wendigo wendigo requested a review from findepi February 25, 2021 10:04
@wendigo wendigo force-pushed the serafin/topn-pushdown branch 2 times, most recently from d100bc9 to 2e276ac Compare February 25, 2021 14:14
@wendigo
Contributor Author

wendigo commented Feb 25, 2021

PTAL @findepi

@wendigo wendigo force-pushed the serafin/topn-pushdown branch from 2e276ac to 36ea6c1 Compare February 26, 2021 12:38
@findepi
Member

findepi commented Feb 26, 2021

@wendigo please check the compilation issue

@wendigo wendigo force-pushed the serafin/topn-pushdown branch 2 times, most recently from ed81595 to 7ca452f Compare March 1, 2021 11:41
@wendigo wendigo requested a review from findepi March 1, 2021 11:43
@wendigo wendigo force-pushed the serafin/topn-pushdown branch from 18a8ab5 to 9c5f492 Compare March 2, 2021 09:10
@wendigo
Contributor Author

wendigo commented Mar 2, 2021

Failed due to:

2021-03-02T09:57:15.6330682Z [ERROR] Tests run: 398, Failures: 1, Errors: 0, Skipped: 22, Time elapsed: 1,317.346 s <<< FAILURE! - in TestSuite
2021-03-02T09:57:15.6338364Z [ERROR] init(io.trino.plugin.oracle.TestOracleCaseInsensitiveMapping)  Time elapsed: 296.399 s  <<< FAILURE!
2021-03-02T09:57:15.6344290Z java.lang.RuntimeException: java.sql.SQLRecoverableException: IO Error: Connection reset by peer, Authentication lapse 0 ms.

@wendigo wendigo requested a review from findepi March 2, 2021 11:15
@findepi findepi merged commit 1bae748 into trinodb:master Mar 2, 2021
@wendigo wendigo deleted the serafin/topn-pushdown branch March 2, 2021 12:09
@findepi findepi added this to the 353 milestone Mar 2, 2021
@findepi findepi mentioned this pull request Mar 2, 2021
10 tasks