Optimization rule to merge multiple unions with only constant values #15633

Closed
wants to merge 2 commits into from

@charygo charygo commented Jan 24, 2021

== Test plan ==

The changes are covered by unit tests and by a couple of new tests in AbstractTestQueries, which run "UNION ALL over more than one sub-query over constants" queries as end-to-end query tests.

Manual tests to generate results and plans were also run with the Presto CLI, using the tpch database against a local HiveQueryRunner service. The outputs are as follows:

presto:sf10> EXPLAIN SELECT * FROM ( select 1, 2, 3 union all select * from (values (5,6,6),(10,20,30)));
Query Plan

  • Output[_col0, _col1, _col2] => [expr_19:integer, expr_20:integer, expr_21:integer]
    Estimates: {rows: 3 (45B), cpu: 0.00, memory: 0.00, network: 0.00}
    _col0 := expr_19
    _col1 := expr_20
    _col2 := expr_21
    • Values => [expr_19:integer, expr_20:integer, expr_21:integer]
      Estimates: {rows: 3 (45B), cpu: 0.00, memory: 0.00, network: 0.00}
      (INTEGER 1, INTEGER 2, INTEGER 3)
      (INTEGER 5, INTEGER 6, INTEGER 6)
      (INTEGER 10, INTEGER 20, INTEGER 30)

(1 row)

presto:sf10> SELECT * FROM ( select 1, 2, 3 union all select * from (values (5,6,6),(10,20,30)));
_col0 | _col1 | _col2
-------+-------+-------
1 | 2 | 3
5 | 6 | 6
10 | 20 | 30
(3 rows)

presto:sf10> EXPLAIN SELECT * FROM ((SELECT custkey, nationkey FROM customer ORDER BY custkey LIMIT 10) UNION ALL (SELECT 1, 2) UNION ALL (SELECT 2, 3));
Query Plan

  • Output[custkey, nationkey] => [expr_28:bigint, expr_29:bigint]
    custkey := expr_28
    nationkey := expr_29
    • LocalExchange[ROUND_ROBIN] () => [expr_28:bigint, expr_29:bigint]
      • TopN[10 by (custkey ASC_NULLS_LAST)] => [custkey:bigint, nationkey:bigint]
        • LocalExchange[SINGLE] () => [custkey:bigint, nationkey:bigint]
          • RemoteStreamingExchange[GATHER] => [custkey:bigint, nationkey:bigint]
            • TopNPartial[10 by (custkey ASC_NULLS_LAST)] => [custkey:bigint, nationkey:bigint]
              • TableScan[TableHandle {connectorId='tpch', connectorHandle='customer:sf10.0', layout='Optional[customer:sf10.0]'}] => [custkey:bigint, nationkey:bigint]
                Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                nationkey := tpch:nationkey
                custkey := tpch:custkey
      • Values => [union_val_expr:bigint, union_val_expr_42:bigint]
        Estimates: {rows: 2 (36B), cpu: 0.00, memory: 0.00, network: 0.00}
        (BIGINT 1, BIGINT 2)
        (BIGINT 2, BIGINT 3)

(1 row)

presto:sf10> SELECT * FROM ((SELECT custkey, nationkey FROM customer ORDER BY custkey LIMIT 10) UNION ALL (SELECT 1, 2) UNION ALL (SELECT 2, 3));
custkey | nationkey
---------+-----------
1 | 2
2 | 3
1 | 15
2 | 13
3 | 1
4 | 4
5 | 3
6 | 20
7 | 18
8 | 17
9 | 8
10 | 5
(12 rows)
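
The two plans above can be read as the result of a rewrite that folds all constant branches of a UNION ALL into a single Values node and leaves non-constant branches untouched. The following is a minimal Python sketch of that idea as a toy model only. It is not the Java rule in this PR, and the names `Values`, `Scan`, `UnionAll`, and `merge_constant_children` are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Values:
    """Toy stand-in for a plan node that holds only constant rows."""
    rows: List[Tuple[Any, ...]]


@dataclass
class Scan:
    """Toy stand-in for any non-constant source, such as a table scan."""
    table: str


@dataclass
class UnionAll:
    """Toy stand-in for a UNION ALL node over several child plans."""
    children: List[Any]


def merge_constant_children(node: UnionAll):
    """Fold every Values child of a UnionAll into one Values node.

    Assumption: the rewrite only applies when at least two children are
    constant, matching "more than one sub-query over constants".
    """
    constant_rows: List[Tuple[Any, ...]] = []
    constant_children = 0
    others: List[Any] = []
    for child in node.children:
        if isinstance(child, Values):
            constant_rows.extend(child.rows)
            constant_children += 1
        else:
            others.append(child)

    if constant_children < 2:
        return node  # nothing to merge
    merged = Values(constant_rows)
    if not others:
        return merged  # the whole union collapses to a single Values node
    # UNION ALL does not guarantee row order, so the merged node can go last.
    return UnionAll(others + [merged])


# First query above: both branches are constant, so the union disappears.
all_constant = UnionAll([Values([(1, 2, 3)]), Values([(5, 6, 6), (10, 20, 30)])])
# Second query above: one non-constant branch plus two constant branches.
mixed = UnionAll([Scan("customer"), Values([(1, 2)]), Values([(2, 3)])])

collapsed = merge_constant_children(all_constant)
partially_merged = merge_constant_children(mixed)
```

In this model the first query collapses to one three-row Values node, and the second keeps the `customer` branch next to one two-row Values node, mirroring the shape of the EXPLAIN output above.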

== RELEASE NOTES ==

General Changes

  • Improve performance of queries that UNION ALL more than one sub-query over constants (Values query nodes).
    This optimization can be enabled or disabled with the "optimize_union_over_values" session property.
    The session property and the unit test portion of this PR are taken from an earlier PR, Union All optimisation #15097.

Hive Changes

  • NONE

linux-foundation-easycla bot commented Jan 24, 2021

CLA Signed

The committers are authorized under a signed CLA.

  • ✅ charygo (9d4f2ba79fb722479e697591c3237c65d8863d32)

@charygo charygo force-pushed the charyg/optimizeunions branch from 1da882c to 91322c4 Compare January 26, 2021 03:31
stale bot commented Aug 3, 2021

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the task, make sure you've addressed reviewer comments, and rebase on the latest master. Thank you for your contributions!

@stale stale bot added the stale label Aug 3, 2021
@stale stale bot closed this Sep 6, 2021