Optimization rule to merge multiple unions with only constant values #15633
== Test plan ==
The changes are covered by unit tests and by new end-to-end query tests in AbstractTestQueries that run UNION ALL queries over more than one subquery of constants.
Manual tests that generate plans and results were also run with the Presto CLI against a local Presto HiveQueryRunner service, using the tpch database. The output is as follows:
presto:sf10> EXPLAIN SELECT * FROM ( select 1, 2, 3 union all select * from (values (5,6,6),(10,20,30)));
Query Plan
Estimates: {rows: 3 (45B), cpu: 0.00, memory: 0.00, network: 0.00}
_col0 := expr_19
_col1 := expr_20
_col2 := expr_21
Estimates: {rows: 3 (45B), cpu: 0.00, memory: 0.00, network: 0.00}
(INTEGER 1, INTEGER 2, INTEGER 3)
(INTEGER 5, INTEGER 6, INTEGER 6)
(INTEGER 10, INTEGER 20, INTEGER 30)
(1 row)
presto:sf10> SELECT * FROM ( select 1, 2, 3 union all select * from (values (5,6,6),(10,20,30)));
_col0 | _col1 | _col2
-------+-------+-------
1 | 2 | 3
5 | 6 | 6
10 | 20 | 30
(3 rows)
presto:sf10> EXPLAIN SELECT * FROM ((SELECT custkey, nationkey FROM customer ORDER BY custkey LIMIT 10) UNION ALL (SELECT 1, 2) UNION ALL (SELECT 2, 3));
Query Plan
custkey := expr_28
nationkey := expr_29
Estimates: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
nationkey := tpch:nationkey
custkey := tpch:custkey
Estimates: {rows: 2 (36B), cpu: 0.00, memory: 0.00, network: 0.00}
(BIGINT 1, BIGINT 2)
(BIGINT 2, BIGINT 3)
(1 row)
presto:sf10> SELECT * FROM ((SELECT custkey, nationkey FROM customer ORDER BY custkey LIMIT 10) UNION ALL (SELECT 1, 2) UNION ALL (SELECT 2, 3));
custkey | nationkey
---------+-----------
1 | 2
2 | 3
1 | 15
2 | 13
3 | 1
4 | 4
5 | 3
6 | 20
7 | 18
8 | 17
9 | 8
10 | 5
(12 rows)
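As a reading aid, the two plans above are consistent with the optimizer treating the constant branches of each UNION ALL as a single list of constant rows. The rewritten queries below are a hand-written illustration of that effect under that assumption, not output produced by the optimizer; row order is not guaranteed to match the results shown above.

```sql
-- Query 1: every branch of the UNION ALL is constant, so the plan
-- lists all three rows together.
SELECT * FROM (VALUES (1, 2, 3), (5, 6, 6), (10, 20, 30));

-- Query 2: the customer branch is kept as-is, while the two constant
-- branches appear in the plan as one two-row list.
SELECT * FROM (
    (SELECT custkey, nationkey FROM customer ORDER BY custkey LIMIT 10)
    UNION ALL
    SELECT * FROM (VALUES (1, 2), (2, 3))
);
```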
== RELEASE NOTES ==
General Changes
The optimization can be enabled or disabled with the "optimize_union_over_values" session property.
The session property and the unit tests in this PR are taken from an earlier PR, Union All optimisation #15097.
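A minimal sketch of toggling the rule for the current CLI session, assuming the property is boolean (the note above only says it can be enabled or disabled) and that the server is running a build that contains this change:

```sql
-- Assumption: optimize_union_over_values accepts true/false.
SET SESSION optimize_union_over_values = true;

-- Re-run one of the manual checks to inspect the resulting plan.
EXPLAIN SELECT * FROM (SELECT 1, 2, 3 UNION ALL SELECT * FROM (VALUES (5, 6, 6), (10, 20, 30)));

-- Restore the property to its default value.
RESET SESSION optimize_union_over_values;
```

Running the same EXPLAIN with the property set to false should make it possible to compare the two plans side by side.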
Hive Changes