Explain analyze doesn't (fully) optimize queries #917

Closed
Dandandan opened this issue Aug 21, 2021 · 2 comments · Fixed by #929
Labels: bug (Something isn't working)

Comments

Dandandan (Contributor) commented Aug 21, 2021

Describe the bug

A simple query such as select max(l_partkey) from lineitem gets a different plan under explain analyze than the plan that is executed for a normal select.

The plan run by explain analyze is slower than the normal query and does not have the statistics-based optimization applied:

+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                             |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[MAX(lineitem.l_partkey)@0 as MAX(l_partkey)], metrics=[]                                                                                                                                                                                                                                   |
|                   |   HashAggregateExec: mode=Final, gby=[], aggr=[MAX(l_partkey)], metrics=[outputRows=0]                                                                                                                                                                                                                           |
|                   |     CoalescePartitionsExec, metrics=[]                                                                                                                                                                                                                                                                           |
|                   |       HashAggregateExec: mode=Partial, gby=[], aggr=[MAX(l_partkey)], metrics=[outputRows=0]                                                                                                                                                                                                                     |
|                   |         RepartitionExec: partitioning=RoundRobinBatch(16), metrics=[fetchTime=51231297, repartitionTime=0, sendTime=120310]                                                                                                                                                                                      |
|                   |           ParquetExec: batch_size=8192, limit=None, partitions=[../benchmarks/parquet/lineitem/part-0.parquet], metrics=[numPredicateCreationErrors=0, numPredicateEvaluationErrors for ../benchmarks/parquet/lineitem/part-0.parquet=0, numRowGroupsPruned for ../benchmarks/parquet/lineitem/part-0.parquet=0] |
+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.071 seconds.
> explain select max(l_partkey) from lineitem;
+---------------+--------------------------------------------------------------------+
| plan_type     | plan                                                               |
+---------------+--------------------------------------------------------------------+
| logical_plan  | Projection: #MAX(lineitem.l_partkey)                               |
|               |   Projection: Int32(200000) AS MAX(l_partkey)                      |
|               |     EmptyRelation                                                  |
| physical_plan | ProjectionExec: expr=[MAX(lineitem.l_partkey)@0 as MAX(l_partkey)] |
|               |   RepartitionExec: partitioning=RoundRobinBatch(16)                |
|               |     ProjectionExec: expr=[200000 as MAX(l_partkey)]                |
|               |       EmptyExec: produce_one_row=true                              |
+---------------+--------------------------------------------------------------------+
2 rows in set. Query took 0.002 seconds.
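
For readers unfamiliar with the statistics-based optimization that shows up in the explain output (the aggregate is replaced by the literal 200000 over an EmptyRelation), here is a minimal sketch of the idea in Python. It is not DataFusion's implementation: `ColumnStats`, `max_from_stats`, and the `exact` flag are hypothetical names used only for illustration, and the minimum value of 1 in the usage example is made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ColumnStats:
    """Per-file statistics for one column (hypothetical structure)."""
    min_value: Optional[int]
    max_value: Optional[int]
    exact: bool  # True only if the statistics are known to be exact


def max_from_stats(files: Sequence[ColumnStats]) -> Optional[int]:
    """Return MAX(column) from statistics alone, or None if a scan is needed."""
    if not files:
        return None  # no statistics available, fall back to scanning
    maxima = []
    for stats in files:
        # A single inexact or missing statistic makes the shortcut unsafe.
        if not stats.exact or stats.max_value is None:
            return None
        maxima.append(stats.max_value)
    return max(maxima)


# One file whose exact maximum matches the literal shown in the explain output.
print(max_from_stats([ColumnStats(min_value=1, max_value=200000, exact=True)]))
```

When the function returns a value, the planner can substitute a constant for the aggregate and skip reading the file, which is why the optimized plan contains no ParquetExec at all.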

To Reproduce
Run the same query with explain and with explain analyze, then compare the resulting plans.

Expected behavior
explain analyze should show and execute the same optimized plan that explain reports.
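
One way to check this expectation is to compare the operator sequence of the two plans while ignoring per-operator details such as metrics. The sketch below is an assumption-laden helper, not part of DataFusion: `operator_names` and `same_plan_shape` are hypothetical, they take the plan text as already extracted from the plan column, and they compare operator order only (indentation depth is ignored).

```python
import re
from typing import List


def operator_names(plan_text: str) -> List[str]:
    """Extract operator names, in order, from an indented plan listing."""
    names = []
    for line in plan_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # The operator name is the leading identifier, e.g. "ProjectionExec".
        match = re.match(r"[A-Za-z_]+", stripped)
        if match:
            names.append(match.group(0))
    return names


def same_plan_shape(plan_a: str, plan_b: str) -> bool:
    """True if both plans list the same operators in the same order."""
    return operator_names(plan_a) == operator_names(plan_b)


# Operator lines abbreviated from the two outputs in this issue.
explain_plan = """ProjectionExec: expr=[...]
  RepartitionExec: partitioning=RoundRobinBatch(16)
    ProjectionExec: expr=[200000 as MAX(l_partkey)]
      EmptyExec: produce_one_row=true"""

analyze_plan = """ProjectionExec: expr=[...], metrics=[]
  HashAggregateExec: mode=Final, gby=[], aggr=[MAX(l_partkey)]
    CoalescePartitionsExec, metrics=[]
      HashAggregateExec: mode=Partial, gby=[], aggr=[MAX(l_partkey)]
        RepartitionExec: partitioning=RoundRobinBatch(16)
          ParquetExec: batch_size=8192, limit=None"""

print(same_plan_shape(explain_plan, analyze_plan))  # prints False for this bug
```

With the behavior described in this issue the comparison fails; once both commands go through the same optimizer passes, the two operator lists should match.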

Additional context

Dandandan added the bug label Aug 21, 2021
Dandandan (Contributor, Author) commented

FYI @alamb

alamb (Contributor) commented Aug 23, 2021

I will take this one. Thanks @Dandandan
