Benchmarks for planning queries #8638

Closed
alamb opened this issue Dec 23, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Dec 23, 2023

Is your feature request related to a problem or challenge?

DataFusion has a variety of benchmarks we use for query execution -- that is, how long it takes to run a query.

There is no equivalent benchmark suite for how long it takes to plan a query, an area of DataFusion that many people have said they would like to improve (see #5637 for various ideas).

Recently we have had some PRs, such as #7942 and #7870, that propose non-trivial planning changes and include micro benchmarks that show good promise. However, we don't have an agreed-upon way to measure the overall impact of such changes.

Describe the solution you'd like

As suggested by @Dandandan in #7942 (comment):

I suggest to also add some benchmarking. We could take for example TPC-H and TPC-DS (which we already have in the benchmarks / tests) and benchmark the time it takes to plan/optimize the queries rather than execute them.

Specifically, I propose adding benchmarks (with documentation about why they are included) in

https://github.com/apache/arrow-datafusion/blob/03c2ef46f2d88fb015ee305ab67df6d930b780e2/datafusion/core/benches/sql_planner.rs

The code would basically do the following:

  1. Create the schema
  2. Plan the relevant query (create the physical plan) but not execute it
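
The two steps above can be sketched with a small standard-library-only timing harness. This is an illustration under stated assumptions, not the code in sql_planner.rs: `time_planning`, `Schema`, `create_schema`, and `toy_plan` are hypothetical names, and the toy planner only stands in for real query planning. The point is the shape of the measurement: the schema is built once outside the timed region, only the planning call is timed, and nothing is ever executed.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Returns the mean time of one `plan` call over `iterations` runs.
/// `setup` runs once, outside the timed region; the plan is never executed.
fn time_planning<C, P>(
    setup: impl FnOnce() -> C,
    plan: impl Fn(&C) -> P,
    iterations: u32,
) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let ctx = setup(); // step 1: create the schema (not timed)
    let start = Instant::now();
    for _ in 0..iterations {
        // step 2: build the plan; black_box keeps the result from being optimized away
        black_box(plan(&ctx));
    }
    start.elapsed() / iterations
}

/// Hypothetical stand-in for a table schema: just a list of column names.
struct Schema {
    columns: Vec<String>,
}

/// Builds a schema with columns named c0, c1, ..., c{num_columns - 1}.
fn create_schema(num_columns: usize) -> Schema {
    Schema {
        columns: (0..num_columns).map(|i| format!("c{i}")).collect(),
    }
}

/// Toy "planner" (an assumption for illustration): resolves each referenced
/// column name to its index in the schema, skipping names that do not exist.
fn toy_plan(schema: &Schema, referenced: &[&str]) -> Vec<usize> {
    referenced
        .iter()
        .filter_map(|name| schema.columns.iter().position(|c| c == name))
        .collect()
}
```

For example, `time_planning(|| create_schema(200), |s| toy_plan(s, &["c1", "c199"]), 1_000)` returns the mean duration of one toy planning call. In a real DataFusion benchmark the closure would presumably wrap `SessionContext::sql` followed by `DataFrame::create_physical_plan`; that wiring is not shown here.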


Describe alternatives you've considered

One alternative could be to update the dfbench tests so they can plan the queries without running them:

It seems it might not be much work adding an option to the benchmark code to only perform the planning rather than executing the queries.

The dfbench code is here: https://github.com/apache/arrow-datafusion/blob/main/benchmarks/src/bin/dfbench.rs

Additional context

No response

alamb added the enhancement (New feature or request) label Dec 23, 2023
@alamb
Contributor Author

alamb commented Dec 23, 2023

I recommend picking one of these test suites (perhaps TPC-H or ClickBench), figuring out the pattern for a benchmark test, and then working on the others.

@matthewmturner
Contributor

I will work on this

@alamb
Contributor Author

alamb commented Apr 11, 2024

FWIW I think we have a good set of benchmarks now (sql_planner benchmark) so let's close this for now

alamb closed this as completed Apr 11, 2024