-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(9870): common expression elimination optimization, should always re-find the correct expression during re-write. #9871
fix(9870): common expression elimination optimization, should always re-find the correct expression during re-write. #9871
Conversation
…mmon-expr-elimination traversals
… not always stay in sync with the updated TreeNode traversal
…, while keeping the (stack-popped) symbol used for alias.
101eb18
to
7a658b6
Compare
… the expr_identifier from the alias symbol
7a658b6
to
ff7b3d6
Compare
Outstanding question:There may be some changes in the performance. While consuming the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @wiedld -- I reviewed the code carefully and it looks quite good to me. Not only does it fix the bug I think it makes the structure easier to reason about too. 🏆
I also ran the planning benchmarks which showed no change due to this PR
Benchmark Details
cargo bench --bench sql_planner
alamb@aal-dev:~/arrow-datafusion4$ critcmp main this_branch
group main this_branch
----- ---- -----------
logical_aggregate_with_join 1.00 1268.1±34.12µs ? ?/sec 1.01 1275.3±74.80µs ? ?/sec
logical_plan_tpch_all 1.00 17.1±0.20ms ? ?/sec 1.02 17.5±0.37ms ? ?/sec
logical_select_all_from_1000 1.01 94.7±0.43ms ? ?/sec 1.00 94.0±0.57ms ? ?/sec
logical_select_one_from_700 1.00 738.4±28.13µs ? ?/sec 1.01 743.6±11.76µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.00 792.5±14.03µs ? ?/sec 1.00 788.7±9.50µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.00 756.6±9.69µs ? ?/sec 1.00 759.3±29.50µs ? ?/sec
physical_plan_tpch_all 1.00 133.6±0.79ms ? ?/sec 1.01 135.0±1.28ms ? ?/sec
physical_plan_tpch_q1 1.00 7.6±0.04ms ? ?/sec 1.02 7.7±0.04ms ? ?/sec
physical_plan_tpch_q10 1.00 6.3±0.05ms ? ?/sec 1.01 6.4±0.04ms ? ?/sec
physical_plan_tpch_q11 1.00 5.0±0.04ms ? ?/sec 1.03 5.1±0.09ms ? ?/sec
physical_plan_tpch_q12 1.00 4.1±0.03ms ? ?/sec 1.01 4.1±0.03ms ? ?/sec
physical_plan_tpch_q13 1.00 2.7±0.03ms ? ?/sec 1.01 2.7±0.02ms ? ?/sec
physical_plan_tpch_q14 1.00 3.4±0.02ms ? ?/sec 1.03 3.6±0.03ms ? ?/sec
physical_plan_tpch_q16 1.00 5.3±0.04ms ? ?/sec 1.01 5.3±0.04ms ? ?/sec
physical_plan_tpch_q17 1.00 4.9±0.07ms ? ?/sec 1.00 4.9±0.04ms ? ?/sec
physical_plan_tpch_q18 1.00 5.4±0.03ms ? ?/sec 1.01 5.5±0.06ms ? ?/sec
physical_plan_tpch_q19 1.00 10.1±0.08ms ? ?/sec 1.01 10.2±0.06ms ? ?/sec
physical_plan_tpch_q2 1.00 12.2±0.08ms ? ?/sec 1.01 12.3±0.09ms ? ?/sec
physical_plan_tpch_q20 1.00 6.4±0.06ms ? ?/sec 1.01 6.5±0.06ms ? ?/sec
physical_plan_tpch_q21 1.00 9.5±0.14ms ? ?/sec 1.01 9.6±0.10ms ? ?/sec
physical_plan_tpch_q22 1.00 4.7±0.05ms ? ?/sec 1.01 4.7±0.04ms ? ?/sec
physical_plan_tpch_q3 1.00 4.1±0.02ms ? ?/sec 1.02 4.2±0.03ms ? ?/sec
physical_plan_tpch_q4 1.00 3.4±0.03ms ? ?/sec 1.00 3.4±0.02ms ? ?/sec
physical_plan_tpch_q5 1.00 6.1±0.04ms ? ?/sec 1.01 6.1±0.04ms ? ?/sec
physical_plan_tpch_q6 1.00 2.0±0.02ms ? ?/sec 1.02 2.1±0.02ms ? ?/sec
physical_plan_tpch_q7 1.00 8.6±0.06ms ? ?/sec 1.01 8.7±0.07ms ? ?/sec
physical_plan_tpch_q8 1.00 12.1±0.08ms ? ?/sec 1.02 12.3±0.23ms ? ?/sec
physical_plan_tpch_q9 1.00 9.2±0.06ms ? ?/sec 1.01 9.2±0.06ms ? ?/sec
physical_select_all_from_1000 1.01 691.9±1.14ms ? ?/sec 1.00 687.8±1.48ms ? ?/sec
physical_select_one_from_700 1.00 4.1±0.02ms ? ?/sec 1.01 4.2±0.05ms ? ?/sec
cc @waynexia @peter-toth in case you would like to review
/// - counter | ||
/// - DataType of this expression. | ||
/// - symbol used as the identifier in the alias. | ||
map: HashMap<Identifier, (Expr, usize, DataType, Identifier)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The amount of copying this requires is unfortunate (maybe it is N^2 in the number of exprs?) -- however, I don't think your changes make it any worse (or better).
I filed #9873 to track the issue. Maybe now that you know this code much better, a fun side project would be to make it much faster 🚀 (🎣 )
Given this is a regression I will plan to merge this PR tomorrow unless someone else would like time to review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wiedld and @alamb. I love the structure of this commit history and the code!
For the performance part, I learned the overall plan optimization phase is costly, but haven't looked into the per rule detail. The planner benchmark is very helpful 👍
For this specific rule, it needs many traverse passes and many stringify in the base. I agree with @alamb that this fix doesn't make worse. But find some way to alleviate the overhead is very meaningful, especially when we extend this rule to run across plans. And the way to identify an expr might also need to change later considering #9776.
By the way, this might be in conflict with #9719.
Yes, I agree. It is key to our ability to make performance improvements in planning
💯 to this as well. I think avoiding creating String identifiers will be the key. |
Thank you for the review @waynexia |
…re-find the correct expression during re-write. (#9871) * test(9870): reproducer of error with jumping traversal patterns in common-expr-elimination traversals * refactor: remove the IdArray ordered idx, since the idx ordering does not always stay in sync with the updated TreeNode traversal * refactor: use the only reproducible key (expr_identifer) for expr_set, while keeping the (stack-popped) symbol used for alias. * refactor: encapsulate most of the logic within ExprSet, and delineate the expr_identifier from the alias symbol * test(9870): demonstrate that the sqllogictests are now passing
I think this one may need some change in SqlParser, currently in Sqlparser we just parse syntax like "struct(xx, xx, xx, xx)" to be parsed to Expr::Struct, but when encounter some structure like "t2.struct(xx, xx, xx, xx)" the Sqlparser would parse them into two ident. I think right now it is not ready yet. |
I think the issue is tracked in #9891 |
…re-find the correct expression during re-write. (apache#9871) * test(9870): reproducer of error with jumping traversal patterns in common-expr-elimination traversals * refactor: remove the IdArray ordered idx, since the idx ordering does not always stay in sync with the updated TreeNode traversal * refactor: use the only reproducible key (expr_identifer) for expr_set, while keeping the (stack-popped) symbol used for alias. * refactor: encapsulate most of the logic within ExprSet, and delineate the expr_identifier from the alias symbol * test(9870): demonstrate that the sqllogictests are now passing
…re-find the correct expression during re-write. (apache#9871) * test(9870): reproducer of error with jumping traversal patterns in common-expr-elimination traversals * refactor: remove the IdArray ordered idx, since the idx ordering does not always stay in sync with the updated TreeNode traversal * refactor: use the only reproducible key (expr_identifer) for expr_set, while keeping the (stack-popped) symbol used for alias. * refactor: encapsulate most of the logic within ExprSet, and delineate the expr_identifier from the alias symbol * test(9870): demonstrate that the sqllogictests are now passing
…re-find the correct expression during re-write. (apache#9871) * test(9870): reproducer of error with jumping traversal patterns in common-expr-elimination traversals * refactor: remove the IdArray ordered idx, since the idx ordering does not always stay in sync with the updated TreeNode traversal * refactor: use the only reproducible key (expr_identifer) for expr_set, while keeping the (stack-popped) symbol used for alias. * refactor: encapsulate most of the logic within ExprSet, and delineate the expr_identifier from the alias symbol * test(9870): demonstrate that the sqllogictests are now passing
…re-find the correct expression during re-write. (apache#9871) * test(9870): reproducer of error with jumping traversal patterns in common-expr-elimination traversals * refactor: remove the IdArray ordered idx, since the idx ordering does not always stay in sync with the updated TreeNode traversal * refactor: use the only reproducible key (expr_identifer) for expr_set, while keeping the (stack-popped) symbol used for alias. * refactor: encapsulate most of the logic within ExprSet, and delineate the expr_identifier from the alias symbol * test(9870): demonstrate that the sqllogictests are now passing
… always re-find the correct expression during re-write. (apache#9871)" This reverts commit cd7a00b.
… always re-find the correct expression during re-write. (apache#9871)" This reverts commit cd7a00b.
… always re-find the correct expression during re-write. (apache#9871)" This reverts commit cd7a00b.
… always re-find the correct expression during re-write. (apache#9871)" This reverts commit cd7a00b.
* Revert "fix(9870): common expression elimination optimization, should always re-find the correct expression during re-write. (#9871)" This reverts commit cd7a00b. * expr id should always contain the full expr structure, cleaner expr ids, better JumpMark handling, better variable names, code cleanup, some new todos * move `Expr` from `expr_set`s to `affected_id`s * better naming, docs fixes * introduce `CommonExprs` type alias, minor todo fix * add test --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Revert "fix(9870): common expression elimination optimization, should always re-find the correct expression during re-write. (apache#9871)" This reverts commit cd7a00b. * expr id should always contain the full expr structure, cleaner expr ids, better JumpMark handling, better variable names, code cleanup, some new todos * move `Expr` from `expr_set`s to `affected_id`s * better naming, docs fixes * introduce `CommonExprs` type alias, minor todo fix * add test --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Closes #9870
Which issue does this PR close?
We found an example where the common-expr-elimination was mutating the logical plan incorrectly. Specifically, it would replace a logical plan node with an incorrect node => leading to a failure in the type check. We have a reproducer of this error (see the first commit's expected test errors).
Rationale for this change
We isolated this error to how the IdArray is generated and used. The IdArray is generated by insertion-at-index during traversal of a first visitor. This IdArray is then read during the second visitor, also by an index, but this index is incremented differently as this second visitor traverses.
As such, the second visitor finds the wrong expr symbol, and then inserts the wrong expression. We could make small changes to fix our bug, but then broke other statements. We feel that the index-based lookup may be inherently fragile to slight changes in traversal patterns across the two visitors; as such, this PR is an alternative approach.
What changes are included in this PR?
1st commit == reproducer, with the expected test errors
2-4th commits == our steps as we introduced the refactoring:
expr_identifier()
(a.k.a. a stringified expr) which is constant across the two separate tree traversals.expr_identifier()
) into the ExprSet5th commit == update the reproducer tests, which are now passing.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.