feat(decompile): make the decompiler run on TPCH query 1 #9779
This makes parsing TPC-H query 1 break in a different place, so progress?
The `sqlglot` plan optimizer (even with optimizations disabled) replaces a few items with aliases. In other places, it doesn't use fully-qualified column names. Both of these prevent us from reliably converting into an Ibis expression. There are more details in the comments of each helper function, but overall, this adds the table name to qualify the sort columns, replaces the projection aliases with the qualified column names, and replaces any intermediate operands in agg functions with those intermediates.
Ok, this is slightly bonkers, but I'm re-using the values comparison from the regular TPCH tests, then using I was going to
Ok, this is ready for a look. I think we can improve the testing beyond just DuckDB, but for now this is a definite improvement over the current decompiler, and I think there is more we can do to improve it. If we're good with this, I can look into adding more of the TPCH tests.
snapshot.assert_match(code, "out_tpch.py")

# Import just-created snapshot
SNAPSHOT_MODULE = f"ibis.backends.duckdb.tests.snapshots.test_decompile_tpch.test_parse_sql_tpch.tpch{tpch_query:02d}.out_tpch"
yikes 😬
I plan on making this better in a follow up, I promise.
@@ -5,4 +5,4 @@
name="airlines", schema={"dest": "string", "origin": "string", "arrdelay": "int32"}
)

result = airlines.filter((airlines.dest.cast("int64") == 0) == True)
result = airlines.filter((((airlines.dest.cast("int64") == 0)) == True))
We're a lisp now!
Do we have this function marked as @̶̜̍ë̷͔́x̵̺́p̶̠̈́e̶̠͐r̵̹͐i̴̜̎m̶̩̎e̸̛̹ň̶̺ṯ̷̃å̴͉l̷̼͘
😅
I am happy to report that we do.
Context: the sql -> ibis expression parser/creator runs off of a `sqlglot.Plan` object.
The `sqlglot` plan optimizer replaces a few items with aliases. In other places, it doesn't use fully-qualified column names. Both of these prevent us from reliably converting into an Ibis expression.
Description of changes
Dereferencing aggregation operands
The sqlglot planner will pull out computed operands into a separate
section and alias them.
For the purposes of decompiling, we want these to be inline, so here we
replace those new aliases with the parsed sqlglot expression.
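A minimal, library-free sketch of the inlining idea, not the code in this PR. The tuple-based tree stands in for sqlglot expressions, and the alias name `_a_0`, the `operands` mapping, and `inline_operands` are all hypothetical names chosen for illustration.

```python
# Toy expression tree (an assumption, not sqlglot's real node types):
#   ("col", name)   column reference
#   ("lit", value)  literal
#   (op, *children) any other operation

def inline_operands(node, operands):
    """Replace column references that name a planner alias with the aliased expression."""
    if node[0] == "col":
        # Unknown names are real columns and are left untouched.
        return operands.get(node[1], node)
    if node[0] == "lit":
        return node
    return (node[0], *(inline_operands(child, operands) for child in node[1:]))


# Hypothetical planner output for SUM(l_extendedprice * (1 - l_discount)):
# the computed operand is pulled out and aliased, and the aggregate refers to the alias.
operands = {
    "_a_0": ("mul", ("col", "l_extendedprice"), ("sub", ("lit", 1), ("col", "l_discount"))),
}
aggregation = ("sum", ("col", "_a_0"))

inlined = inline_operands(aggregation, operands)
```

After the substitution, the aggregate carries the full arithmetic expression again, which is the shape the decompiler needs in order to emit a single inline Ibis expression.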
Dereferencing projections
The sqlglot planner will (sometimes) alias projections to the aggregate
that precedes it.
These aliases are stored in a dictionary in the aggregate `groups`, so if
those are pulled out beforehand then we can use them to replace the
aliases in the projections.
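The lookup described above can be sketched roughly as follows. This is an illustration under assumptions: `AggregateStep`, `ProjectStep`, the `group` field, and the `_g0`/`_g1` alias names are hypothetical stand-ins, not sqlglot's planner classes.

```python
from dataclasses import dataclass, field


@dataclass
class AggregateStep:
    """Hypothetical stand-in for a planner aggregate step."""
    group: dict = field(default_factory=dict)  # alias -> qualified column name


@dataclass
class ProjectStep:
    """Hypothetical stand-in for the projection that follows the aggregate."""
    projections: list = field(default_factory=list)


def dereference_projections(aggregate, project):
    """Swap projection aliases for the columns recorded on the preceding aggregate."""
    # Pull the alias mapping out first, then rewrite the projections with it.
    aliases = dict(aggregate.group)
    return [aliases.get(name, name) for name in project.projections]


aggregate = AggregateStep(
    group={"_g0": "lineitem.l_returnflag", "_g1": "lineitem.l_linestatus"}
)
project = ProjectStep(projections=["_g0", "_g1", "sum_qty"])

resolved = dereference_projections(aggregate, project)
```

Names that are not in the mapping (here `sum_qty`) pass through unchanged, so only the planner-introduced aliases are rewritten.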
Dereferencing sort keys
The sqlglot planner doesn't fully qualify sort keys.
For now we do a naive thing here and prepend the name of the sort operation itself, which (maybe?) is the name of the parent table.
This is definitely the most brittle part of the existing decompiler.
I've mucked around with this a little bit, and while it's a touch hacky, using the deferred operator with the not-fully-referenced sort keys works well.
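The naive "prepend the table name" approach might look something like the sketch below. It works on plain strings rather than sqlglot nodes, and `qualify_sort_key` is a hypothetical helper, so treat it as an illustration of the idea and of why it is brittle, not as the PR's implementation.

```python
def qualify_sort_key(key, table):
    """Prefix an unqualified sort key with a table name, keeping any ASC/DESC suffix."""
    name, _, direction = key.partition(" ")
    if "." not in name:
        # Assumption: the sort step's name is the parent table's name.
        name = f"{table}.{name}"
    return f"{name} {direction}".strip()


sort_keys = ["l_returnflag", "l_linestatus DESC", "lineitem.l_shipdate"]
qualified = [qualify_sort_key(key, "lineitem") for key in sort_keys]
```

The weak point is the assumption in the comment: if the sort step's name is not actually the parent table, every key gets the wrong qualifier.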