feat(decompile): make the decompiler run on TPCH query 1 (#9779)
Context: the sql -> ibis expression parser/creator runs off of a `sqlglot.Plan` object. The `sqlglot` plan optimizer replaces a few items with aliases. In other places, it doesn't use fully-qualified column names. Both of these prevent us from reliably converting into an Ibis expression.

## Description of changes

### Dereferencing aggregation operands

The sqlglot planner will pull out computed operands into a separate section and alias them, e.g.

```
Context:
  Aggregations:
    - SUM("_a_0") AS "sum_disc_price"
  Operands:
    - "lineitem"."l_extendedprice" * (1 - "lineitem"."l_discount") AS _a_0
```

For the purposes of decompiling, we want these to be inline, so here we replace those new aliases with the parsed sqlglot expression.

### Dereferencing projections

The sqlglot planner will (sometimes) alias projections to the aggregate that precedes it.

```
- Sort: lineitem (132849388268768)
    Context:
      Key:
        - "l_returnflag"
        - "l_linestatus"
    Projections:
      - lineitem._g0 AS "l_returnflag"
      - lineitem._g1 AS "l_linestatus"
      <snip>
    Dependencies:
    - Aggregate: lineitem (132849388268864)
      Context:
        Aggregations:
          <snip>
        Group:
          - "lineitem"."l_returnflag"  <-- this is _g0
          - "lineitem"."l_linestatus"  <-- this is _g1
        <snip>
```

These aliases are stored in a dictionary in the aggregate `groups`, so if those are pulled out beforehand then we can use them to replace the aliases in the projections.

### Dereferencing sort keys

The sqlglot planner doesn't fully qualify sort keys.

```
- Sort: lineitem (132849388268768)
    Context:
      Key:
        - "l_returnflag"
        - "l_linestatus"
```

~For now we do a naive thing here and prepend the name of the sort~
~operation itself, which (maybe?) is the name of the parent table.~
~*This is definitely the most brittle part of the existing decompiler*~

I've mucked around with this a little bit, and while it's a _touch_ hacky, using the deferred operator with the not-fully-referenced sort-keys works well.
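The two alias-dereferencing steps described above (aggregation operands and projections) come down to substituting alias names with the expressions they stand for. Below is a minimal sketch of that idea on a toy expression model. The nested-tuple representation and the `dereference` helper are assumptions made for illustration only; they are not sqlglot's data structures and not the code in this commit.

```python
# Toy model (hypothetical): an expression is either a column name (str),
# a literal, or a tuple of the form (operator, arg1, arg2, ...).


def dereference(expr, aliases):
    """Recursively replace alias names with the expressions they stand for."""
    if isinstance(expr, tuple):
        op, *args = expr
        return (op, *(dereference(arg, aliases) for arg in args))
    if isinstance(expr, str):
        # Unknown names are left untouched.
        return aliases.get(expr, expr)
    return expr  # literals such as 1 pass through unchanged


# Aggregation operands: SUM("_a_0"), where _a_0 is a computed operand.
operands = {
    "_a_0": ("mul", "lineitem.l_extendedprice", ("sub", 1, "lineitem.l_discount")),
}
aggregation = ("sum", "_a_0")
inlined = dereference(aggregation, operands)

# Projections: _g0 / _g1 point at the group keys of the preceding aggregate.
groups = {
    "lineitem._g0": "lineitem.l_returnflag",
    "lineitem._g1": "lineitem.l_linestatus",
}
projections = ["lineitem._g0", "lineitem._g1"]
resolved = [dereference(p, groups) for p in projections]
```

The same substitution serves both cases; only the source of the alias mapping differs (the plan's operands in one case, the aggregate's `groups` in the other).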
Showing 49 changed files with 494 additions and 102 deletions.
44 changes: 44 additions & 0 deletions
...ackends/duckdb/tests/snapshots/test_decompile_tpch/test_parse_sql_tpch/tpch01/out_tpch.py

@@ -0,0 +1,44 @@
import ibis


lineitem = ibis.table(
    name="lineitem",
    schema={
        "l_orderkey": "int32",
        "l_partkey": "int32",
        "l_suppkey": "int32",
        "l_linenumber": "int32",
        "l_quantity": "decimal(15, 2)",
        "l_extendedprice": "decimal(15, 2)",
        "l_discount": "decimal(15, 2)",
        "l_tax": "decimal(15, 2)",
        "l_returnflag": "string",
        "l_linestatus": "string",
        "l_shipdate": "date",
        "l_commitdate": "date",
        "l_receiptdate": "date",
        "l_shipinstruct": "string",
        "l_shipmode": "string",
        "l_comment": "string",
    },
)
lit = ibis.literal(1)
f = lineitem.filter((lineitem.l_shipdate <= ibis.literal("1998-09-02").cast("date")))
multiply = f.l_extendedprice * ((lit - f.l_discount))
agg = f.aggregate(
    [
        f.l_quantity.sum().name("sum_qty"),
        f.l_extendedprice.sum().name("sum_base_price"),
        multiply.sum().name("sum_disc_price"),
        ((multiply) * ((lit + f.l_tax))).sum().name("sum_charge"),
        f.l_quantity.mean().name("avg_qty"),
        f.l_extendedprice.mean().name("avg_price"),
        f.l_discount.mean().name("avg_disc"),
        f.count().name("count_order"),
    ],
    by=[f.l_returnflag, f.l_linestatus],
)

result = agg.order_by(
    agg.l_returnflag.asc(nulls_first=True), agg.l_linestatus.asc(nulls_first=True)
)
106 changes: 106 additions & 0 deletions
...ackends/duckdb/tests/snapshots/test_decompile_tpch/test_parse_sql_tpch/tpch03/out_tpch.py

@@ -0,0 +1,106 @@
import ibis


customer = ibis.table(
    name="customer",
    schema={
        "c_custkey": "int64",
        "c_name": "string",
        "c_address": "string",
        "c_nationkey": "int16",
        "c_phone": "string",
        "c_acctbal": "decimal",
        "c_mktsegment": "string",
        "c_comment": "string",
    },
)
lit = ibis.literal(True)
orders = ibis.table(
    name="orders",
    schema={
        "o_orderkey": "int64",
        "o_custkey": "int64",
        "o_orderstatus": "string",
        "o_totalprice": "decimal(12, 2)",
        "o_orderdate": "date",
        "o_orderpriority": "string",
        "o_clerk": "string",
        "o_shippriority": "int32",
        "o_comment": "string",
    },
)
lineitem = ibis.table(
    name="lineitem",
    schema={
        "l_orderkey": "int32",
        "l_partkey": "int32",
        "l_suppkey": "int32",
        "l_linenumber": "int32",
        "l_quantity": "decimal(15, 2)",
        "l_extendedprice": "decimal(15, 2)",
        "l_discount": "decimal(15, 2)",
        "l_tax": "decimal(15, 2)",
        "l_returnflag": "string",
        "l_linestatus": "string",
        "l_shipdate": "date",
        "l_commitdate": "date",
        "l_receiptdate": "date",
        "l_shipinstruct": "string",
        "l_shipmode": "string",
        "l_comment": "string",
    },
)
cast = ibis.literal("1995-03-15").cast("date")
joinchain = (
    customer.inner_join(
        orders,
        [(customer.c_custkey == orders.o_custkey), lit, (orders.o_orderdate < cast)],
    )
    .inner_join(
        lineitem,
        [(orders.o_orderkey == lineitem.l_orderkey), lit, (lineitem.l_shipdate > cast)],
    )
    .select(
        customer.c_custkey,
        customer.c_name,
        customer.c_address,
        customer.c_nationkey,
        customer.c_phone,
        customer.c_acctbal,
        customer.c_mktsegment,
        customer.c_comment,
        orders.o_orderkey,
        orders.o_custkey,
        orders.o_orderstatus,
        orders.o_totalprice,
        orders.o_orderdate,
        orders.o_orderpriority,
        orders.o_clerk,
        orders.o_shippriority,
        orders.o_comment,
        lineitem.l_orderkey,
        lineitem.l_partkey,
        lineitem.l_suppkey,
        lineitem.l_linenumber,
        lineitem.l_quantity,
        lineitem.l_extendedprice,
        lineitem.l_discount,
        lineitem.l_tax,
        lineitem.l_returnflag,
        lineitem.l_linestatus,
        lineitem.l_shipdate,
        lineitem.l_commitdate,
        lineitem.l_receiptdate,
        lineitem.l_shipinstruct,
        lineitem.l_shipmode,
        lineitem.l_comment,
    )
)
f = joinchain.filter((joinchain.c_mktsegment == "BUILDING"))
agg = f.aggregate(
    [(f.l_extendedprice * ((1 - f.l_discount))).sum().name("revenue")],
    by=[f.l_orderkey, f.o_orderdate, f.o_shippriority],
)
s = agg.order_by(agg.revenue.desc(), agg.o_orderdate.asc(nulls_first=True))

result = s.select(s.l_orderkey, s.revenue, s.o_orderdate, s.o_shippriority).limit(10)

@@ -0,0 +1,101 @@
from __future__ import annotations

import importlib
from contextlib import contextmanager
from pathlib import Path

import pytest
from pytest import param

import ibis
from ibis.backends.tests.tpc.conftest import compare_tpc_results
from ibis.formats.pandas import PandasData

tpch_catalog = {
    "lineitem": {
        "l_orderkey": "int32",
        "l_partkey": "int32",
        "l_suppkey": "int32",
        "l_linenumber": "int32",
        "l_quantity": "decimal(15, 2)",
        "l_extendedprice": "decimal(15, 2)",
        "l_discount": "decimal(15, 2)",
        "l_tax": "decimal(15, 2)",
        "l_returnflag": "string",
        "l_linestatus": "string",
        "l_shipdate": "date",
        "l_commitdate": "date",
        "l_receiptdate": "date",
        "l_shipinstruct": "string",
        "l_shipmode": "string",
        "l_comment": "string",
    },
    "customer": [
        ("c_custkey", "int64"),
        ("c_name", "string"),
        ("c_address", "string"),
        ("c_nationkey", "int16"),
        ("c_phone", "string"),
        ("c_acctbal", "decimal"),
        ("c_mktsegment", "string"),
        ("c_comment", "string"),
    ],
    "orders": [
        ("o_orderkey", "int64"),
        ("o_custkey", "int64"),
        ("o_orderstatus", "string"),
        ("o_totalprice", "decimal(12,2)"),
        ("o_orderdate", "date"),
        ("o_orderpriority", "string"),
        ("o_clerk", "string"),
        ("o_shippriority", "int32"),
        ("o_comment", "string"),
    ],
}

root = Path(__file__).absolute().parents[3]

SQL_QUERY_PATH = root / "backends" / "tests" / "tpc" / "queries" / "duckdb" / "h"


@contextmanager
def set_database(con, db):
    olddb = con.current_database
    con.raw_sql(f"USE {db}")
    yield
    con.raw_sql(f"USE {olddb}")


@pytest.mark.parametrize(
    "tpch_query",
    [
        param(1, id="tpch01"),
        param(3, id="tpch03"),
    ],
)
def test_parse_sql_tpch(tpch_query, snapshot, con, data_dir):
    tpch_query_file = SQL_QUERY_PATH / f"{tpch_query:02d}.sql"
    with open(tpch_query_file) as f:
        sql = f.read()

    expr = ibis.parse_sql(sql, tpch_catalog)
    code = ibis.decompile(expr, format=True)
    snapshot.assert_match(code, "out_tpch.py")

    # Import just-created snapshot
    SNAPSHOT_MODULE = f"ibis.backends.duckdb.tests.snapshots.test_decompile_tpch.test_parse_sql_tpch.tpch{tpch_query:02d}.out_tpch"
    module = importlib.import_module(SNAPSHOT_MODULE)

    with set_database(con, "tpch"):
        # Get results from executing SQL directly on DuckDB
        expected_df = con.con.execute(sql).df()
        # Get results from decompiled ibis query
        result_df = con.to_pandas(module.result)

    # Then set the expected columns so we can coerce the datatypes
    # of the pandas dataframe correctly
    expected_df.columns = result_df.columns

    expected_df = PandasData.convert_table(expected_df, module.result.schema())

    compare_tpc_results(result_df, expected_df)
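
One observation on the `set_database` helper in this test module: as written, the final `USE {olddb}` only runs if the body of the `with` block finishes without raising. The sketch below shows a variant that restores the previous database even on failure by wrapping the `yield` in `try`/`finally`. `FakeConnection` and `set_database_safely` are hypothetical names introduced here so the example runs on its own; neither is part of ibis or of this commit.

```python
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a backend connection; not an ibis class."""

    def __init__(self, database):
        self.current_database = database

    def raw_sql(self, query):
        # Only understands "USE <db>", which is all this sketch needs.
        self.current_database = query.removeprefix("USE ").strip()


@contextmanager
def set_database_safely(con, db):
    olddb = con.current_database
    con.raw_sql(f"USE {db}")
    try:
        yield
    finally:
        # Runs whether or not the body of the `with` block raised.
        con.raw_sql(f"USE {olddb}")


con = FakeConnection("main")
try:
    with set_database_safely(con, "tpch"):
        raise RuntimeError("query failed")
except RuntimeError:
    pass
restored = con.current_database  # back to the original database
```

Whether this matters for the test above depends on how the `con` fixture is shared across tests; with a session-scoped connection, a failed query could otherwise leave later tests pointed at the `tpch` database.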