From 9c2f362c92d3d2e57b61d2122b51862f85a140f9 Mon Sep 17 00:00:00 2001
From: Cody
Date: Sat, 2 Mar 2024 17:38:57 -0500
Subject: [PATCH 1/2] docs: add Python + SQL section to why ibis

---
 docs/why.qmd | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/docs/why.qmd b/docs/why.qmd
index 0bfd8d75a516..5e6f6889e950 100644
--- a/docs/why.qmd
+++ b/docs/why.qmd
@@ -228,6 +228,72 @@ and robust framework for data manipulation in Python.
 In the long-term, we aim for a standard query plan Intermediate Representation
 (IR) like [Substrait](https://substrait.io) to simplify this further.
 
+## Python + SQL: better together
+
+For most backends, Ibis works by compiling Python expressions into SQL:
+
+```{python}
+g = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")
+ibis.to_sql(g)
+```
+
+You can mix and match Python and SQL code:
+
+```{python}
+sql = """
+SELECT
+  species,
+  island,
+  COUNT(*) AS count
+FROM penguins
+GROUP BY species, island
+""".strip()
+```
+
+
+::: {.panel-tabset}
+
+## DuckDB
+
+```{python}
+con = ibis.duckdb.connect()
+t = con.read_parquet("penguins.parquet")
+g = t.alias("penguins").sql(sql)
+g
+```
+
+```{python}
+g.order_by("count")
+```
+
+## DataFusion
+
+```{python}
+con = ibis.datafusion.connect()
+t = con.read_parquet("penguins.parquet")
+g = t.alias("penguins").sql(sql)
+g
+```
+
+```{python}
+g.order_by("count")
+```
+
+## PySpark
+
+```{python}
+con = ibis.connect("pyspark://")
+t = con.read_parquet("penguins.parquet")
+g = t.alias("penguins").sql(sql)
+g
+```
+
+```{python}
+g.order_by("count")
+```
+
+:::
+
 ## Scaling up and out
 
 Out of the box, Ibis offers a great local experience for working with many file

From 17998b49105f356e2faf3136d3c31caf8df6edb6 Mon Sep 17 00:00:00 2001
From: Cody
Date: Sat, 2 Mar 2024 17:42:25 -0500
Subject: [PATCH 2/2] say the line

---
 docs/why.qmd | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/why.qmd b/docs/why.qmd
index 5e6f6889e950..c3410816f991 100644
--- a/docs/why.qmd
+++ b/docs/why.qmd
@@ -250,7 +250,6 @@ GROUP BY species, island
 """.strip()
 ```
 
-
 ::: {.panel-tabset}
 
 ## DuckDB
@@ -294,6 +293,9 @@ g.order_by("count")
 
 :::
 
+This allows you to combine the flexibility of Python with the scale and
+performance of modern SQL.
+
 ## Scaling up and out
 
 Out of the box, Ibis offers a great local experience for working with many file