Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support table.sql() for Polars #8525

Closed
1 task done
lostmygithubaccount opened this issue Mar 2, 2024 · 18 comments · Fixed by #8528
Closed
1 task done

feat: support table.sql() for Polars #8525

lostmygithubaccount opened this issue Mar 2, 2024 · 18 comments · Fixed by #8528
Labels
feature Features or general enhancements polars The polars backend

Comments

@lostmygithubaccount
Copy link
Member

lostmygithubaccount commented Mar 2, 2024

Is your feature request related to a problem?

con.sql() is implemented, but table.sql() is not:

[ins] In [1]: import ibis

[ins] In [2]: ibis.options.interactive = True

[ins] In [3]: ibis.set_backend("polars")

[ins] In [4]: t = ibis.examples.penguins.fetch()

[ins] In [5]: t
Out[5]:
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ speciesislandbill_length_mmbill_depth_mmflipper_length_mmbody_mass_gsexyear  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ stringstringfloat64float64int64int64stringint64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ AdelieTorgersen39.118.71813750male2007 │
│ AdelieTorgersen39.517.41863800female2007 │
│ AdelieTorgersen40.318.01953250female2007 │
│ AdelieTorgersenNULLNULLNULLNULLNULL2007 │
│ AdelieTorgersen36.719.31933450female2007 │
│ AdelieTorgersen39.320.61903650male2007 │
│ AdelieTorgersen38.917.81813625female2007 │
│ AdelieTorgersen39.219.61954675male2007 │
│ AdelieTorgersen34.118.11933475NULL2007 │
│ AdelieTorgersen42.020.21904250NULL2007 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

[ins] In [6]: t.sql("SELECT species, island, count(*) AS count FROM penguins GROUP BY species, island")
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[6], line 1
----> 1 t.sql("SELECT species, island, count(*) AS count FROM penguins GROUP BY species, island")

File ~/repos/ibis/ibis/expr/types/relations.py:3329, in Table.sql(self, query, dialect)
   3326     name = util.gen_name("sql_query")
   3327     expr = self
-> 3329 schema = backend._get_sql_string_view_schema(name, expr, query)
   3330 node = ops.SQLStringView(child=self.op(), query=query, schema=schema)
   3331 return node.to_expr()

File ~/repos/ibis/ibis/backends/polars/__init__.py:410, in Backend._get_sql_string_view_schema(self, name, table, query)
    409 def _get_sql_string_view_schema(self, name, table, query) -> sch.Schema:
--> 410     raise NotImplementedError("table.sql() not yet supported in polars")

NotImplementedError: table.sql() not yet supported in polars

Describe the solution you'd like

support table.sql() with the Polars backend

this is mainly useful to me for consistency in example code for the documentation

What version of ibis are you running?

main

What backend(s) are you using, if any?

Polars

Code of Conduct

  • I agree to follow this project's Code of Conduct
@cpcloud
Copy link
Member

cpcloud commented Mar 3, 2024

Thanks for the issue.

Can you paste in a copy-pastable example 😁

@cpcloud
Copy link
Member

cpcloud commented Mar 3, 2024

We can't support anything but the most trivial queries until Polars has support for CTEs (common table expressions). I'll put up a draft PR to demonstrate the issue.

@lostmygithubaccount
Copy link
Member Author

updated -- the PR works for the example I want to use for docs

@lostmygithubaccount
Copy link
Member Author

looks like two relevant (currently open) issues in Polars:

@cpcloud
Copy link
Member

cpcloud commented Mar 6, 2024

The WITH box is checked, does that mean it's supposed to be done?

@cpcloud
Copy link
Member

cpcloud commented Mar 6, 2024

It does seem to work in 0.20.13:

In [9]: import polars as pl

In [10]: pl.__version__
Out[10]: '0.20.13'

In [11]: df = pl.DataFrame({"a": [1, 2, 3]})

In [12]: ctx = pl.SQLContext()

In [13]: ctx.register("t", df)
Out[13]: <SQLContext [tables:1] at 0x7f71c2762a90>

In [14]: ctx.execute("with t1 as (select * from t) select * from t1").collect()
Out[14]:
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

@alexander-beedie
Copy link
Contributor

alexander-beedie commented Apr 8, 2024

The WITH box is checked, does that mean it's supposed to be done?

We've had basic (non-recursive) CTE support1 for about a year now 😉👌
(Though the SQL interface remains behind the primary expression interface).

Footnotes

  1. https://github.com/pola-rs/polars/pull/8208

@lostmygithubaccount
Copy link
Member Author

hi @alexander-beedie! I did ask here, but while you're here do you know if Polars is following any specific SQL dialect or doing its own thing? I'm mainly curious 1) if/when the SQL interface would catch up to the dataframe interface and 2) if it'd be easy to use SQLGlot via one of the existing dialects to interface with it

@cpcloud
Copy link
Member

cpcloud commented Apr 9, 2024

Polars doesn't seem to handle quoted identifiers. That's the main reason for the extremely limited support for SQL operations using the polars backend in Ibis.

@alexander-beedie
Copy link
Contributor

alexander-beedie commented Apr 9, 2024

hi @alexander-beedie! I did ask here, but while you're here do you know if Polars is following any specific SQL dialect or doing its own thing?

@lostmygithubaccount: We aim to follow PostgreSQL for the majority of standard SQL syntax; not everything fits there of course, so then we try to obey the "principle of least surprise" ;)

(For example, there is no standard PostgreSQL function for skew - I was just asked if we could expose that via the SQL interface, and I note that DuckDB uses "SKEWNESS" here, so the chances are that I'll follow their lead for that rather than reinvent the wheel - SQL is fragmented enough already!)

I'm mainly curious 1) if/when the SQL interface would catch up to the dataframe interface and

No fixed target for that, but it's continuing to build out - yesterday's release added a few more functions (greatest, least, median) and another table operation (truncate), for example. However, the expression interface will always be the primary one. SQL support is unquestionably a nice value-add, but it's not Polars' main focus.

If you are being held back by lack of some specific function/syntax though, please raise an Issue directly with us, as that can help prioritise what gets added next 👍

@alexander-beedie
Copy link
Contributor

alexander-beedie commented Apr 9, 2024

Polars doesn't seem to handle quoted identifiers. That's the main reason for the extremely limited support for SQL operations using the polars backend in Ibis.

@cpcloud: Sure we do; got a specific instance where it goes wrong? If so, please let us know - if there's a bug but nobody reports it, we'll have quite a hard time fixing it. Feel free to tag me in an Issue/bug-report so I can take a look ;)

For example, the following query uses quoted table, column, and alias identifiers without issue:

df = pl.DataFrame({
    "name": ["John","Mark","Steve","Bob"],
    "surname": ["Smith","Jones","Smith","Smith"],
})

with pl.SQLContext(employees=df) as ctx:
    res = ctx.execute(
        """
        SELECT "employees"."surname", COUNT(*) AS "n" 
        FROM "employees" 
        GROUP BY "surname"
        """
    ).collect()

    # shape: (2, 2)
    # ┌─────────┬─────┐
    # │ surname ┆ n   │
    # │ ---     ┆ --- │
    # │ str     ┆ u32 │
    # ╞═════════╪═════╡
    # │ Jones   ┆ 1   │
    # │ Smith   ┆ 3   │
    # └─────────┴─────┘

@cpcloud
Copy link
Member

cpcloud commented Apr 9, 2024

@alexander-beedie Here you go:

In [3]: import polars as pl
   ...:
   ...: df = pl.DataFrame(
   ...:     {
   ...:         "name": ["John", "Mark", "Steve", "Bob"],
   ...:         "surname": ["Smith", "Jones", "Smith", "Smith"],
   ...:     }
   ...: )
   ...:
   ...: with pl.SQLContext(employees=df) as ctx:
   ...:     res = ctx.execute(
   ...:         """
   ...:         with "t" as (SELECT "employees"."surname", COUNT(*) AS "n"
   ...:         FROM "employees"
   ...:         GROUP BY "surname")
   ...:         SELECT * FROM "t"
   ...:         """
   ...:     ).collect()
---------------------------------------------------------------------------
ComputeError                              Traceback (most recent call last)
Cell In[3], line 11
      3 df = pl.DataFrame(
      4     {
      5         "name": ["John", "Mark", "Steve", "Bob"],
      6         "surname": ["Smith", "Jones", "Smith", "Smith"],
      7     }
      8 )
     10 with pl.SQLContext(employees=df) as ctx:
---> 11     res = ctx.execute(
     12         """
     13         with "t" as (SELECT "employees"."surname", COUNT(*) AS "n"
     14         FROM "employees"
     15         GROUP BY "surname")
     16         SELECT * FROM "t"
     17         """
     18     ).collect()

File /nix/store/h2jg4as44r95irklf5f7s711y88vm1x2-python3-3.12.2-env/lib/python3.12/site-packages/polars/sql/context.py:268, in SQLContext.execute(self, query, eager)
    190 def execute(self, query: str, eager: bool | None = None) -> LazyFrame | DataFrame:
    191     """
    192     Parse the given SQL query and execute it against the registered frame data.
    193
   (...)
    266     └────────┴─────────────┴─────────┘
    267     """
--> 268     res = wrap_ldf(self._ctxt.execute(query))
    269     return res.collect() if (eager or self._eager_execution) else res

ComputeError: relation 't' was not found

@alexander-beedie
Copy link
Contributor

alexander-beedie commented Apr 9, 2024

Got it; fixed with a one-liner - exclusively impacted CTE identifier names.
Landing the PR now: pola-rs/polars#15564 👌

@alexander-beedie
Copy link
Contributor

hi @alexander-beedie! I did ask here

Also, will follow up on this one; I just delved into the docs and was faintly horrified to see that PostgreSQL does actually claim to support grouping by1 ordinal position values - which seems genuinely horrible (not self-documenting, and presumably shockingly fragile, but... valid syntax nevertheless!) 🤣

Footnotes

  1. https://www.postgresql.org/docs/current/sql-select.html#SQL-GROUPBY

@cpcloud
Copy link
Member

cpcloud commented Apr 9, 2024

hi @alexander-beedie! I did ask here

Also, will follow up on this one; I just delved into the docs and was faintly horrified to see that PostgreSQL does actually claim to support grouping by1 ordinal position values - which seems genuinely horrible (not self-documenting, and presumably shockingly fragile, but... valid syntax nevertheless!) 🤣

Footnotes

  1. postgresql.org/docs/current/sql-select.html#SQL-GROUPBY

It's a nice syntax for SQL-generating libraries to avoid repeating large group by expressions. It's used a lot more than you'd think.

@alexander-beedie
Copy link
Contributor

alexander-beedie commented Apr 11, 2024

FYI: the upcoming 0.20.20 (which will release in the near future) has fixes for both issues (quoted CTE identifier names and ordinal/integer "GROUP BY" clauses). I'll take a look at adding ordinal support in "ORDER BY" clauses in the near future too; looks pretty straightforward.

Update: released...
https://github.com/pola-rs/polars/releases/tag/py-0.20.20

@cpcloud
Copy link
Member

cpcloud commented Apr 14, 2024

Version bump here: #8960.

Will get shipped in the next Ibis release in a week or so.

@alexander-beedie
Copy link
Contributor

FYI: ordinal support for the ORDER BY clause will be landing in the next release:
pola-rs/polars#16745

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements polars The polars backend
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants