Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL ORDER BY Fails on list[f64] #19758

Open
2 tasks done
BSalita opened this issue Nov 13, 2024 · 5 comments
Open
2 tasks done

SQL ORDER BY Fails on list[f64] #19758

BSalita opened this issue Nov 13, 2024 · 5 comments
Labels
A-sql Area: Polars SQL functionality bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@BSalita
Copy link

BSalita commented Nov 13, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

data = {
    'Direction': ['N', 'S', 'E', 'W', 'N', 'S', 'E', 'W', 'N', 'S', 'E', 'W'],
    'Pcts_NS': [60.5, 55.2, 42.3, 38.7, 58.9, 57.1, 40.2, 35.8, 62.3, 54.8, 41.5, 39.2],
    'Pcts_EW': [39.5, 44.8, 57.7, 61.3, 41.1, 42.9, 59.8, 64.2, 37.7, 45.2, 58.5, 60.8]
}

df = pl.DataFrame(data)

sql_query = "SELECT CASE WHEN Direction IN ('N', 'S') THEN AVG(Pcts_NS) ELSE AVG(Pcts_EW) END AS Avg_Pct FROM self GROUP BY Direction ORDER BY Avg_Pct DESC"

# first show duckdb is liking the query
con = duckdb.connect()
con.register('self', df)
sql_result = con.execute(sql_query).df()
print(sql_result)

# now try polars sql() which will fail with: InvalidOperationError: `sort_with` operation not supported for dtype `list[f64]`
sql_result = df.sql(sql_query)
print(sql_result)

Log output

polars.exceptions.InvalidOperationError: `sort_with` operation not supported for dtype `list[f64]`

Issue description

  1. Here's how I interpret the error message. Apparently, ORDER BY is unable to sort list[f64]. I assume this is an oversight as I can't see any special difficulty in such a sort. duckdb works as expected.

  2. Can errors such as 'sort with' be changed to the SQL counterpart 'ORDER BY operation not supported ...

  3. The original issue, not yet reported, was that the original SQL query would hang (never return). I whittled down the original SQL to what's shown here. So there may be another issue, the hang issue, somewhere in the mix.

Expected behavior

Expecting a result of a sorted df.

Installed versions

--------Version info---------
Polars:              1.12.0
Index type:          UInt32
Platform:            Windows-11-10.0.26100-SP0
Python:              3.12.7 | packaged by conda-forge | (main, Oct  4 2024, 15:47:54) [MSC v.1941 64 bit (AMD64)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               4.2.2
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.0.2
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           2.0.35
torch                2.4.1+cpu
xlsx2csv             <not installed>
xlsxwriter           3.2.0
@BSalita BSalita added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Nov 13, 2024
@cmdlineluser
Copy link
Contributor

I think the underlying issue is that Polars doesn't let you sort list columns as of yet:

df = pl.DataFrame({"x": [[4, 3], [1, 2]]})
df.sort("x")
# InvalidOperationError: `sort_with` operation not supported for dtype `list[i64]`

@BSalita
Copy link
Author

BSalita commented Nov 13, 2024

Yeah, looks like it's probably the same issue. I'm surprised because looking at it from the SQL viewpoint, doing an ORDER BY is completely ordinary.

I'll just have to stick with duckdb until Polars SQL gets much more robust.

@alexander-beedie alexander-beedie added the A-sql Area: Polars SQL functionality label Nov 14, 2024
@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Nov 14, 2024

Actually it looks like something a bit different - we shouldn't have lists of lists in the group result. If I omit the ORDER BY clause you can see the groups haven't been unpacked to the expected scalar result... Could be an unexpected interaction with the aggregates in the case/when 🤔

┌───────────┬─────────────────────────────────┐
│ Direction ┆ Avg_Pct                         │
│ ---       ┆ ---                             │
│ str       ┆ list[f64]                       │
╞═══════════╪═════════════════════════════════╡
│ S         ┆ [55.7, 55.7, 55.7]              │
│ W         ┆ [62.1, 62.1, 62.1]              │
│ E         ┆ [58.666667, 58.666667, 58.6666… │
│ N         ┆ [60.566667, 60.566667, 60.5666… │
└───────────┴─────────────────────────────────┘

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Nov 14, 2024

FYI, if you write the query like this (with a single outer AVG, rather than having two inside the CASE/WHEN) then it'll work cleanly (and match DuckDB):

df.sql("""
  SELECT 
    Direction,
    AVG(
      CASE WHEN Direction IN ('N', 'S') THEN Pcts_NS ELSE Pcts_EW END
    ) AS Avg_Pct 
  FROM self 
  GROUP BY Direction 
  ORDER BY Avg_Pct DESC
""")
# shape: (4, 2)
# ┌───────────┬───────────┐
# │ Direction ┆ Avg_Pct   │
# │ ---       ┆ ---       │
# │ str       ┆ f64       │
# ╞═══════════╪═══════════╡
# │ W         ┆ 62.1      │
# │ N         ┆ 60.566667 │
# │ E         ┆ 58.666667 │
# │ S         ┆ 55.7      │
# └───────────┴───────────┘

Digging deeper I can see this isn't a SQL issue, as I can replicate with the standard DataFrame interface → "to be continued..."

@BSalita
Copy link
Author

BSalita commented Nov 16, 2024

Agreed. Looks like a different issue. I've fallen back to duckdb. Thanks for the interesting response.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql Area: Polars SQL functionality bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars
Projects
None yet
Development

No branches or pull requests

3 participants