feat: Support "BY NAME" qualifier for SQL "INTERSECT" and "EXCEPT" set ops #17835

Merged (1 commit) on Jul 25, 2024

alexander-beedie (Collaborator) commented Jul 24, 2024

Updates sqlparser-rs to 0.49 to take advantage of some PRs I've made upstream.

  • Adds support for the BY NAME qualifier across all set ops; like the existing support for this in UNION, the modifier aligns the second frame to the first by column name (in much the same way as "diagonal" frame concat).
  • Follows up on #17109 ("feat: Support Struct field selection in the SQL engine, RENAME and REPLACE select wildcard options") by enabling SELECT * REPLACE … RENAME … query patterns. Previously you could not use both REPLACE and RENAME in the same sequence of SELECT modifiers; with the upstream fix available, you now can.
  • A few other minor internal improvements/tidy-ups.

Example

Setup:

import polars as pl

df1 = pl.DataFrame({
    "x": [1, 9, 1, 1],
    "y": [2, 3, 4, 4],
    "z": [5, 5, 5, 5],
})
df2 = pl.DataFrame({
    "y": [2, None, 4],
    "w": ["?", "!", "%"],
    "z": [7, 6, 5],
    "x": [1, 9, 1],
})

Use of the BY NAME qualifier automatically aligns the columns in the second DataFrame to those in the first (note that we also support the PostgreSQL TABLE tbl shortcut, which is equivalent to SELECT * FROM tbl):

df_except = pl.sql(
  "SELECT x, y, z FROM df1 EXCEPT BY NAME TABLE df2",
).collect()
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ x   ┆ y   ┆ z   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 2   ┆ 5   │
# │ 9   ┆ 3   ┆ 5   │
# └─────┴─────┴─────┘
df_intersect = pl.sql(
  "SELECT * FROM df1 INTERSECT BY NAME TABLE df2",
).collect()
# shape: (1, 3)
# ┌─────┬─────┬─────┐
# │ x   ┆ y   ┆ z   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 1   ┆ 4   ┆ 5   │
# └─────┴─────┴─────┘
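
For readers who want to check the results above by hand, here is a minimal pure-Python sketch of the BY NAME semantics. It is not how Polars implements the feature. The helper names (`rows`, `distinct`, `except_by_name`, `intersect_by_name`) are hypothetical. The sketch assumes that columns present only in the second frame (here `w`) are ignored, and that set ops return distinct rows, as the example output suggests.

```python
# Illustrative model only: tables are plain dicts mapping column name -> values.
df1 = {"x": [1, 9, 1, 1], "y": [2, 3, 4, 4], "z": [5, 5, 5, 5]}
df2 = {"y": [2, None, 4], "w": ["?", "!", "%"], "z": [7, 6, 5], "x": [1, 9, 1]}


def rows(table, columns):
    """Return the table's rows as tuples, with fields in the given column order."""
    return list(zip(*(table[name] for name in columns)))


def distinct(seq):
    """Drop duplicate rows while keeping first-seen order."""
    return list(dict.fromkeys(seq))


def except_by_name(first, second):
    """Distinct rows of `first` that do not appear in `second`, matched by column name."""
    columns = list(first)  # the first frame dictates column names and order
    right = set(rows(second, columns))  # second frame re-ordered to match
    return [r for r in distinct(rows(first, columns)) if r not in right]


def intersect_by_name(first, second):
    """Distinct rows of `first` that also appear in `second`, matched by column name."""
    columns = list(first)
    right = set(rows(second, columns))
    return [r for r in distinct(rows(first, columns)) if r in right]


print(except_by_name(df1, df2))     # [(1, 2, 5), (9, 3, 5)]
print(intersect_by_name(df1, df2))  # [(1, 4, 5)]
```

The key step is `rows(second, columns)`: the second frame is read in the first frame's column order, so its physical column order (`y, w, z, x`) does not matter. Row order in the sketch follows first appearance in `df1`; the SQL engine may not guarantee the same ordering.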

Reference

sqlparser-rs updates:

github-actions bot added the labels enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), and rust (Related to Rust Polars) on Jul 24, 2024
alexander-beedie added the label A-sql (Area: Polars SQL functionality) on Jul 24, 2024
codecov bot commented Jul 24, 2024

Codecov Report

Attention: Patch coverage is 95.65217% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.51%. Comparing base (4c19115) to head (8fcda9b).
Report is 9 commits behind head on main.

Files | Patch % | Lines
crates/polars-sql/src/context.rs | 88.88% | 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #17835   +/-   ##
=======================================
  Coverage   80.51%   80.51%           
=======================================
  Files        1503     1503           
  Lines      197038   197076   +38     
  Branches     2805     2805           
=======================================
+ Hits       158641   158681   +40     
+ Misses      37876    37874    -2     
  Partials      521      521           


@ritchie46 ritchie46 merged commit 8373cdb into pola-rs:main Jul 25, 2024
38 checks passed
@alexander-beedie alexander-beedie deleted the sql-parser-update branch July 25, 2024 16:57
alamb commented Jul 27, 2024

🚀
