Feat(optimizer): optimize pivots #1617

Merged
merged 12 commits into from
May 16, 2023

Conversation

georgesittas (Collaborator) commented May 14, 2023

Fixes #1449

This PR introduces optimizer logic for handling the PIVOT operator, i.e. it explodes and qualifies the columns of the table that a PIVOT produces where necessary. It's a first draft towards the general solution, which is more difficult because multiple JOINs, PIVOTs and UNPIVOTs may be chained together in non-trivial ways.
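To make "explode & qualify" concrete, here is a minimal sketch of how the column list of a pivoted table could be derived. It is not the code from this PR: `pivot_output_columns` and `qualify` are hypothetical helpers, and the `<value>_<alias>` naming for aliased aggregations is an assumption, since generated column names differ between dialects.

```python
from typing import List, Sequence


def pivot_output_columns(
    source_columns: Sequence[str],
    pivot_column: str,
    aggregated_columns: Sequence[str],
    in_values: Sequence[str],
    agg_aliases: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: list the columns of the table a PIVOT produces."""
    consumed = {pivot_column, *aggregated_columns}
    # Source columns the PIVOT does not consume pass through unchanged
    # (they act as implicit grouping keys).
    passthrough = [c for c in source_columns if c not in consumed]
    if agg_aliases:
        # Assumed naming scheme: one column per (value, aggregation alias) pair.
        generated = [f"{v}_{a}" for v in in_values for a in agg_aliases]
    else:
        # A single unaliased aggregation: one column per pivoted value.
        generated = list(in_values)
    return passthrough + generated


def qualify(columns: Sequence[str], table_alias: str) -> List[str]:
    """Prefix every column with the alias of the pivoted table."""
    return [f"{table_alias}.{c}" for c in columns]


cols = pivot_output_columns(
    source_columns=["year", "quarter", "sales"],
    pivot_column="quarter",
    aggregated_columns=["sales"],
    in_values=["Q1", "Q2"],
)
print(qualify(cols, "p"))  # ['p.year', 'p.Q1', 'p.Q2']
```

With this list in hand, a `SELECT *` over the pivoted table can be replaced by explicit, qualified projections.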

We could also try to transform PIVOT operators using exp.Case expressions. This seems a bit less straightforward to me if we want to get it right in all cases. For example:

  1. What do we GROUP BY if there are additional columns besides what's referenced in the exp.Pivot expression?
  2. Can we always map the aggregations trivially into projections? How would we handle stuff like COUNT(*)?
  3. How does this transformation behave when there are JOINs and / or other (UN)PIVOT operators applied?

The major advantage of this approach, though, would be that we'd get back a canonical query without PIVOTs.
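As a rough illustration of the CASE-based alternative, the sketch below builds a PIVOT-free query and runs it on SQLite, which has no PIVOT operator. `pivot_as_case_sql` is a hypothetical helper and not part of sqlglot; it assumes a single aggregation, groups by whatever columns the caller passes (question 1), and maps COUNT(*) to counting a constant inside the CASE (question 2). Values are interpolated into the SQL string for brevity only.

```python
import sqlite3
from typing import List, Sequence


def pivot_as_case_sql(
    table: str,
    group_columns: Sequence[str],
    pivot_column: str,
    in_values: Sequence[str],
    agg_func: str = "SUM",
    agg_column: str = "*",
) -> str:
    """Hypothetical helper: emulate a PIVOT with CASE expressions."""
    projections: List[str] = list(group_columns)
    for value in in_values:
        # COUNT(*) has no column to wrap, so count a constant for matching rows.
        then = "1" if agg_column == "*" else agg_column
        projections.append(
            f"{agg_func}(CASE WHEN {pivot_column} = '{value}' THEN {then} END) AS {value}"
        )
    sql = f"SELECT {', '.join(projections)} FROM {table}"
    if group_columns:
        keys = ", ".join(group_columns)
        sql += f" GROUP BY {keys} ORDER BY {keys}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(2022, "Q1", 10), (2022, "Q1", 5), (2022, "Q2", 7), (2023, "Q1", 3)],
)

sum_sql = pivot_as_case_sql("sales", ["year"], "quarter", ["Q1", "Q2"], "SUM", "amount")
count_sql = pivot_as_case_sql("sales", ["year"], "quarter", ["Q1", "Q2"], "COUNT")

sum_rows = conn.execute(sum_sql).fetchall()
count_rows = conn.execute(count_sql).fetchall()
print(sum_rows)    # [(2022, 15, 7), (2023, 3, None)]
print(count_rows)  # [(2022, 2, 1), (2023, 1, 0)]
```

The sketch sidesteps question 3 entirely: it handles one table and one pivot, so it says nothing about how the rewrite would compose with JOINs or chained (UN)PIVOTs.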

References:

georgesittas (Collaborator, Author) commented May 14, 2023

Left some comments on the PR for clarity (they're now marked as resolved), let me know if something's not clear. Interested to hear alternatives, would love to simplify this somehow.

georgesittas (Collaborator, Author) commented May 15, 2023

TODO:

  • Add transformation to remove pivot alias for Spark

georgesittas (Collaborator, Author) commented

@tobymao I made a few more changes, let me know what you think if you take a look.

  • Moved the unqualify_pivot_columns transform to the Spark dialect, since it's the only dialect using it.
  • Created an _unalias_pivot transform just for Spark that removes table aliases from pivots.
  • Improved handling of the alias argument in subquery.
  • Added more Spark tests and a new optimizer test that demonstrates the above transformations.

@georgesittas georgesittas merged commit 4b1aa02 into main May 16, 2023
@georgesittas georgesittas deleted the jo/pivot_optimization branch May 16, 2023 13:08
adrianisk pushed a commit to adrianisk/sqlglot that referenced this pull request Jun 21, 2023
* Feat(optimizer): optimize pivots

* Fixup

* Simplify

* Cleanup

* Fix pivot sql generation

* Fixed snowflake pivot column names, add another optimizer test

* Fixed issue with pivoted cte source, added bigquery test

* Factor out some computations

* Cleanup

* Add transform to unalias pivot in spark, more tests

* Typo

* Comment fixup