Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sql): replace CTEs within CTEs #8572

Merged
merged 3 commits into from
Mar 8, 2024
Merged

Conversation

kszucs
Copy link
Member

@kszucs kszucs commented Mar 6, 2024

While the CTE extraction worked properly, the CTEs haven't been replaced in the already extracted CTEs meaning that we didn't recursively replace.

@kszucs kszucs force-pushed the cte-fix branch 2 times, most recently from 560142d to d9300ff Compare March 6, 2024 22:54
@cpcloud
Copy link
Member

cpcloud commented Mar 7, 2024

This looks like it might help with #8484.

@kszucs Are the remaining failures due to needing to regen the SQL snapshots for those backends?

@cpcloud cpcloud added this to the 9.0 milestone Mar 7, 2024
return CTE(new) if node in ctes else new

result = simplified.replace(wrap)
ctes = reversed([cte.parent for cte in result.find(CTE)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the find + reversed necessary to preserve topological order? If so, can you add a short comment about it here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find traverse in DFS/BFS not topological. Replace do traverse in topological order so we could extract the rewritten CTEs in the closure, but my first naive attempt didn't work and I was lazy inspecting it. We can defer it to a follow-up.

Copy link
Member

@cpcloud cpcloud Mar 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will look through the snapshots again to make sure CTEs are occurring in topological order.

@cpcloud cpcloud added bug Incorrect behavior inside of ibis refactor Issues or PRs related to refactoring the codebase labels Mar 7, 2024
gforsyth pushed a commit that referenced this pull request Mar 7, 2024
This PR gets benchmarks passing again. Nesting levels were increased by
#8414.

Unfortunately, SQLGlot's SQL generation algorithm (but not its parsing
algorithm) is recursive.

This means that it cannot handle large nesting levels of select
statements.

Pre-SQLGlot, Ibis used to handle much larger nesting levels. This
functionality was lost in the sqlglot refactor.

Ultimately, someone needs to address this upstream in sqlglot by
converting the generation algorithm to an iterative
one.

I believe we should gain a little bit back after #8572 is merged, since
fewer select statements will be generated.
@kszucs
Copy link
Member Author

kszucs commented Mar 7, 2024

Are the remaining failures due to needing to regen the SQL snapshots for those backends?

Yes, thanks for updating those snapshots!

result = simplified.replace(subs)
ctes = set(extract_ctes(simplified))

def wrap(node, _, **kwargs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to add this as comment, but the the problem with the previous solution was that replace(dict) matches on the original node and replaces the given value, but the ctes in the subs mapping are the original relation objects.

Instead we need to do the following: once we encounter a relation which is occurring more than once (in ctes) we need to wrap then in a CTE operation, which must propagate upwards in topological order, hence we are using a custom replacer which matches on the original (not the recreated) node.

@cpcloud
Copy link
Member

cpcloud commented Mar 8, 2024

Going to see whether this helps with #8484.

@cpcloud
Copy link
Member

cpcloud commented Mar 8, 2024

Doesn't really help with #8484.

@cpcloud cpcloud merged commit 182b6a5 into ibis-project:main Mar 8, 2024
74 checks passed
@cpcloud cpcloud deleted the cte-fix branch March 8, 2024 12:06
@kszucs kszucs self-assigned this Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis refactor Issues or PRs related to refactoring the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants