-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(sql): replace CTEs within CTEs #8572
Conversation
560142d
to
d9300ff
Compare
return CTE(new) if node in ctes else new | ||
|
||
result = simplified.replace(wrap) | ||
ctes = reversed([cte.parent for cte in result.find(CTE)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the find
+ reversed
necessary to preserve topological order? If so, can you add a short comment about it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Find traverse in DFS/BFS not topological. Replace do traverse in topological order so we could extract the rewritten CTEs in the closure, but my first naive attempt didn't work and I was lazy inspecting it. We can defer it to a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will look through the snapshots again to make sure CTEs are occurring in topological order.
This PR gets benchmarks passing again. Nesting levels were increased by #8414. Unfortunately, SQLGlot's SQL generation algorithm (but not its parsing algorithm) is recursive. This means that it cannot handle large nesting levels of select statements. Pre-SQLGlot, Ibis used to handle much larger nesting levels. This functionality was lost in the sqlglot refactor. Ultimately, someone needs to address this upstream in sqlglot by converting the generation algorithm to an iterative one. I believe we should gain a little bit back after #8572 is merged, since fewer select statements will be generated.
Yes, thanks for updating those snapshots! |
result = simplified.replace(subs) | ||
ctes = set(extract_ctes(simplified)) | ||
|
||
def wrap(node, _, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to add this as comment, but the the problem with the previous solution was that replace(dict)
matches on the original node and replaces the given value, but the ctes in the subs
mapping are the original relation objects.
Instead we need to do the following: once we encounter a relation which is occurring more than once (in ctes
) we need to wrap then in a CTE
operation, which must propagate upwards in topological order, hence we are using a custom replacer which matches on the original (not the recreated) node.
Going to see whether this helps with #8484. |
Doesn't really help with #8484. |
While the CTE extraction worked properly, the CTEs haven't been replaced in the already extracted CTEs meaning that we didn't recursively replace.