-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
airframe-sql: Resolve columns from CTE inside AliasedRelation #3057
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #3057 +/- ##
=======================================
Coverage 82.80% 82.81%
=======================================
Files 350 350
Lines 14655 14666 +11
Branches 2459 2448 -11
=======================================
+ Hits 12135 12145 +10
- Misses 2520 2521 +1
... and 1 file with indirect coverage changes Continue to review full report in Codecov by Sentry.
|
946258b
to
707e075
Compare
707e075
to
adfea27
Compare
@@ -380,8 +380,7 @@ object TypeResolver extends LogSupport { | |||
a.copy(expr = resolved) | |||
} | |||
case SingleColumn(a: Attribute, qualifier, _) if a.resolved => | |||
// Optimizes the nested attributes, but preserves qualifier in the parent | |||
a.setQualifierIfEmpty(qualifier) | |||
a.withQualifier(qualifier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to keep the qualifier if SingleColumn
has it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, setQualifierIfEmpty method becomes unnecessary (no other usage). Let's remove it.
But as discussed below, force setting the qualifier might not be the correct fix
test("resolve CTE in AliasedRelation") { | ||
val p1 = analyze("with t1 as (select id from A) select id from (select id from t1) t2") | ||
p1.outputAttributes shouldMatch { case List(col: ResolvedAttribute) => | ||
col.fullName shouldBe "id" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if adding a qualifier to columns only when column is referred inside function call is expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my current understanding, qualifier needs to be added only when the user adds it in the input SQL. If the column cannot be properly resolved due to lack of the qualifier not given in the SQL, there should be other types of bugs in TypeResolver side.
val p2 = analyze("with t1 as (select id from A) select count(id) from (select id from t1) t2") | ||
p2.outputAttributes shouldMatch { | ||
case List(SingleColumn(FunctionCall("count", Seq(col: ResolvedAttribute), _, _, _, _), _, _)) => | ||
col.fullName shouldBe "t2.id" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the fix, this was t1.id
so it couldn't be resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the right expectation here is 'id' can be resolved without any qualifier with the source column A.id. I guess outputAttributes method or some TypeResolver rules need to be fixed to uniquely resolve id
column from the sub query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the correct expectation is:
col.fullName shouldBe "id" // `t1.id` is wrongly resolved here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the resolved plan with trace-level log:
wvlet.airframe.sql.analyzer.TypeResolverTest=trace
CTERelationRef already adds t1.id as a qualifier, so I think CTERelation resolution part is wrong:
[original plan]
[Query]: (id:?) => (?:long)
[WithQuery]: => (id:?)
- ResolvedIdentifier(Id(t1))
[Project]: => (id:?)
- id:? := Id(id)
[TableRef]
- A
[Project]: (id:?) => (?:long)
- ?:long := FunctionCall(count, Id(id), distinct:false, window:None)
[AliasedRelation]: => (id:?)
- ResolvedIdentifier(Id(t2))
[Project]: => (id:?)
- id:? := Id(id)
[TableRef]
- t1
[resolved plan]
[Query]: (id:long) => (?:long)
[WithQuery]: (id:long, name:string) => (id:long)
- ResolvedIdentifier(Id(t1))
[Project]: (id:long, name:string) => (id:long)
- *id:long <- A.id
[TableScan] default.A: => (id:long, name:string)
[Project]: (id:long) => (?:long)
- ?:long := FunctionCall(count, *t1.id:long <- A.id, distinct:false, window:None)
[AliasedRelation]: (id:long) => (id:long)
- ResolvedIdentifier(Id(t2))
[Project]: (id:long) => (id:long)
- *id:long <- A.id
[CTERelationRef]: => (id:long)
- *t1.id:long <- A.id
resolveTableRef rule should not set the qualifier for CTE:
airframe/airframe-sql/src/main/scala/wvlet/airframe/sql/analyzer/TypeResolver.scala
Line 225 in adfea27
cte.outputAttributes.map(_.withQualifier(qname.fullName)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After fixing not to add qualifier to cte.outputAttributes, it seems resolveRegularRelation adds unnecessary qualifier:
2023-07-11 14:17:32.139-0700 trace [RewriteRule]
transformed with resolveRegularRelation:
[before]
[Project]: (id:?) => (?:long)
- ?:long := FunctionCall(count, Id(id), distinct:false, window:None)
[AliasedRelation]: (id:long) => (id:?)
- ResolvedIdentifier(Id(t2))
[Project]: (id:long) => (id:?)
- id:? := Id(id)
[CTERelationRef]: => (id:long)
- *id:long <- A.id
[after]
[Project]: (id:long) => (?:long)
- ?:long := FunctionCall(count, *t2.id:long <- A.id, distinct:false, window:None) <===== shouldBe id:long <- A.id
[AliasedRelation]: (id:long) => (id:long)
- ResolvedIdentifier(Id(t2))
[Project]: (id:long) => (id:long)
- id:long := *id:long <- A.id
[CTERelationRef]: => (id:long)
- *id:long <- A.id
- (RewriteRule.scala:32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely, the current implementation of AliasedRelation.outputAttribuets is wrong, because it always adds qualifiers regardless of the context
a.withQualifier(alias.value) |
ok. I understand the root cause. Currently, AliasedRelation always returns outputAttributes with qualifiers coming from the table alias name, but this implementation logic is wrong. Instead of using the qualifier for resolving columns, which should be used only when the original SQL has it, TypeResolver should understand the context table name alias, say A, and find the referenced columns like A.id. Attribute or ResolvedAttribute class may need to have Removing withQualifier code in AliasedRelation will be like this 345e7df, but we also need to fix TypeResolver and Attribute classes to pass the unit tests. For example:
Test case should look like: col.fullName shouldBe "(column name without qualifier)"
col.tableAlias shouldBe Some("(alias name)") |
Sounds like |
30caaad
to
3a1f979
Compare
3a1f979
to
3f1638f
Compare
} | ||
List(a.copy(columns = Some((qualifier match { | ||
case Some(q) => allColumns.filter(_.qualifier.contains(q)) | ||
case Some(q) => allColumns.filter(c => c.qualifier.contains(q) || c.tableAlias.contains(q)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. This is the important change in this PR :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather, c.qualifier.contains(q)
seems no longer necessary here. Will raise another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.