Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airframe-sql: Resolve columns from CTE inside AliasedRelation #3057

Merged
merged 3 commits into from
Jul 12, 2023

Conversation

takezoe
Copy link
Member

@takezoe takezoe commented Jul 11, 2023

No description provided.

@github-actions github-actions bot added the bug label Jul 11, 2023
@codecov
Copy link

codecov bot commented Jul 11, 2023

Codecov Report

Merging #3057 (f387440) into master (93034eb) will increase coverage by 0.00%.
The diff coverage is 84.00%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3057   +/-   ##
=======================================
  Coverage   82.80%   82.81%           
=======================================
  Files         350      350           
  Lines       14655    14666   +11     
  Branches     2459     2448   -11     
=======================================
+ Hits        12135    12145   +10     
- Misses       2520     2521    +1     
Impacted Files Coverage Δ
...a/wvlet/airframe/parquet/ParquetQueryPlanner.scala 89.36% <ø> (ø)
...scala/wvlet/airframe/sql/parser/SQLGenerator.scala 91.27% <ø> (ø)
...in/scala/wvlet/airframe/sql/model/Expression.scala 76.86% <72.22%> (-0.05%) ⬇️
...ala/wvlet/airframe/sql/analyzer/TypeResolver.scala 93.64% <81.25%> (ø)
...la/wvlet/airframe/sql/analyzer/SQLAnonymizer.scala 73.17% <100.00%> (ø)
...n/scala/wvlet/airframe/sql/model/LogicalPlan.scala 87.62% <100.00%> (+0.04%) ⬆️
.../scala/wvlet/airframe/sql/model/ResolvedPlan.scala 68.00% <100.00%> (+2.78%) ⬆️
...ala/wvlet/airframe/sql/parser/SQLInterpreter.scala 93.40% <100.00%> (ø)

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93034eb...f387440. Read the comment docs.

@takezoe takezoe force-pushed the fix-resolve-cte-in-alias branch 2 times, most recently from 946258b to 707e075 Compare July 11, 2023 13:32
@takezoe takezoe marked this pull request as ready for review July 11, 2023 13:44
@takezoe takezoe force-pushed the fix-resolve-cte-in-alias branch from 707e075 to adfea27 Compare July 11, 2023 13:56
@takezoe takezoe changed the title Fix resolving CTE columns in AliasedRelation airframe-sql: Resolve columns from CTE inside AliasedRelation Jul 11, 2023
@@ -380,8 +380,7 @@ object TypeResolver extends LogSupport {
a.copy(expr = resolved)
}
case SingleColumn(a: Attribute, qualifier, _) if a.resolved =>
// Optimizes the nested attributes, but preserves qualifier in the parent
a.setQualifierIfEmpty(qualifier)
a.withQualifier(qualifier)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to keep the qualifier if SingleColumn has it.

Copy link
Member

@xerial xerial Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, setQualifierIfEmpty method becomes unnecessary (no other usage). Let's remove it.

But as discussed below, force setting the qualifier might not be the correct fix

test("resolve CTE in AliasedRelation") {
val p1 = analyze("with t1 as (select id from A) select id from (select id from t1) t2")
p1.outputAttributes shouldMatch { case List(col: ResolvedAttribute) =>
col.fullName shouldBe "id"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if adding a qualifier to columns only when column is referred inside function call is expected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my current understanding, qualifier needs to be added only when the user adds it in the input SQL. If the column cannot be properly resolved due to lack of the qualifier not given in the SQL, there should be other types of bugs in TypeResolver side.

val p2 = analyze("with t1 as (select id from A) select count(id) from (select id from t1) t2")
p2.outputAttributes shouldMatch {
case List(SingleColumn(FunctionCall("count", Seq(col: ResolvedAttribute), _, _, _, _), _, _)) =>
col.fullName shouldBe "t2.id"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the fix, this was t1.id so it couldn't be resolved.

Copy link
Member

@xerial xerial Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the right expectation here is 'id' can be resolved without any qualifier with the source column A.id. I guess outputAttributes method or some TypeResolver rules need to be fixed to uniquely resolve id column from the sub query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct expectation is:

col.fullName shouldBe "id" // `t1.id` is wrongly resolved here

Copy link
Member

@xerial xerial Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the resolved plan with trace-level log:

wvlet.airframe.sql.analyzer.TypeResolverTest=trace

CTERelationRef already adds t1.id as a qualifier, so I think CTERelation resolution part is wrong:

[original plan]
[Query]: (id:?) => (?:long)
  [WithQuery]:  => (id:?)
    - ResolvedIdentifier(Id(t1))
    [Project]:  => (id:?)
      - id:? := Id(id)
      [TableRef]
        - A
  [Project]: (id:?) => (?:long)
    - ?:long := FunctionCall(count, Id(id), distinct:false, window:None)
    [AliasedRelation]:  => (id:?)
      - ResolvedIdentifier(Id(t2))
      [Project]:  => (id:?)
        - id:? := Id(id)
        [TableRef]
          - t1

[resolved plan]
[Query]: (id:long) => (?:long)
  [WithQuery]: (id:long, name:string) => (id:long)
    - ResolvedIdentifier(Id(t1))
    [Project]: (id:long, name:string) => (id:long)
      - *id:long <- A.id
      [TableScan] default.A:  => (id:long, name:string)
  [Project]: (id:long) => (?:long)
    - ?:long := FunctionCall(count, *t1.id:long <- A.id, distinct:false, window:None)
    [AliasedRelation]: (id:long) => (id:long)
      - ResolvedIdentifier(Id(t2))
      [Project]: (id:long) => (id:long)
        - *id:long <- A.id
        [CTERelationRef]:  => (id:long)
          - *t1.id:long <- A.id

resolveTableRef rule should not set the qualifier for CTE:

cte.outputAttributes.map(_.withQualifier(qname.fullName)),

Copy link
Member

@xerial xerial Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After fixing not to add qualifier to cte.outputAttributes, it seems resolveRegularRelation adds unnecessary qualifier:

2023-07-11 14:17:32.139-0700 trace [RewriteRule]
transformed with resolveRegularRelation:
[before]
[Project]: (id:?) => (?:long)
  - ?:long := FunctionCall(count, Id(id), distinct:false, window:None)
  [AliasedRelation]: (id:long) => (id:?)
    - ResolvedIdentifier(Id(t2))
    [Project]: (id:long) => (id:?)
      - id:? := Id(id)
      [CTERelationRef]:  => (id:long)
        - *id:long <- A.id

[after]
[Project]: (id:long) => (?:long)
  - ?:long := FunctionCall(count, *t2.id:long <- A.id, distinct:false, window:None) <===== shouldBe id:long <- A.id
  [AliasedRelation]: (id:long) => (id:long)
    - ResolvedIdentifier(Id(t2))
    [Project]: (id:long) => (id:long)
      - id:long := *id:long <- A.id
      [CTERelationRef]:  => (id:long)
        - *id:long <- A.id
  - (RewriteRule.scala:32)

Copy link
Member

@xerial xerial Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely, the current implementation of AliasedRelation.outputAttribuets is wrong, because it always adds qualifiers regardless of the context

@xerial
Copy link
Member

xerial commented Jul 11, 2023

ok. I understand the root cause. Currently, AliasedRelation always returns outputAttributes with qualifiers coming from the table alias name, but this implementation logic is wrong.

Instead of using the qualifier for resolving columns, which should be used only when the original SQL has it, TypeResolver should understand the context table name alias, say A, and find the referenced columns like A.id. Attribute or ResolvedAttribute class may need to have tableNameAlias:Option[String] parameter.

Removing withQualifier code in AliasedRelation will be like this 345e7df, but we also need to fix TypeResolver and Attribute classes to pass the unit tests.

For example:

 - resolve set operations.analyzer.TypeResolverTest 
     - ru3: resolve union with column alias and qualifier 301.99ms << failed: Found unresolved expressions in:
[sql]
select q1.p1 from (select id as p1 from A union all select id as p1 from B) q1
[plan]
[Project]: (p1:long) => (p1:?)
  - UnresolvedAttribute(q1.p1) <--- This column should be found in the AliasedRelation.outputAttributes
  [AliasedRelation]: (id:long, name:string, id:long, name:string) => (p1:long) <---- This outputAttributes should remember alias table name q1 without using the qualifier parameter
    - ResolvedIdentifier(Id(q1))
    [Union]: (id:long, name:string, id:long, name:string) => (p1:long)
      [Project]: (id:long, name:string) => (p1:long)
        - <p1> := *id:long <- A.id
        [TableScan] default.A:  => (id:long, name:string)
      [Project]: (id:long, name:string) => (p1:long)
        - <p1> := *id:long <- B.id
        [TableScan] default.B:  => (id:long, name:string)

Test case should look like:

col.fullName shouldBe "(column name without qualifier)"
col.tableAlias shouldBe Some("(alias name)")

@takezoe
Copy link
Member Author

takezoe commented Jul 11, 2023

Sounds like Attributes inside AliasedRelation need to have an alias of AliasedRelation somehow to make it possible to resolve them.

@takezoe takezoe force-pushed the fix-resolve-cte-in-alias branch from 30caaad to 3a1f979 Compare July 12, 2023 01:16
@takezoe takezoe force-pushed the fix-resolve-cte-in-alias branch from 3a1f979 to 3f1638f Compare July 12, 2023 01:25
@xerial xerial added the breaking Breaking changes label Jul 12, 2023
}
List(a.copy(columns = Some((qualifier match {
case Some(q) => allColumns.filter(_.qualifier.contains(q))
case Some(q) => allColumns.filter(c => c.qualifier.contains(q) || c.tableAlias.contains(q))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. This is the important change in this PR :)

Copy link
Member Author

@takezoe takezoe Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather, c.qualifier.contains(q) seems no longer necessary here. Will raise another PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking changes bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants