Fixing Embedded Coproduct Column Duplication Issue #1604

Merged
merged 1 commit into from
Sep 13, 2019

Conversation

deusaquilus
Collaborator

@deusaquilus deusaquilus commented Sep 11, 2019

Fixes #1602

Introduction

The primary challenge of managing sub-queries with row coproducts is how to alias columns that share the same name. For example, say that we have something like this:

val ps = query[Person]
  .join(query[Person]).on(_.id == _.partnerOf) // Returns a coproduct (Person, Person), i.e. a tuple
  .map { case (pa, pb) => (pa.name, pb.name) }
run(ps)

Running this should return something like this:

select pa.name, pb.name from Person pa join Person pb on pa.id = pb.partnerOf

Now let's say we make a subquery of this:

ps.nested.map { case (paName, pbName) => (paName + "-one", pbName + "-two") }

Doing a subquery of this causes a problem. The inner query returns two name columns and SQL cannot tell one from the other!

-- Two "name" columns coming out of "inner". Which one is which???
select name || '-one', name || '-two' from (
  select pa.name, pb.name from Person pa join Person pb on pa.id = pb.partnerOf
) inner -- Side Note: Usually this is x or x1 etc... if you have not specified a name for it in the 2nd map

In order to solve this problem, Quill aliases each column of the inner query with the tuple property name (i.e. "_" + (tupleIndex + 1)) followed by the property name, i.e.:

-- The two inner columns now have distinct aliases: "_1name" and "_2name"
select _1name || '-one', _2name || '-two' from (
  select pa.name as _1name, pb.name as _2name from Person pa join Person pb on pa.id = pb.partnerOf
)
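
As a minimal sketch of the naming scheme described above (not Quill's actual implementation; `tupleAlias` and `selectList` are hypothetical helpers introduced only for illustration), the alias is the tuple property name followed by the column name:

```scala
object TupleAliasSketch {
  // Hypothetical helper: alias for a column selected from the tuple
  // element at `tupleIndex` (0-based), i.e. "_" + (tupleIndex + 1) + column.
  def tupleAlias(tupleIndex: Int, column: String): String =
    s"_${tupleIndex + 1}$column"

  // Hypothetical helper: renders the inner select list, taking one
  // (tableAlias, column) pair per tuple slot.
  def selectList(columns: Seq[(String, String)]): String =
    columns.zipWithIndex
      .map { case ((table, column), i) => s"$table.$column as ${tupleAlias(i, column)}" }
      .mkString(", ")
}
```

Calling `TupleAliasSketch.selectList(Seq("pa" -> "name", "pb" -> "name"))` gives the select list used in the inner query above.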

This strategy works for tuples, but what if the coproduct type is a case class? This kind of scenario is more difficult to reproduce, but as indicated by #1602 it is possible:

case class Person(id:Int, name:String, partnerOf:Int) extends Embedded
case class Dual(ta: Person, tb: Person)

val qr1 = quote { query[Person] }

val q = quote {
  qr1.join(qr1).on((a, b) => a.id == b.partnerOf).nested.map(both => both match { case (a, b) => Dual(a, b) }).distinct.nested
}
run(q)

This yields:

SELECT x.id, x.name, x.partner_of, x.id, x.name, x.partner_of FROM (SELECT DISTINCT both._1id AS taid, both._1name AS taname, both._1partner_of AS tapartner_of, both._2id AS tbid, both._2name AS tbname, both._2partner_of AS tbpartner_of FROM (SELECT a.id AS _1id, a.name AS _1name, a.partner_of AS _1partner_of, b.id AS _2id, b.name AS _2name, b.partner_of AS _2partner_of FROM person a INNER JOIN person b ON a.id = b.partner_of) AS both) AS x

Note how the "name" columns, as well as the others, appear twice, and that they do not correspond to the inner aliases (e.g. taname). This is because only tuple properties (e.g. _1, _2 ...) are appended to sub-query aliases, as opposed to ad-hoc case class fields.
That is to say, Property(Property(x, "_1"), "name") is expressed as "_1name" in SqlIdiom but Property(Property(x, "pa"), "name") just becomes "name".
This is by design, since we rely on the skipping of nested property objects in order to get correct SQL tokenization behavior in embedded entities, as the next section will explain.

Sub-Properties and Embedded Entities

Embedded entities are typically expressed as nested properties, e.g. with a schema like this:

case class Emb(a: Int, b:Int) extends Embedded
case class Parent(id:Int, emb:Emb)
val q = quote { query[Parent] }

... selecting the field a from Emb inside of Parent would be q.map(p => p.emb.a). This translates into Property(Property(p, "emb"), "a") in the AST. Now if we were to treat tuple-index inner properties (i.e. _1, _2, etc.) and all other inner properties the same way, the query q (above) would become:

select id, emba, embb from Parent

... which is clearly incorrect, since the embedded fields map to the columns a and b, not emba and embb.
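
The difference between the two behaviors can be sketched with a toy AST. This is a simplified illustration, not Quill's SqlIdiom: `Ident`, `Property`, `columnName` and `naiveColumnName` are stand-ins defined here, and the real Property carries more information.

```scala
object PreFixTokenizationSketch {
  // Toy AST: only the two node types needed for this illustration.
  sealed trait Ast
  final case class Ident(name: String) extends Ast
  final case class Property(ast: Ast, name: String) extends Ast

  private def isTupleIndex(name: String): Boolean = name.matches("_[0-9]+")

  // The rule described above: a tuple-index parent (_1, _2, ...) is kept
  // as a prefix, any other parent property is skipped.
  def columnName(ast: Ast): String = ast match {
    case Property(Property(_, parent), name) if isTupleIndex(parent) => parent + name
    case Property(_, name)                                           => name
    case Ident(name)                                                 => name
  }

  // The naive alternative: every nested property name is concatenated,
  // which breaks embedded entities.
  def naiveColumnName(ast: Ast): String = ast match {
    case Property(inner: Property, name) => naiveColumnName(inner) + name
    case Property(_, name)               => name
    case Ident(name)                     => name
  }
}
```

With these definitions, `Property(Property(Ident("p"), "emb"), "a")` is rendered as `a` by `columnName` but as `emba` by `naiveColumnName`, while `Property(Property(Ident("x"), "_1"), "name")` is rendered as `_1name` by both.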

The Solution

Now, since we can no longer rely on the tuple-index format (i.e. _1, _2 etc...) to know whether a property should be tokenized, we need a way to identify which properties belong to embedded entities and which ones belong to Ad-Hoc case classes. Since this kind of information is not really related to AST transformations but rather is only used for SQL tokenization, it is a good candidate for AST Opinions.

AST Opinions were first introduced in <> to handle situations where a property is not subject to NamingStrategy manipulation because it is defined in a querySchema (i.e. schema meta). Following a similar pattern, whether a Property is visible or hidden is now part of the Property opinions (extractable and constructible via Property.Opinionated); the actual value is set in the parser (i.e. Parsing.scala). For example, Property.Opinionated(Property.Opinionated(x, "emb", ByStrategy, Hidden), "a", ByStrategy, Visible) (*) tells us that the property emb is hidden and should not be expressed as part of SQL tokenization but DOES need to be used for aliasing.
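
A rough model of the idea, under stated assumptions: the types below are simplified stand-ins (the renameable opinion is left out, and `path`, `columnName` and `subQueryAlias` are hypothetical helpers, not functions in Quill). It only shows how a Hidden property can be dropped from a column reference while still contributing to a sub-query alias.

```scala
object VisibilitySketch {
  sealed trait Visibility
  case object Visible extends Visibility
  case object Hidden extends Visibility

  // Toy AST: a Property that carries only the visibility opinion.
  sealed trait Ast
  final case class Ident(name: String) extends Ast
  final case class Property(ast: Ast, name: String, visibility: Visibility) extends Ast

  // Collects (name, visibility) along the property path, outermost parent first.
  def path(ast: Ast): List[(String, Visibility)] = ast match {
    case Property(inner, name, vis) => path(inner) :+ (name -> vis)
    case Ident(_)                   => Nil
  }

  // Column reference: Hidden properties are not expressed.
  def columnName(ast: Ast): String =
    path(ast).collect { case (name, Visible) => name }.mkString

  // Sub-query alias: every property on the path contributes, Hidden or not.
  def subQueryAlias(ast: Ast): String =
    path(ast).map(_._1).mkString
}
```

For `Property(Property(Ident("p"), "emb", Hidden), "a", Visible)` this model yields the column `a` and the alias `emba`; applied to a path through `ta` and `name`, the alias is `taname`, matching the aliases in the SQL shown in this description.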

Due to these changes, the query mentioned above:

case class Person(id:Int, name:String, partnerOf:Int) extends Embedded
case class Dual(ta: Person, tb: Person)

val qr1 = quote { query[Person] }

val q = quote {
  qr1.join(qr1).on((a, b) => a.id == b.partnerOf).nested.map(both => both match { case (a, b) => Dual(a, b) }).distinct.nested
}
run(q)

Now produces the correct SQL:

SELECT x.taid, x.taname, x.tapartner_of, x.tbid, x.tbname, x.tbpartner_of FROM (SELECT DISTINCT both._1id AS taid, both._1name AS taname, both._1partner_of AS tapartner_of, both._2id AS tbid, both._2name AS tbname, both._2partner_of AS tbpartner_of FROM (SELECT a.id AS _1id, a.name AS _1name, a.partner_of AS _1partner_of, b.id AS _2id, b.name AS _2name, b.partner_of AS _2partner_of FROM person a INNER JOIN person b ON a.id = b.partner_of) AS both) AS x

(*) Also notice that both properties have renameable = ByStrategy. This means that a querySchema has been defined for neither of them.

  • Unit test all changes
  • Update README.md if applicable
  • Add [WIP] to the pull request title if it's work in progress
  • Squash commits that aren't meaningful changes
  • Run sbt scalariformFormat test:scalariformFormat to make sure that the source files are formatted

@getquill/maintainers

@deusaquilus deusaquilus changed the title Fixing Embedded Coproduct Column Duplication Issue [WIP] Fixing Embedded Coproduct Column Duplication Issue Sep 11, 2019
@deusaquilus deusaquilus force-pushed the invisible_properties_for_embedded_entities branch 5 times, most recently from 3a64a02 to ebeb464 Compare September 12, 2019 01:17
@deusaquilus deusaquilus force-pushed the invisible_properties_for_embedded_entities branch from ebeb464 to 86f082c Compare September 12, 2019 19:04
@deusaquilus deusaquilus changed the title [WIP] Fixing Embedded Coproduct Column Duplication Issue Fixing Embedded Coproduct Column Duplication Issue Sep 12, 2019
@deusaquilus deusaquilus merged commit 21c30d7 into master Sep 13, 2019

Successfully merging this pull request may close these issues.

Using Embedded with Distinct can cause Incorrect and Duplicate Column Aliases