Express Infix Clauses not in Select List #1597

deusaquilus · 2019-09-08T03:24:43Z

Fixes (first issue) #1583, (second issue) #1598, (third issue) #1564
Also fixes #1580

Problem

First Issue

As explained in #1583, select elements that are infix clauses cannot simply be removed from a query when they are not being selected (at least if they are not pure). For example:

val q = quote {
  query[Person].map(p => (infix"DISTINCT ON (${p.other})".as[Int], p.name, p.id)).map(t => (t._2, t._3))
}
// Normally produces:
// SELECT p.name, p.id FROM person p
// ... the DISTINCT ON clause is gone, which completely breaks the intent of the query

This typically occurs when multiple map clauses are combined into a single one. Since this clearly cannot be done with the aforementioned infix clauses, we need to express the chained map clauses as sub-/super-queries, which Quill already does. However, when deciding which select values should be in the sub-query, Quill excludes the values of elements that are not in the outer select list. This is perfectly acceptable in normal cases, e.g.:

run {
  query[Person].map(p => (p.id, p.name, p.other)).nested.map(t => (t._1, t._2))
}
// SELECT p.id, p.name FROM (SELECT x.id, x.name FROM person x) AS p
// The 'p.other' property is excluded, this is perfectly reasonable and actually is an optimization.
// (SIDE NOTE: Technically the SQL optimizer should do this for us, but it doesn't always, especially when the column actually comes from a view of a view of a view, etc.)

However, it is not acceptable in situations where infix clauses are used, as explained above.
For this reason, we need to find which infixes of every sub-query have not been expressed in the expandSelect method (now refactored into ExpandSelect) and put them back into the sub-query. This is tricky because these infixes could be inside tuples or case classes whose sibling elements have been selected, so blindly including every SelectValue that contains an infix would cause duplicate fields to be selected. Instead, we need to recursively traverse all SelectValue objects containing case classes and tuples and extract the infix values inside them.
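To make the pruning rule concrete, here is a minimal sketch under a toy model where every sub-query select value carries a string alias and every infix carries a `pure` flag. `PruneSketch`, `prune` and the case classes are hypothetical stand-ins, not Quill's AST or the actual ExpandSelect code, and the sketch only handles a flat select list (the nested tuple/case-class case is what OrderedSelect is for).

```scala
object PruneSketch {
  // Hypothetical, simplified model of a sub-query select list;
  // these are NOT Quill's real AST classes.
  sealed trait Ast
  final case class Property(of: String, name: String) extends Ast
  final case class Infix(sql: String, pure: Boolean) extends Ast

  // Each sub-query select value is exposed to the outer query under an alias.
  final case class SelectValue(ast: Ast, alias: String)

  // Keep a sub-query column if the outer query references it, or if it is an
  // impure infix (e.g. DISTINCT ON, RANK() OVER) whose removal would change
  // the meaning of the query. Pure, unreferenced infixes can still be pruned.
  def prune(sub: List[SelectValue], referenced: Set[String]): List[SelectValue] =
    sub.filter { sv =>
      referenced.contains(sv.alias) || (sv.ast match {
        case Infix(_, pure) => !pure
        case _              => false
      })
    }

  // Mirrors the first example: the outer map only selects _2 (name) and _3 (id).
  val subSelect: List[SelectValue] = List(
    SelectValue(Infix("DISTINCT ON (p.other)", pure = false), "_1"),
    SelectValue(Property("p", "name"), "_2"),
    SelectValue(Property("p", "id"), "_3")
  )
}
```

With the first example's select list, the impure DISTINCT ON column survives even though the outer query only references `_2` and `_3`; a pure infix that nothing references would still be pruned.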

During the traversal, we need to keep track of which element of the respective case class or tuple we are in, and check whether that element has already been expressed as a SelectValue because it matched some Property of the output. Since arbitrary elements can be selected at arbitrary paths inside case classes and tuples, we need to keep track not only of the order of elements in the initial selection (of the sub-query) but also of which element of the respective tuple (or sub-tuple, or sub-sub-tuple) the infix is in. For this reason, we created the OrderedSelect object, which keeps a List[Int] representing the "path" of an element inside a list of select values. For example, say you have a query like this:

case class Person(id:Int, name:String, age:String)
val q = quote {
  query[Person]
  .map(p => (p.id, (p.name, infix"RANK() OVER(ORDER BY ${p.age})")))
  .map(t => (t._1, t._2._1))
}

Once the SqlQuery(ast) has been created, it looks something like this:

Map(
  Map(
    Entity("Person", List()),
    Ident("p"),
    Tuple(List(Property(Ident("p"), "id"), Tuple(List(Property(Ident("p"), "name"), Infix(List("RANK() OVER(ORDER BY ", ")"), List(Property(Ident("p"), "age")), false)))))
  ),
  Ident("t"),
  Tuple(List(Property(Ident("t"), "_1"), Property(Property(Ident("t"), "_2"), "_1")))
)

The "orders" of the elements are the following:
Order = 1 - Property(Ident("p"), "id")
Order = 2 - Tuple(List(Property(Ident("p"), "name"), Infix(List("RANK() OVER(ORDER BY ", ")"), List(Property(Ident("p"), "age")), false)))
Order = 2,1 - Property(Ident("p"), "name")
Order = 2,2 - Infix(List("RANK() OVER(ORDER BY ", ")"), List(Property(Ident("p"), "age")), false)

Now, since we have already selected element 2,1 (i.e. Property(Ident("p"), "name")), we cannot just select element 2 (i.e. the entire tuple), since that would cause the name property to be selected twice (i.e. SELECT ... FROM (SELECT p.id, p.name, p.name, RANK() ...)). Instead, we need to search down into element 2, find element 2,2, and pull out the infix. Then we need to put this infix into the correct place in the resulting query.
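The path bookkeeping and the "descend instead of re-selecting" rule can be sketched as follows. This is a simplified model that assumes a tiny AST with only properties, infixes and tuples; `Ordered`, `orders` and `missingInfixes` are hypothetical names that loosely imitate OrderedSelect's List[Int] path rather than reproduce Quill's code.

```scala
object OrderedSelectSketch {
  // Hypothetical, simplified AST; not Quill's real classes.
  sealed trait Ast
  final case class Property(of: String, name: String) extends Ast
  final case class Infix(sql: String) extends Ast
  final case class Tuple(values: List[Ast]) extends Ast

  // An element of the sub-query select together with its 1-based path,
  // loosely modelled on the List[Int] kept by OrderedSelect.
  final case class Ordered(path: List[Int], ast: Ast)

  // Enumerate every element with its path, parents before children.
  def orders(values: List[Ast], prefix: List[Int] = Nil): List[Ordered] =
    values.zipWithIndex.flatMap { case (ast, i) =>
      val path = prefix :+ (i + 1)
      ast match {
        case Tuple(children) => Ordered(path, ast) :: orders(children, path)
        case _               => List(Ordered(path, ast))
      }
    }

  // Collect infixes whose path has not been selected. A selected path covers
  // everything under it; otherwise we descend into tuples instead of
  // re-selecting the whole tuple (which would duplicate its siblings).
  def missingInfixes(values: List[Ast], selected: Set[List[Int]], prefix: List[Int] = Nil): List[Ordered] =
    values.zipWithIndex.flatMap { case (ast, i) =>
      val path = prefix :+ (i + 1)
      ast match {
        case _ if selected.contains(path) => Nil
        case Tuple(children)              => missingInfixes(children, selected, path)
        case inf: Infix                   => List(Ordered(path, inf))
        case _                            => Nil
      }
    }

  // (p.id, (p.name, RANK() OVER(ORDER BY p.age)))
  val select: List[Ast] = List(
    Property("p", "id"),
    Tuple(List(Property("p", "name"), Infix("RANK() OVER(ORDER BY p.age)")))
  )
}
```

For the RANK() example, `orders` yields the paths 1, 2, 2,1 and 2,2; with 1 and 2,1 already selected, `missingInfixes` returns only the infix at 2,2, while selecting the whole tuple at 2 leaves nothing to add.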

Second Issue

A related issue, #1598, arises when tuples are mapped to ad-hoc case classes that are then part of a nested select. This typically breaks because, in certain situations, multiple Nested clauses exist in the AST (i.e. Nested(Nested(q))), and sub-query fields cannot be attached to the corresponding elements because of how nested query expansion works.

case class TestEntity(s: String, i: Int, l: Long, o: Option[Int]) extends Embedded
case class Dual(ta: TestEntity, tb: TestEntity)

val qr1 = quote {
  query[TestEntity]
}

val q = quote {
  qr1.join(qr1).on((a, b) => a.i == b.i).nested.map(both => both match { case (a, b) => Dual(a, b) }).nested
}

println(run(q).string)

This produces the following error:

cmd21.sc:1: exception during macro expansion: 
java.lang.IndexOutOfBoundsException: 1
	at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
	at scala.collection.immutable.List.apply(List.scala:84)
	at io.getquill.context.sql.norm.ExpandNestedQueries$.io$getquill$context$sql$norm$ExpandNestedQueries$$expandReference$1(ExpandNestedQueries.scala:90)
	at io.getquill.context.sql.norm.ExpandNestedQueries$.io$getquill$context$sql$norm$ExpandNestedQueries$$expandReference$1(ExpandNestedQueries.scala:80)
	at io.getquill.context.sql.norm.ExpandNestedQueries$$anonfun$expandSelect$1.apply(ExpandNestedQueries.scala:108)
	at io.getquill.context.sql.norm.ExpandNestedQueries$$anonfun$expandSelect$1.apply(ExpandNestedQueries.scala:108)
	at scala.collection.immutable.List.map(List.scala:288)

This is due to the fact that there are multiple Nested clauses in the AST:

// show(SqlNormalize(q.ast)) 
Map(
  Nested(
    Nested(
      Map(
        Join(
          InnerJoin,
          Entity("TestEntity", List()),
          Entity("TestEntity", List()),
          Ident("a"),
          Ident("b"),
          BinaryOperation(Property(Ident("a"), "i"), ==, Property(Ident("b"), "i"))
        ),
        Ident("ab"),
        Tuple(List(Ident("a"), Ident("b")))
      )
    )
  ),
  Ident("both"),
  CaseClass(List(("ta", Property(Ident("both"), "_1")), ("tb", Property(Ident("both"), "_2"))))
)

This leads to an AST that looks like this:

// show(SqlQuery(SqlNormalize(q.ast))) 
FlattenSqlQuery(
  List(
    QueryContext(
      FlattenSqlQuery(
        List(
          QueryContext(
            FlattenSqlQuery(
              List(
                JoinContext(
                  InnerJoin,
                  TableContext(Entity("TestEntity", List()), "a"),
                  TableContext(Entity("TestEntity", List()), "b"),
                  BinaryOperation(Property(Ident("a"), "i"), ==, Property(Ident("b"), "i"))
                )
              ),
              None,
              None,
              List(),
              None,
              None,
              List(SelectValue(Ident("a"), None, false), SelectValue(Ident("b"), None, false)),
              false
            ),
            "x"
          )
        ),
        None,
        None,
        List(),
        None,
        None,
        List(SelectValue(Ident("x"), None, false)), // (b) From here?
        false
      ),
      "both"
    )
  ),
  None,
  None,
  List(),
  None,
  None,
  List(
    SelectValue(
      // (a) We need to look up _1 and _2......
      CaseClass(List(("ta", Property(Ident("both"), "_1")), ("tb", Property(Ident("both"), "_2")))),
      None,
      false
    )
  ),
  false
)

Notice that we need to look up the properties _1 and _2 from the x variable in the middle. This fails because x does not contain these values. The problem is that the extra Nested clause produces an incorrect mapping between the _1 and _2 keys and the a and b select values to which they refer. Collapsing Nested(Nested(q)) inside SqlQuery solves this problem.
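A toy version of the collapse, assuming a query AST reduced to entities, Nested and Map (joins omitted). `CollapseNestedSketch` and `MapQuery` are hypothetical names; in the actual change the collapse happens inside SqlQuery, and this only illustrates the rewrite Nested(Nested(q)) => Nested(q).

```scala
object CollapseNestedSketch {
  // Hypothetical, heavily simplified query AST (joins omitted); these are
  // NOT Quill's real classes.
  sealed trait Query
  final case class Entity(name: String) extends Query
  final case class Nested(query: Query) extends Query
  final case class MapQuery(query: Query, alias: String, body: String) extends Query

  // Replace any chain Nested(Nested(...(q))) by a single Nested(q), recursively,
  // so that each sub-query level corresponds to exactly one select list.
  def collapse(q: Query): Query = q match {
    case Nested(inner @ Nested(_))    => collapse(inner)
    case Nested(inner)                => Nested(collapse(inner))
    case MapQuery(inner, alias, body) => MapQuery(collapse(inner), alias, body)
    case e: Entity                    => e
  }

  // Shape of the Second Issue's AST: Map(Nested(Nested(Map(...))), both, Dual(...)).
  val example: Query =
    MapQuery(Nested(Nested(MapQuery(Entity("TestEntity"), "ab", "(a, b)"))), "both", "Dual(both._1, both._2)")
}
```

Since wrapping an already-nested query in another Nested adds no new select list, removing the extra level does not change what the query means; it only keeps the _1/_2 lookups pointing at the select values they refer to.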

Third Issue

The third issue involves using an embedded entity inside a query that uses distinct. In addition to potentially having a double-nesting issue (Second Issue), this kind of query fails in the VerifySqlQuery step because its elements are not properly expanded. This occurs because root-level tuples in map clauses and root-level ad-hoc case classes are not treated equivalently. Take for instance the following nested query that maps to a tuple:

case class Emb(a: Int, b: Int)
val q = quote {
  query[Emb].map(e => (1, e)).distinct 
}
run(q)
// SELECT e._1, e._2a, e._2b FROM (SELECT DISTINCT 1 AS _1, e.a AS _2a, e.b AS _2b FROM emb e) AS e

The SqlQuery gets expanded to the following:

// SqlQuery(SqlNormalize(q.ast)) 
FlattenSqlQuery(
  List(TableContext(Entity("Emb", List()), "e")),
  None,
  None,
  List(),
  None,
  None,
  List(SelectValue(Constant(1), None, false), SelectValue(Ident("e"), None, false)),
  true
)

Then take the following nested query that maps to an ad-hoc case class:

case class Parent(id: Int, emb1: Emb)
case class Emb(a: Int, b: Int) extends Embedded
val q = quote { 
  query[Emb].map(e => Parent(1, e)).distinct
}
run(q)
// java.lang.IllegalStateException: The monad composition can't be expressed using applicative joins. Faulty expression: '(1, e)'. Free variables: 'List(e)'.
// 	at io.getquill.util.Messages$.fail(Messages.scala:21)
// 	at io.getquill.context.sql.idiom.SqlIdiom$$anonfun$3.apply(SqlIdiom.scala:43)
// 	at io.getquill.context.sql.idiom.SqlIdiom$$anonfun$3.apply(SqlIdiom.scala:43)
// 	at scala.Option.map(Option.scala:146)

The SqlQuery gets expanded to the following:

// SqlQuery(SqlNormalize(q.ast)) 
FlattenSqlQuery(
  List(TableContext(Entity("Emb", List()), "e")),
  None,
  None,
  List(),
  None,
  None,
  List(SelectValue(CaseClass(List(("id", Constant(1)), ("emb1", Ident("e")))), None, false)),
  true
)

Notice that in the former, the tuple has been flattened into a list of SelectValue elements, whereas in the latter the case class has not.

The difference in behavior also has an impact on ExpandNestedQueries. Notice for instance that tuple indices are used to de-reference the Nth element of a given select:

case pp @ Property(_, TupleIndex(idx)) =>
  select(idx) match {
    case OrderedSelect(o, SelectValue(ast, alias, c)) =>
      OrderedSelect(o, SelectValue(ast, concat(alias, idx), c))
  }

The reason Quill behaves differently for tuples and ad-hoc case classes is that tuples are used both as a row-coproduct type (most notably from applicative joins) and as an element type from a standard map method. Because of this double meaning, Quill automatically expands element-type tuples inside SqlQuery so that they behave the same way as coproduct-type tuples. When ad-hoc case classes are used as coproduct types (i.e. with Embedded), some additional effort is needed to expand them properly before verification in VerifySqlQuery. This allows VerifySqlQuery to properly exclude sub-element identities (i.e. identities inside SelectValue(CaseClass(...)) elements).
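The idea of treating root-level ad-hoc case classes like tuples can be sketched with a small recursive expansion. This is an illustration only: the AST, `Schema` and `expand` are hypothetical, the sketch folds together steps that Quill splits between SqlQuery and ExpandNestedQueries, and the case-class alias convention (`emb1a`, `emb1b`) is an assumption modelled on the tuple aliases `_2a`/`_2b` in the SQL above.

```scala
object ExpandSelectSketch {
  // Hypothetical, simplified AST; NOT Quill's real classes.
  sealed trait Ast
  final case class Constant(value: Int) extends Ast
  final case class Ident(name: String) extends Ast
  final case class Property(of: Ast, name: String) extends Ast
  final case class Tuple(values: List[Ast]) extends Ast
  final case class CaseClass(fields: List[(String, Ast)]) extends Ast

  final case class SelectValue(ast: Ast, alias: Option[String])

  // Assumption: we know the columns of the embedded entity bound to each identifier.
  type Schema = Map[String, List[String]]

  private def concat(alias: Option[String], name: String): Option[String] =
    Some(alias.getOrElse("") + name)

  // Flatten one select value into leaf select values. Tuples (keyed by _1, _2, ...)
  // and ad-hoc case classes (keyed by field name) go through the same path, and
  // embedded identifiers are expanded into one column per field.
  def expand(sv: SelectValue, schema: Schema): List[SelectValue] = sv.ast match {
    case Tuple(values) =>
      values.zipWithIndex.flatMap { case (v, i) =>
        expand(SelectValue(v, concat(sv.alias, s"_${i + 1}")), schema)
      }
    case CaseClass(fields) =>
      fields.flatMap { case (name, v) => expand(SelectValue(v, concat(sv.alias, name)), schema) }
    case id @ Ident(name) if schema.contains(name) =>
      schema(name).map(col => SelectValue(Property(id, col), concat(sv.alias, col)))
    case _ => List(sv)
  }

  val schema: Schema = Map("e" -> List("a", "b"))
  // query[Emb].map(e => (1, e)) and query[Emb].map(e => Parent(1, e))
  val tupleSelect: SelectValue = SelectValue(Tuple(List(Constant(1), Ident("e"))), None)
  val caseClassSelect: SelectValue = SelectValue(CaseClass(List(("id", Constant(1)), ("emb1", Ident("e")))), None)
}
```

Expanding the tuple yields the aliases `_1`, `_2a`, `_2b` (matching the generated SQL for the tuple query), and the case class goes through the same code path, yielding `id`, `emb1a`, `emb1b`, so nothing downstream has to special-case SelectValue(CaseClass(...)).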

Potential Issues in Future

Are there situations where SELECT ... FROM (SELECT C1, ..., RANK() (...)) queries change their results merely because a column (e.g. C1) is excluded from the outer select? If so, there should be a mode that disallows ExpandSelect from excluding ANY columns from sub-queries. This is fairly straightforward to do now, since we are already doing this for infixes; we would simply do it for all kinds of columns instead of just infixes.

Trace

Due to the complex nature of ExpandNestedQueries and of AST transformations in general, I have added some tracing code that gives the user more insight into what these operations are doing. It has been implemented as an Interpolator so as to clearly distinguish it from the surrounding code. Since tracing is a side effect, which (at least by some schools of thought) should be avoided in functional code, I have added methods such as andReturn and andContinue to the trace Interpolator that should let the user keep the code in a functional style, at least in some places.
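For readers unfamiliar with this style, here is a hypothetical sketch of a trace interpolator whose `andReturn`/`andContinue` methods let a log statement wrap a value instead of standing alone. The method names come from the description above, but `TraceConfig`, `Traced`, `demo` and the exact semantics of both methods are assumptions, not Quill's actual Interpolator.

```scala
object TraceSketch {
  // Hypothetical tracing helper. The method names mirror the prose (andReturn,
  // andContinue), but this is NOT Quill's actual Interpolator API.
  final case class TraceConfig(enabled: Boolean, sink: String => Unit)

  final class Traced(message: String, cfg: TraceConfig) {
    // Evaluate the value, log it next to the message, and return it, so a
    // trace can wrap the result of an expression instead of being a statement.
    def andReturn[T](value: => T): T = {
      val v = value
      if (cfg.enabled) cfg.sink(s"$message -> $v")
      v
    }
    // Log the message first, then continue with the next computation.
    def andContinue[T](next: => T): T = {
      if (cfg.enabled) cfg.sink(message)
      next
    }
  }

  // Enables trace"..." syntax, building the message like the standard s"..." interpolator.
  implicit class TraceInterpolator(sc: StringContext) {
    def trace(args: Any*)(implicit cfg: TraceConfig): Traced = new Traced(sc.s(args: _*), cfg)
  }

  // Runs two traced steps and returns the final result plus what was logged.
  def demo(enabled: Boolean): (Int, List[String]) = {
    val log = scala.collection.mutable.ListBuffer.empty[String]
    implicit val cfg: TraceConfig = TraceConfig(enabled, msg => { log += msg; () })
    val x = 21
    val doubled = trace"doubling $x".andReturn(x * 2)
    val result = trace"adding one".andContinue(doubled + 1)
    (result, log.toList)
  }
}
```

With tracing enabled, `demo` logs both steps and still returns 43; with it disabled, the same expressions simply evaluate to their values, so the traced code reads as ordinary expressions either way.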

Conclusion

Since map-chained infixes are the most typical cause of nested queries, and distincts are a close second, all three of these issues are closely related and require the same functionality in ExpandNestedQueries and VerifySqlQuery in order to work properly. Therefore, I have chosen to bundle them into a single PR.

Checklist

Unit test all changes
Update README.md if applicable
Add [WIP] to the pull request title if it's work in progress
Squash commits that aren't meaningful changes
Run sbt scalariformFormat test:scalariformFormat to make sure that the source files are formatted

@getquill/maintainers

deusaquilus force-pushed the fix_missing_infixes branch 8 times, most recently from da90a2f to 86f9965 September 10, 2019 00:06

deusaquilus changed the title ~~[WIP] Express Infix Clauses not in Select List~~ Express Infix Clauses not in Select List Sep 10, 2019

deusaquilus force-pushed the fix_missing_infixes branch from 86f9965 to 29261f0 September 10, 2019 14:59

deusaquilus mentioned this pull request Sep 10, 2019

Using Embedded with Distinct can cause Incorrect and Duplicate Column Aliases #1602

Closed

deusaquilus added 2 commits September 10, 2019 17:26

Add compileQuick plugin

0c12491

Infix should be in query sub-clauses even if contents not selected.

c085e1c

deusaquilus force-pushed the fix_missing_infixes branch from 29261f0 to c085e1c September 10, 2019 21:27

deusaquilus merged commit 24a68c8 into master Sep 11, 2019

This was referenced Sep 11, 2019

Nested Queries with Embedded in Ad-Hoc Case Class Crashes #1598

Closed

Impure Infix Clauses are removed via column pruning #1583

Closed

Distinct with Embedded Entity crashes #1564

Closed

Nested Query with Multi-Element Tuple Crashes #1580

Closed

deusaquilus deleted the fix_missing_infixes branch September 11, 2019 00:52
