Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Subquery Expansion to be Quat-based #1920

Merged
merged 1 commit into from
Jul 31, 2020

Conversation

deusaquilus
Copy link
Collaborator

@deusaquilus deusaquilus commented Jul 29, 2020

version: 3.5.2
package: quill-sql
database: all

Nested Query Expansion is probably the hardest thing in Quill since it is what ties complex Queries from their expression in Quill's Scala-esque AST into the SqlQuery-object based tree for large and difficult nested AST structures. Historically, this mechanism has used an approach where it looks at what fields occur in outer queries to figure out what fields to add to inner aliases that have not been expanded.

For example this:

SELECT x.foo, x.bar FROM      -- we have 'foo' and 'bar' on x here
  (SELECT t from table as x)  -- so add them here!

Becomes this:

SELECT x.foo, x.bar FROM
  (SELECT t.foo, t.bar from table as x)

For simple Queries, this task is fairly straightforward however for complex queries that use embedded entities, it becomes intractable as will explain soon. However, first, note that since we have introduced Quats in #1911 this kind of process is completely unnecessary. Take the previous example:

SELECT x.foo, x.bar FROM
  (SELECT t from table as x)  -- 't' is an Ident which will have a Quat: CC(foo,bar) we we know to add those properties!

Since all Idents have Quats now and t is an ident, we can simply pull out the needed fields from the Quat. I call this process Quat-Protraction.

Before going further, here are some terms that I am going to use.

Some Terms that I will use in this PR

  • strategize - apply a naming strategy to a property or identity
  • propertize - expand an ident with properties used in sub-queries as has been the previous functionality in ExpandNestedProperties

Due to the difficult pre-Quats nature of how subqueries are expanded, it has always been difficult
to correctly strategize nested properties because we would have to strategize their aliases first... but we didn't know whether a given alias was strategized beforehand, because it may or may not have been already used in a nested strategization from an upper level select.

Here's a simple example

SELECT
  DISTINCT p._1,
  p._2mid AS _2,
  p._2simtheSid /*<- (2) We also need to apply it here here*/ AS _3theSid
FROM
  (
    SELECT
      DISTINCT g.theBid AS _1,
      g.mid AS _2mid,
      g.theSid AS _2simtheSid /*<- (1) if we apply a naming strategy here*/
    FROM
      theBim g
  ) AS p

Yet...

SELECT
  DISTINCT tup._1,
  tup._2,
  tup._3theSid /* (2) before we've gone over here */ AS _3
FROM
  (
    SELECT
      DISTINCT p._1,
      p._2mid AS _2,
      p._2simtheSid /* (1) but we don't know that it exists here */ AS _3theSid
    FROM
      (
        SELECT
          DISTINCT g.theBid AS _1,
          g.mid AS _2mid,
          g.theSid AS _2simtheSid
        FROM
          theBim g
      ) AS p
  ) AS tup

These kinds of dichotomies have created large headaches leading to issues like this one #1628 where we made the decision to just collapse nesting layers through which we could not reasonably traverse. This of course did not help us in situations where distinct was used (since nested queries with 'distinct' cannot simply be collapsed) which led to buggy behavior which we attempted to address by making ExpandNestedQueries even more complex.

@deusaquilus
Copy link
Collaborator Author

Speaking of collapsing queries, one behavior we introduced was that multiple .nested blocks were collapsed
into a single one effectively disallowing the user to control the amount of nesting layers in the query.

For example, say that I ask a query to do multiple nesting like this:

val q = quote {
  query[Parent].map(p => p.emb).nested
    .map(e => (e.name, e.id)).nested
      .map(tup => (tup._1, tup._2)).nested
}

The expected result is something like this:

SELECT p.embtheName, p.embid FROM (
  SELECT x.embid, x.embtheName FROM (
    SELECT x.embid, x.embtheName FROM (
      SELECT x.id AS embid, x.theName AS embtheName FROM Parent x
      ) AS x
    ) AS x
  ) AS p

Whereas due to a forcing of multi-nesting collapse, the query that gets produced is something like this:

SELECT p.embtheName, p.embid FROM (SELECT x.theName AS embtheName, x.id AS embid FROM Parent x) AS p

This is simpler but probably not in line with a user's expectations.

This was introduced to thwart the previously mentioned issue but the behavior is technically a bug. Now since
the issue above no longer occurs, we can correctly allow query nesting.

@deusaquilus
Copy link
Collaborator Author

One additional thing that I decided to do is to separate out the complexity of ExpandNestedQueries into several components.
The phase ExpandNestedQueries just expands out the Quats in a process I call Quat-Protraction. The SelectPropertyProtractor
basically does what ExpandSelect used to do but now using Quats instead of outer properties. That is to say, it adds sub-fields to aliases of nested queries.
Taking an SqlQuery object that looks like this:

  FlattenSqlQuery(
    select = 
      List(
        QueryContext(
          FlattenSqlQuery(
            select = List(TableContext(Entity("Parent", List(), CC(id,emb:CC(name,id))), "p")),
            from = List(SelectValue(Property(Id("p", CC(id,emb:CC(name,id))), "emb"), None, false)),
          ),"e")
      ),
    select = List(SelectValue(Tuple(List(Property(Id("e", CC(name,id)), "name"), Property(Id("e", CC(name,id)), "id"))), None, false)),
  )

... and turning it into something more like this:

  FlattenSqlQuery(
    select = List(
      QueryContext(
        FlattenSqlQuery(
          select = List(TableContext(Entity("Parent", List(), CC(id,emb:CC(name,id))), "p")),
          from = List(
            SelectValue(Property(Id("p", CC(id,emb:CC(name,id))), "name"), Some("name"), false),
            SelectValue(Property(Id("p", CC(id,emb:CC(name,id))), "id"), Some("id"), false)
          ),
          true
        ),"e")),
    select = List(SelectValue(Property(Id("e", CC(name,id)), "name"), Some("_1"), false), SelectValue(Property(Id("e", CC(name,id)), "id"), Some("_2"), false))
  )

@deusaquilus
Copy link
Collaborator Author

Another issue improved as a result of this commit is that now aliases of tables and columns are tokenized much more consistently. For example a context with UpperCase naming would typically cause table and column aliases to be capitalized in many situations. This is now no longer the case.

For example a query like this with a UpperCase naming:

val q = quote {
  query[Person].map {
    p => (p, infix"foobar".as[Int])
  }.filter(_._1.id == 1)
}

Would frequently yield SQL like this:

/* Outer Clause */ "SELECT p._1ID, p._1NAME, p._2 FROM (
  /* Inner Clause */ SELECT p.ID AS _1ID, p.NAME AS _1NAME, foobar AS _2 FROM PERSON p
) AS p WHERE p._1ID = 1"

Capitalizing the alises _1ID, _1NAME was of course necessary to do in the query all the way through because in the pre-Quat ExpandNestedQueries, it would be impossible to tell whether a some property Property(p, "_1id") is actually being directly selected from a table (i.e the "Inner Clause") or from a alias that represents a subselect (i.e. the "Outer Clause").

Another issue that occours is that with queries that have multiple outer-clauses and a distinct, since only the select-values of the outer clauses are known, the inner query (which is distinct) only has the outer clause columns. This means that the distinctification is actually wrong! See #1919 for more detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant