Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quat Enhancements to Support Needed Spark Use Cases #2010

Merged
merged 3 commits into from
Oct 19, 2020

Conversation

deusaquilus
Copy link
Collaborator

@deusaquilus deusaquilus commented Oct 16, 2020

Spark has a use case that allows Quats on abstract types even for static queries. Here is a pattern in which I have discovered this possibility:

  // import the quill spark context
  import QuillSparkContext._

  // create a trait that defines types and then returns the based on a query
    sealed trait ChildJoiner {
    type SomeChild <: { def id: Int }
    type SomeGrandChild <: { def parentId: Int }
    def children: Dataset[SomeChild]
    def grandchildren: Dataset[SomeGrandChild]

    implicit class JoinFromParent[T <: { def childId: Int }](p: T) {
      def joinChild = quote {
        for {
          c <- liftQuery(children).join(c => c.id == p.childId)
        } yield c
      }
      def joinChildAndGrandChild = quote { //hellooooooo
        for {
          c <- liftQuery(children).join(c => c.id == p.childId)
          g <- liftQuery(grandchildren).join(g => g.parentId == c.id)
        } yield (c, g)
      }
    }

  // Create types representing concrete impelementations
  // (actually this is spark so it needs to be done in a top-level area)
  case class Parent(name: String, childId: Int)
  case class Child(name: String, id: Int)

  //create a set of types that represent the concrete implementations
  object Data {
    val parent = Parent("Joe", 1)
    val child = Child("Jack", 1)
  }
  val parents = List(Data.parent).toDS
  val childrenBase = List(Data.child).toDS

  // instantiate the data
  class Extensions extends ChildJoiner {
    override type SomeChild = Child
    override val children: Dataset[Child] = childrenBase
  }

Now attempt the following query:

        // val q: Quoted[Query[SomeChild]] = ...
        val q = quote {
          for {
            p <- liftQuery(parents)
            c <- p.joinChild
          } yield c
        }

Firstly, this will not be expanded correctly since the ApplyMap phase incorrectly removes the Map element around the FlatJoin of this query. Once that minor issues is fixed however, there is a larger one. Namely that outer nested field of the c member 'child' only knows about the id field.

[info]   | SELECT
[info]   |   c.id
[info]   | FROM
[info]   |   (?) AS p
[info]   |   INNER JOIN (?) AS c ON c.id = p.childId

This is problematic since the Child object needs both a name and id field for spark to be able to encode the type back (notice how the actual return type is Quoted[Query[SomeChild]]).

The above behavior happens because when the quats of this query are introspected, they infer the quat from the SomeChild type as opposed to the Child type. This can be surmised based on the Quat of the query being create:

  Map(
    FlatMap(
      infix"${?}",
      Id("p", CC(name:V,childId:V)), // Parent
      Map(
        Map(
          FlatJoin(
            InnerJoin,
            infix"${?}",
            Id("c", CC(id:V)),  // Child
            BinaryOperation(Property(Id("c", CC(id:V)) /* Child */, "id"), ==, Property(Id("p", CC(name:V,childId:V)) /* Parent */, "childId"))
          ),
          Id("c", CC(id:V)), // Child
          Id("c", CC(id:V))  // Child
        ),
        Id("c", CC(name:V,id:V)),  // Child (both fields are known only here since this is synthesized in the 'run' function)
        Id("c", CC(name:V,id:V))   // Child (both fields are known only here since this is synthesized in the 'run' function)
      )
    ),
    Id("x", CC(name:V,id:V)), // Same here
    Id("x", CC(name:V,id:V))  // Same here
  )

In order to fix this issue, we need to identify which child quats are abstract and which are not. The Quat.Product.Type.Concreate and Quat.Product.Type.Abstract fields have been created for this purpose. Using this logic, the AST has been changed to this (i.e. CCA represents an abstract case class).

  Map(
    FlatMap(
      infix"${?}",
      Id("p", CC(name:V,childId:V)), // Parent
      Map(
        Map(
          FlatJoin(
            InnerJoin,
            infix"${?}",
            Id("c", CCA(id:V)),  // Child
            BinaryOperation(Property(Id("c", CCA(id:V)) /* Child */, "id"), ==, Property(Id("p", CC(name:V,childId:V)) /* Parent */, "childId"))
          ),
          Id("c", CCA(id:V)), // Child
          Id("c", CCA(id:V))  // Child
        ),
        Id("c", CCA(name:V,id:V)),  // Child (technically should be concrete but we do not know for sure since it's parsed from a type-member i.e. SomeChild when it is set to Child)
        Id("c", CCA(name:V,id:V))   // Child (same here)
      )
    ),
    Id("x", CCA(name:V,id:V)), // Child (same here)
    Id("x", CCA(name:V,id:V))  // Child (same here)
  )

Furthermore, we a check in the SimpleNestedExpansion to not expand identifiers whose type is abstract, even they are the only field in a selection.
Also, since these kinds of idents (i.e. with a abstract Quat.Product) now need to be expanded with a star in both a struct (e.g. Ident(a) => struct(a.*)) as well as just start selects if the are on the top level of a FlattenSqlQuery (e.g. Ident(a) => a.*), modifications need to be made of the sqlQueryTokenizer in SparkDialect.

Finally, it is important to note that since quats can now be abstract, it is perfectly possible to select a property from a quat that does not actually exist. For this reason, we have changed the Property AST element to not throw an exception when a field is looked up that does not exist on the quat inside. Instead it returns Quat.Unknown. This type of Quat is also useful to identify situations where inferQuat has received a field is does not know anything about.

@deusaquilus deusaquilus force-pushed the abstract-member branch 3 times, most recently from c993826 to c5299b5 Compare October 16, 2020 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant