
Spark sql support #941

Merged: 1 commit merged into master on Oct 25, 2017

Conversation

@fwbrasil (Collaborator) commented Oct 21, 2017

Fixes #93

Problem

Quill could serve as a more user-friendly way to use Spark's SQL engine: the Dataset API is unintuitive and has many untyped methods.

Solution

Create a new module that allows users to define queries using Quill's DSL and run them on top of Datasets and RDDs.
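
For illustration, a rough sketch of what a query over a Dataset looks like with the new module (based on the quill-spark documentation; the imports, QuillSparkContext, and the exact run/liftQuery usage are assumptions and may differ from the code in this PR):

import org.apache.spark.sql.{ Dataset, SparkSession }
import io.getquill.QuillSparkContext._

case class Person(name: String, age: Int)

object AdultsJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").getOrCreate()
    implicit val sqlContext = spark.sqlContext
    import spark.implicits._

    val people: Dataset[Person] = Seq(Person("John", 30), Person("Jane", 17)).toDS

    // A typed, collection-like query instead of untyped Dataset operators:
    val adults = run {
      quote {
        liftQuery(people).filter(p => p.age > 18).map(_.name)
      }
    }
    adults.show()
  }
}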

Notes

  • Spark doesn't have a Scala 2.12 version yet. I upgraded to SBT 1.0 as a tentative fix for the problems with sbt-doge, but it didn't really solve the issue, so I reverted the SBT 1.0 upgrade and worked around it with a system-property hack.
  • Spark doesn't support bind variables, so Quill has to encode values directly into the SQL string (see the sketch below). Also, not all types supported by SqlContext are supported by Spark, which is why the context extends Context directly.
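
For instance (a hypothetical sketch in the style of the query above; the generated SQL is illustrative):

val byName = quote {
  liftQuery(people).filter(p => p.name == lift("John"))
}
// Since Spark has no bind variables, the lifted value is inlined as a
// literal, generating roughly: ... WHERE p.name = 'John' (not: p.name = ?)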

Checklist

  • Unit test all changes
  • Update README.md if applicable
  • Add [WIP] to the pull request title if it's work in progress
  • Squash commits that aren't meaningful changes
  • Run sbt scalariformFormat test:scalariformFormat to make sure that the source files are formatted

@getquill/maintainers

@fwbrasil fwbrasil changed the title Spark sql support [WIP] Spark sql support Oct 22, 2017
@fwbrasil (Collaborator, Author)

@getquill/maintainers this is ready for review

@fwbrasil fwbrasil changed the title [WIP] Spark sql support Spark sql support Oct 25, 2017
@fwbrasil (Collaborator, Author) commented Oct 25, 2017

If someone is interested in testing this new module, just change the Quill version to 2.1.0. Note that Spark doesn't have a Scala 2.12 release yet, so the project needs to be on Scala 2.11.

The documentation is here, and here is an example of a Spark job.
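
For reference, a minimal build setup for trying it out might look like this (the quill-spark artifact name and the Spark version are assumptions):

// build.sbt
scalaVersion := "2.11.11"

libraryDependencies ++= Seq(
  "io.getquill"      %% "quill-spark" % "2.1.0",
  "org.apache.spark" %% "spark-sql"   % "2.2.0"
)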

@mosyp (Collaborator) left a comment:

great work!

def probe(statement: String): Try[_] = Success(Unit)

val idiom = SparkDialect
val naming = Literal

Collaborator:

Does this mean that using different naming strategies is not possible?

@fwbrasil (Collaborator, Author):

Yes. Given that the new module uses Spark encoders, and encoders don't support naming strategies, only Literal can be used.
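
To illustrate (a hypothetical example, not code from this PR): Spark's Encoders derive column names directly from case class field names, so any renaming strategy would break the mapping:

case class Person(firstName: String, age: Int)
// The Encoder binds columns named exactly `firstName` and `age`.
// With e.g. SnakeCase, Quill would emit `first_name`, which the encoder
// could not resolve; hence only Literal (identity) naming is supported.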

- val ((an, iAn, on), ont) = dealias(a, iA, o)((_, _, _))
- val ((bn, iBn, onn), _) = ont.dealias(b, iB, on)((_, _, _))
+ val ((an, iAn, on), _) = dealias(a, iA, o)((_, _, _))
+ val ((bn, iBn, onn), _) = dealias(b, iB, on)((_, _, _))

@mosyp (Collaborator) commented Oct 25, 2017:

Could you give more context on these changes? I was debugging #939 and found that the root issue comes from this class. (This comment doesn't relate to the Spark module, so we could continue the discussion somewhere else.)

@fwbrasil (Collaborator, Author):

This was actually another bug: the same alias was being used when the same entity appears on both sides of a join, as in qr1.join(qr1). Each branch of the join should be treated as independent, so it doesn't make sense to reuse the transformer of the first branch.
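
A sketch of the problematic shape (qr1 here stands for a simple query over a single entity, as in Quill's test suite; the SQL is illustrative):

val q = quote {
  qr1.join(qr1).on((a, b) => a.i == b.i)
}
// Each branch must get its own alias, e.g.:
// SELECT ... FROM TestEntity a INNER JOIN TestEntity b ON a.i = b.i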

org: User
)

object GithubExample extends App {

Collaborator:

I think we should create a SPARK.md showing all the advantages of Quill over raw Spark, and link to it from the README, since IMO the existing README example doesn't live up to "Tired of the #scala #spark untyped madness?" :)

@fwbrasil (Collaborator, Author):

Good point. I'm actually planning to write a blog post based on this example, though I think I'll wait for a release that includes the new module first.

q match {
  case q: Entity => Some(q)
  case q: Infix  => Some(q)
  case _         => None
}

Collaborator:

How is this related to the other Quill modules? I suppose these changes are required to generate queries like SELECT x1.age _1 FROM (?) x1 WHERE x1.name = ?, but does it impact the rest of the codebase somehow, e.g. open up new possibilities?

@fwbrasil (Collaborator, Author):

I'd say this is a bug fix. Users should be able to use an infix in an entity position.
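
For example (a sketch using Quill's infix interpolator; the table name is made up):

case class Person(name: String, age: Int)

// An infix used where an entity would normally appear:
val people = quote {
  infix"some_external_table".as[Query[Person]]
}

val q = quote {
  people.filter(p => p.age > 18)
}
// The infix is spliced into the FROM clause in place of a plain table name.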

@mosyp (Collaborator) left a comment:

LGTM

@adelbertc:

Hey man, this is really cool; glad to see more people taking type safety in Spark seriously! I was wondering if you've seen the Frameless project? We started by providing a type-safe query layer for Spark SQL and have since started supporting other functionality as well.

Great work!

@fwbrasil fwbrasil merged commit a4c8432 into master Oct 25, 2017
@fwbrasil fwbrasil deleted the spark branch October 25, 2017 18:54
@fwbrasil (Collaborator, Author):

@adelbertc Thanks for the feedback! :) It's indeed great to see type-safe solutions like Quill and Frameless for Spark SQL. I imagine Dataset is one of the most used Scala DSLs, which is terrible news given how unintuitive it is.

I know Frameless, but I'm not a big fan of the Dataset API in general (select, where, etc.). I think it's much more natural if the user can express the computation using a collection-like API.

Successfully merging this pull request may close these issues: Spark SQL integration.