[SPARK-12069][SQL] Update documentation with Datasets #10060
Conversation
@Experimental
@implicitNotFound("Unable to find encoder for type stored in a Dataset. Primitive types " +
  "(Int, String, etc) and Products (case classes) and primitive types are supported by " +
  "importing sqlContext.implicits._ Support for serializing other types will be added in future " +
@marmbrus Primitive types mentioned twice? Is it ok?
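For readers following the thread: the message above describes the encoders pulled in by `sqlContext.implicits._`. A minimal sketch of the resolution it refers to, against the Spark 1.6-era API (the `Person` case class and the sample values are made-up examples):

```scala
// Sketch only: assumes Spark 1.6-era APIs.
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sqlContext: SQLContext = ??? // stand-in for an existing SQLContext
import sqlContext.implicits._    // brings encoders for primitives and case classes into scope

// Both calls resolve an implicit Encoder; without the import above, they
// fail to compile with the @implicitNotFound message quoted in the diff.
val ints   = sqlContext.createDataset(Seq(1, 2, 3))
val people = sqlContext.createDataset(Seq(Person("Ann", 30)))
```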
Test build #46946 has finished for PR 10060 at commit
* Encoders are not intended to be thread-safe and thus they are allowed to avoid internal locking
* and reuse internal buffers to improve performance.
* == Scala ==
* Encoders are generally created automatically though implicits from a `SQLContext`.
I might be mistaken, but I think you meant to write "through" and not "though".
It would also be great to expand this slightly and explain what can be inferred automatically right now.
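For reference, a hedged sketch of the kinds of types the implicits could already derive encoders for at the time (primitives, tuples, case classes); the exact coverage should be confirmed against the 1.6 sources, and `Point` is a made-up example:

```scala
// Sketch only: assumes a SQLContext named sqlContext is already in scope.
import org.apache.spark.sql.Encoder
import sqlContext.implicits._

case class Point(x: Double, y: Double)

val intEnc   = implicitly[Encoder[Int]]            // primitive types
val tupleEnc = implicitly[Encoder[(String, Long)]] // tuples of supported types
val caseEnc  = implicitly[Encoder[Point]]          // case classes (Products)
```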
@@ -19,6 +19,9 @@ package org.apache.spark.sql

import java.lang.reflect.Modifier

import org.apache.spark.annotation.Experimental
import order
Test build #47151 has finished for PR 10060 at commit
## Datasets

A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of
RDDs (strong typing, ability to use powerful lambda functions) with the benifits of Spark SQL's
benifits -> benefits
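For context, this is the flavor of example the new docs section introduces; a minimal sketch against the Spark 1.6 API, assuming a `sqlContext` whose implicits are imported:

```scala
import sqlContext.implicits._ // assumes an existing SQLContext named sqlContext

// Create a Dataset from a local collection; the Int encoder comes from the implicits.
val ds = Seq(1, 2, 3).toDS()

// Lambdas stay strongly typed, as with RDDs, while execution goes through
// Spark SQL's engine.
val incremented = ds.map(_ + 1).collect() // Array(2, 3, 4)
```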
Test build #47356 has finished for PR 10060 at commit
@@ -9,18 +9,51 @@ title: Spark SQL and DataFrames

# Overview

-Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as distributed SQL query engine.
+Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided
+by Spark SQL provide Spark with more about the structure of both the data and the computation being performed. Internally,
Is there a word missing between "more" and "about", like "information"?
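For readers skimming the hunk, the "more about the structure" point is easiest to see in the DataFrame API; a hedged sketch (`people.json` stands in for the sample file shipped with the Spark examples):

```scala
// Sketch only: sqlContext is an existing SQLContext; the path points at the
// sample data file from the Spark distribution.
val df = sqlContext.read.json("examples/src/main/resources/people.json")

df.printSchema()                 // schema (column names and types) inferred from the JSON
df.filter(df("age") > 21).show() // a filter over named, typed columns the optimizer can use
```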
I made a few comments, but otherwise it's clear.
Thanks for the comments!
Test build #47366 has finished for PR 10060 at commit