[SPARK-12069][SQL] Update documentation with Datasets #10060
Conversation
@Experimental
@implicitNotFound("Unable to find encoder for type stored in a Dataset. Primitive types " +
  "(Int, String, etc) and Products (case classes) and primitive types are supported by " +
  "importing sqlContext.implicits._ Support for serializing other types will be added in future " +
@marmbrus Primitive types mentioned twice? Is it ok?
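For readers following the thread: the message above describes the encoders pulled in by `sqlContext.implicits._`. A minimal sketch of the resolution it refers to, against the Spark 1.6-era API (the `Person` case class and the sample values are made-up examples):

```scala
// Sketch only: assumes Spark 1.6-era APIs.
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

val sqlContext: SQLContext = ??? // stand-in for an existing SQLContext
import sqlContext.implicits._    // brings encoders for primitives and case classes into scope

// Both calls resolve an implicit Encoder; without the import above, they
// fail to compile with the @implicitNotFound message quoted in the diff.
val ints   = sqlContext.createDataset(Seq(1, 2, 3))
val people = sqlContext.createDataset(Seq(Person("Ann", 30)))
```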
Test build #46946 has finished for PR 10060 at commit
* Encoders are not intended to be thread-safe and thus they are allowed to avoid internal locking
* and reuse internal buffers to improve performance.
* == Scala ==
* Encoders are generally created automatically though implicits from a `SQLContext`.
I might be mistaken, but I think you meant to write "through" and not "though".
It would also be great to expand this slightly and explain what can be inferred automatically right now.
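For reference, a hedged sketch of the kinds of types the implicits could already derive encoders for at the time (primitives, tuples, case classes); the exact coverage should be confirmed against the 1.6 sources, and `Point` is a made-up example:

```scala
// Sketch only: assumes a SQLContext named sqlContext is already in scope.
import org.apache.spark.sql.Encoder
import sqlContext.implicits._

case class Point(x: Double, y: Double)

val intEnc   = implicitly[Encoder[Int]]            // primitive types
val tupleEnc = implicitly[Encoder[(String, Long)]] // tuples of supported types
val caseEnc  = implicitly[Encoder[Point]]          // case classes (Products)
```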
@@ -19,6 +19,9 @@ package org.apache.spark.sql

import java.lang.reflect.Modifier

import org.apache.spark.annotation.Experimental
import order
Test build #47151 has finished for PR 10060 at commit
## Datasets

A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of
RDDs (strong typing, ability to use powerful lambda functions) with the benifits of Spark SQL's
benifits -> benefits
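For context, this is the flavor of example the new docs section introduces; a minimal sketch against the Spark 1.6 API, assuming a `sqlContext` whose implicits are imported:

```scala
import sqlContext.implicits._ // assumes an existing SQLContext named sqlContext

// Create a Dataset from a local collection; the Int encoder comes from the implicits.
val ds = Seq(1, 2, 3).toDS()

// Lambdas stay strongly typed, as with RDDs, while execution goes through
// Spark SQL's engine.
val incremented = ds.map(_ + 1).collect() // Array(2, 3, 4)
```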
Test build #47356 has finished for PR 10060 at commit
@@ -9,18 +9,51 @@ title: Spark SQL and DataFrames

# Overview

-Spark SQL is a Spark module for structured data processing. It provides a programming abstraction called DataFrames and can also act as distributed SQL query engine.
+Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided
+by Spark SQL provide Spark with more about the structure of both the data and the computation being performed. Internally,
Is there a word missing between "more" and "about", like "information"?
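For readers skimming the hunk, the "more about the structure" point is easiest to see in the DataFrame API; a hedged sketch (`people.json` stands in for the sample file shipped with the Spark examples):

```scala
// Sketch only: sqlContext is an existing SQLContext; the path points at the
// sample data file from the Spark distribution.
val df = sqlContext.read.json("examples/src/main/resources/people.json")

df.printSchema()                 // schema (column names and types) inferred from the JSON
df.filter(df("age") > 21).show() // a filter over named, typed columns the optimizer can use
```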
I made a few comments, but otherwise it's clear.
Thanks for the comments!
Test build #47366 has finished for PR 10060 at commit