Add a Note on jsonFile having separate JSON objects per line #3517

Closed
12 changes: 6 additions & 6 deletions docs/sql-programming-guide.md
@@ -612,7 +612,7 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
Spark SQL can automatically infer the schema of a JSON dataset and load it as a SchemaRDD.
This conversion can be done using one of two methods in a SQLContext:

-* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
+* `jsonFile` - loads data from a directory of text files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.
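To make the line-per-object requirement concrete, here is a minimal sketch of what `jsonFile` expects as input and how `jsonRDD` covers the case where the JSON strings are already in an RDD. The records shown are illustrative only and are not part of this change:

{% highlight scala %}
// Illustrative contents of a line-delimited input file such as people.txt:
// each line is a complete, self-contained JSON object.
//
//   {"name":"Michael"}
//   {"name":"Andy", "age":30}
//   {"name":"Justin", "age":19}
//
// A single pretty-printed JSON document spread across multiple lines would
// NOT parse, because no individual line is a valid JSON object on its own.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// jsonFile: load from a path (a single file or a directory) of line-delimited JSON.
val people = sqlContext.jsonFile("examples/src/main/resources/people.txt")

// jsonRDD: use when the JSON strings already live in an RDD of strings.
val jsonStrings = sc.parallelize(
  """{"name":"Yin", "address":{"city":"Columbus", "state":"Ohio"}}""" :: Nil)
val anotherPeople = sqlContext.jsonRDD(jsonStrings)
{% endhighlight %}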

{% highlight scala %}
@@ -621,7 +621,7 @@ val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// A JSON dataset is pointed to by path.
// The path can be either a single text file or a directory storing text files.
val path = "examples/src/main/resources/people.json"
val path = "examples/src/main/resources/people.txt"
Contributor


We need to move the file too and update the other places that reference it:

examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java:    String path = "examples/src/main/resources/people.json";
examples/src/main/python/sql.py:    path = os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json")

// Create a SchemaRDD from the file(s) pointed to by path
val people = sqlContext.jsonFile(path)

@@ -650,7 +650,7 @@ val anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
Spark SQL can automatically infer the schema of a JSON dataset and load it as a JavaSchemaRDD.
This conversion can be done using one of two methods in a JavaSQLContext:

-* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
+* `jsonFile` - loads data from a directory of text files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.

{% highlight java %}
@@ -659,7 +659,7 @@ JavaSQLContext sqlContext = new org.apache.spark.sql.api.java.JavaSQLContext(sc)

// A JSON dataset is pointed to by path.
// The path can be either a single text file or a directory storing text files.
String path = "examples/src/main/resources/people.json";
String path = "examples/src/main/resources/people.txt";
// Create a JavaSchemaRDD from the file(s) pointed to by path
JavaSchemaRDD people = sqlContext.jsonFile(path);

@@ -688,7 +688,7 @@ JavaSchemaRDD anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD);
Spark SQL can automatically infer the schema of a JSON dataset and load it as a SchemaRDD.
This conversion can be done using one of two methods in a SQLContext:

-* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
+* `jsonFile` - loads data from a directory of text files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.

{% highlight python %}
@@ -698,7 +698,7 @@ sqlContext = SQLContext(sc)

# A JSON dataset is pointed to by path.
# The path can be either a single text file or a directory storing text files.
path = "examples/src/main/resources/people.json"
path = "examples/src/main/resources/people.txt"
# Create a SchemaRDD from the file(s) pointed to by path
people = sqlContext.jsonFile(path)
