Add a Note on jsonFile having separate JSON objects per line
* This commit aims to avoid the confusion I faced when trying
  to submit a regular, valid multi-line JSON file; see also

  http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
petervandenabeele committed Nov 30, 2014
1 parent 0fcd24c commit fca7dfb
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions docs/sql-programming-guide.md
@@ -615,6 +615,11 @@ This conversion can be done using one of two methods in a SQLContext:
* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.

Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
line must contain a separate, self-contained, valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail, because its individual lines
are not valid JSON objects on their own.
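The expected layout can be illustrated with plain Python, independently of Spark. This is a minimal sketch; the record values are made up for illustration, and the temporary file stands in for a file you would pass to `jsonFile`:

```python
import json
import tempfile

# One complete JSON object per line -- the layout jsonFile expects.
# These example records are hypothetical, not taken from Spark's docs.
records = [
    {"name": "Michael"},
    {"name": "Andy", "age": 30},
    {"name": "Justin", "age": 19},
]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    for record in records:
        # json.dumps without indent keeps each object on a single line
        f.write(json.dumps(record) + "\n")
    path = f.name

# Every line parses on its own, which is the property jsonFile relies on:
with open(path) as f:
    parsed = [json.loads(line) for line in f]
assert parsed == records
```

The resulting file is not itself a single valid JSON document (there is no enclosing array), yet it is exactly what `jsonFile` consumes.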

{% highlight scala %}
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
@@ -653,6 +658,11 @@ This conversion can be done using one of two methods in a JavaSQLContext :
* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.

Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
line must contain a separate, self-contained, valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail, because its individual lines
are not valid JSON objects on their own.

{% highlight java %}
// sc is an existing JavaSparkContext.
JavaSQLContext sqlContext = new org.apache.spark.sql.api.java.JavaSQLContext(sc);
@@ -691,6 +701,11 @@ This conversion can be done using one of two methods in a SQLContext:
* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.

Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
line must contain a separate, self-contained, valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail, because its individual lines
are not valid JSON objects on their own.
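The failure mode for a pretty-printed JSON file can be seen directly with the standard `json` module. This sketch shows why line-by-line parsing, the strategy `jsonFile` uses, rejects a document that is perfectly valid as a whole:

```python
import json

# A valid JSON document, pretty-printed across several lines:
pretty = json.dumps({"name": "Andy", "age": 30}, indent=2)

# Parsed as a whole document, it is fine ...
assert json.loads(pretty)["age"] == 30

# ... but parsed line by line, every line fails, because fragments
# such as '{' or '  "age": 30' are not valid JSON on their own:
failures = 0
for line in pretty.splitlines():
    try:
        json.loads(line)
    except ValueError:
        failures += 1
assert failures == len(pretty.splitlines())
```

This is the same mismatch described in the note above: document-level validity does not imply line-level validity.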

{% highlight python %}
# sc is an existing SparkContext.
from pyspark.sql import SQLContext
