Add a Note on jsonFile having separate JSON objects per line
* This commit aims to avoid the confusion I faced when trying
  to submit a regular, valid multi-line JSON file; see also

  http://apache-spark-user-list.1001560.n3.nabble.com/Loading-JSON-Dataset-fails-with-com-fasterxml-jackson-databind-JsonMappingException-td20041.html
petervandenabeele committed Nov 30, 2014
1 parent 0fcd24c commit fca7dfb
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions docs/sql-programming-guide.md
@@ -615,6 +615,11 @@ This conversion can be done using one of two methods in a SQLContext:
* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.

Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
line must contain a separate, self-contained, valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail, because its individual lines
are not valid JSON objects on their own.
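The expected layout can be illustrated with plain Python, independently of Spark. This is a minimal sketch; the record values are made up for illustration, and the temporary file stands in for a file you would pass to `jsonFile`:

```python
import json
import tempfile

# One complete JSON object per line -- the layout jsonFile expects.
# These example records are hypothetical, not taken from Spark's docs.
records = [
    {"name": "Michael"},
    {"name": "Andy", "age": 30},
    {"name": "Justin", "age": 19},
]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    for record in records:
        # json.dumps without indent keeps each object on a single line
        f.write(json.dumps(record) + "\n")
    path = f.name

# Every line parses on its own, which is the property jsonFile relies on:
with open(path) as f:
    parsed = [json.loads(line) for line in f]
assert parsed == records
```

The resulting file is not itself a single valid JSON document (there is no enclosing array), yet it is exactly what `jsonFile` consumes.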

{% highlight scala %}
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
@@ -653,6 +658,11 @@ This conversion can be done using one of two methods in a JavaSQLContext :
* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.

Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
line must contain a separate, self-contained, valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail, because its individual lines
are not valid JSON objects on their own.

{% highlight java %}
// sc is an existing JavaSparkContext.
JavaSQLContext sqlContext = new org.apache.spark.sql.api.java.JavaSQLContext(sc);
@@ -691,6 +701,11 @@ This conversion can be done using one of two methods in a SQLContext:
* `jsonFile` - loads data from a directory of JSON files where each line of the files is a JSON object.
* `jsonRDD` - loads data from an existing RDD where each element of the RDD is a string containing a JSON object.

Note that the file that is offered as _jsonFile_ is not a typical JSON file. Each
line must contain a separate, self-contained, valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail, because its individual lines
are not valid JSON objects on their own.
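The failure mode for a pretty-printed JSON file can be seen directly with the standard `json` module. This sketch shows why line-by-line parsing, the strategy `jsonFile` uses, rejects a document that is perfectly valid as a whole:

```python
import json

# A valid JSON document, pretty-printed across several lines:
pretty = json.dumps({"name": "Andy", "age": 30}, indent=2)

# Parsed as a whole document, it is fine ...
assert json.loads(pretty)["age"] == 30

# ... but parsed line by line, every line fails, because fragments
# such as '{' or '  "age": 30' are not valid JSON on their own:
failures = 0
for line in pretty.splitlines():
    try:
        json.loads(line)
    except ValueError:
        failures += 1
assert failures == len(pretty.splitlines())
```

This is the same mismatch described in the note above: document-level validity does not imply line-level validity.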

{% highlight python %}
# sc is an existing SparkContext.
from pyspark.sql import SQLContext
