[SPARK-44997][DOCS] Align example order (Python -> Scala/Java -> R) in all Spark Doc Content #42712

Closed

wants to merge 4 commits

Changes from 1 commit
6 changes: 3 additions & 3 deletions docs/README.md
@@ -28,8 +28,8 @@ whichever version of Spark you currently have checked out of revision control.

## Prerequisites

The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, Java,
Python, R and SQL.
The Spark documentation build uses a number of tools to build HTML docs and API docs in Python, Scala,
Java, R and SQL.

You need to have [Ruby](https://www.ruby-lang.org/en/documentation/installation/) and
[Python](https://docs.python.org/2/using/unix.html#getting-and-installing-the-latest-version-of-python)
@@ -129,6 +129,6 @@ The jekyll plugin also generates the PySpark docs using [Sphinx](http://sphinx-d
using [roxygen2](https://cran.r-project.org/web/packages/roxygen2/index.html) and SQL docs
using [MkDocs](https://www.mkdocs.org/).

NOTE: To skip the step of building and copying over the Scala, Java, Python, R and SQL API docs, run `SKIP_API=1
NOTE: To skip the step of building and copying over the Python, Scala, Java, R and SQL API docs, run `SKIP_API=1
bundle exec jekyll build`. In addition, `SKIP_SCALADOC=1`, `SKIP_PYTHONDOC=1`, `SKIP_RDOC=1` and `SKIP_SQLDOC=1` can be used
to skip a single step of the corresponding language. `SKIP_SCALADOC` indicates skipping both the Scala and Java docs.
4 changes: 2 additions & 2 deletions docs/_layouts/global.html
@@ -71,9 +71,9 @@
<li class="nav-item dropdown">
<a href="#" class="nav-link dropdown-toggle" id="navbarAPIDocs" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">API Docs</a>
<div class="dropdown-menu" aria-labelledby="navbarAPIDocs">
<a class="dropdown-item" href="api/python/index.html">Python</a>
Review comment (Contributor Author): Before / After screenshots of the change (images not shown).

<a class="dropdown-item" href="api/scala/org/apache/spark/index.html">Scala</a>
<a class="dropdown-item" href="api/java/index.html">Java</a>
<a class="dropdown-item" href="api/python/index.html">Python</a>
<a class="dropdown-item" href="api/R/index.html">R</a>
<a class="dropdown-item" href="api/sql/index.html">SQL, Built-in Functions</a>
</div>
@@ -128,7 +128,7 @@ <h1 style="max-width: 680px;">Apache Spark - A Unified engine for large-scale da
<div class="row mt-5">
<div class="col-12 col-lg-6 no-gutters">
Apache Spark is a unified analytics engine for large-scale data processing.
It provides high-level APIs in Java, Scala, Python and R,
It provides high-level APIs in Python, Scala, Java and R,
Review comment (Member): Also, I'm -1 with this.

and an optimized engine that supports general execution graphs.
It also supports a rich set of higher-level tools including
<a href="sql-programming-guide.html">Spark SQL</a> for SQL and structured data processing,
9 changes: 5 additions & 4 deletions docs/index.md
@@ -34,7 +34,7 @@ source, visit [Building Spark](building-spark.html).

Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a supported version of Java. This should include JVMs on x86_64 and ARM64. It's easy to run locally on one machine --- all you need is to have `java` installed on your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java installation.

Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.8+, and R 3.5+.
Spark runs on Python 3.8+, Scala 2.12/2.13, Java 8/11/17 and R 3.5+.
Review comment (Member): -1 for this change.

Java 8 prior to version 8u371 support is deprecated as of Spark 3.5.0.
When using the Scala API, it is necessary for applications to use the same version of Scala that Spark was compiled for.
For example, when using Scala 2.13, use Spark compiled for 2.13, and compile code/applications for Scala 2.13 as well.
@@ -120,9 +120,9 @@ options for deployment:

**API Docs:**

* [Spark Python API (Sphinx)](api/python/index.html)
* [Spark Scala API (Scaladoc)](api/scala/org/apache/spark/index.html)
* [Spark Java API (Javadoc)](api/java/index.html)
* [Spark Python API (Sphinx)](api/python/index.html)
* [Spark R API (Roxygen2)](api/R/index.html)
* [Spark SQL, Built-in Functions (MkDocs)](api/sql/index.html)

@@ -163,7 +163,8 @@ options for deployment:
* AMP Camps: a series of training camps at UC Berkeley that featured talks and
exercises about Spark, Spark Streaming, Mesos, and more. [Videos](https://www.youtube.com/user/BerkeleyAMPLab/search?query=amp%20camp),
are available online for free.
* [Code Examples](https://spark.apache.org/examples.html): more are also available in the `examples` subfolder of Spark ([Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples),
[Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples),
* [Code Examples](https://spark.apache.org/examples.html): more are also available in the `examples` subfolder of Spark (
[Python]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/python),
[Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples),
[Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples),
[R]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/r))
8 changes: 4 additions & 4 deletions docs/ml-pipeline.md
@@ -208,7 +208,7 @@ This is useful if there are two algorithms with the `maxIter` parameter in a `Pi
Often times it is worth it to save a model or a pipeline to disk for later use. In Spark 1.6, a model import/export functionality was added to the Pipeline API.
As of Spark 2.3, the DataFrame-based API in `spark.ml` and `pyspark.ml` has complete coverage.

ML persistence works across Scala, Java and Python. However, R currently uses a modified format,
ML persistence works across Python, Scala and Java. However, R currently uses a modified format,
so models saved in R can only be loaded back in R; this should be fixed in the future and is
tracked in [SPARK-15572](https://issues.apache.org/jira/browse/SPARK-15572).
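
For illustration, a minimal PySpark sketch of the save/load round trip described above; the `training` DataFrame, its columns, and the output path are placeholder assumptions, not part of the patch:

{% highlight python %}
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer

# Assumes `training` is an existing DataFrame with "text" and "label" columns.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)
model = Pipeline(stages=[tokenizer, hashingTF, lr]).fit(training)

# Save the fitted PipelineModel and load it back later.
model.write().overwrite().save("/tmp/spark-logistic-regression-model")
sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
{% endhighlight %}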

@@ -238,9 +238,9 @@ notes, then it should be treated as a bug to be fixed.

This section gives code examples illustrating the functionality discussed above.
For more info, please refer to the API documentation
([Scala](api/scala/org/apache/spark/ml/package.html),
[Java](api/java/org/apache/spark/ml/package-summary.html),
and [Python](api/python/reference/pyspark.ml.html)).
([Python](api/python/reference/pyspark.ml.html),
[Scala](api/scala/org/apache/spark/ml/package.html),
and [Java](api/java/org/apache/spark/ml/package-summary.html)).

## Example: Estimator, Transformer, and Param

10 changes: 5 additions & 5 deletions docs/quick-start.md
@@ -470,19 +470,19 @@ Congratulations on running your first Spark application!
* For an in-depth overview of the API, start with the [RDD programming guide](rdd-programming-guide.html) and the [SQL programming guide](sql-programming-guide.html), or see "Programming Guides" menu for other components.
* For running applications on a cluster, head to the [deployment overview](cluster-overview.html).
* Finally, Spark includes several samples in the `examples` directory
([Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples),
([Python]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/python),
[Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples),
[Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples),
[Python]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/python),
[R]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/r)).
You can run them as follows:

{% highlight bash %}
# For Scala and Java, use run-example:
./bin/run-example SparkPi

# For Python examples, use spark-submit directly:
./bin/spark-submit examples/src/main/python/pi.py

# For Scala and Java, use run-example:
./bin/run-example SparkPi

# For R examples, use spark-submit directly:
./bin/spark-submit examples/src/main/r/dataframe.R
{% endhighlight %}
20 changes: 10 additions & 10 deletions docs/rdd-programming-guide.md
@@ -945,9 +945,9 @@ documentation](https://docs.oracle.com/javase/8/docs/api/java/lang/Object.html#h

The following table lists some of the common transformations supported by Spark. Refer to the
RDD API doc
([Scala](api/scala/org/apache/spark/rdd/RDD.html),
([Python](api/python/reference/api/pyspark.RDD.html#pyspark.RDD),
[Scala](api/scala/org/apache/spark/rdd/RDD.html),
[Java](api/java/index.html?org/apache/spark/api/java/JavaRDD.html),
[Python](api/python/reference/api/pyspark.RDD.html#pyspark.RDD),
[R](api/R/reference/index.html))
and pair RDD functions doc
([Scala](api/scala/org/apache/spark/rdd/PairRDDFunctions.html),
@@ -1059,9 +1059,9 @@ for details.

The following table lists some of the common actions supported by Spark. Refer to the
RDD API doc
([Scala](api/scala/org/apache/spark/rdd/RDD.html),
([Python](api/python/reference/api/pyspark.RDD.html#pyspark.RDD),
[Scala](api/scala/org/apache/spark/rdd/RDD.html),
[Java](api/java/index.html?org/apache/spark/api/java/JavaRDD.html),
[Python](api/python/reference/api/pyspark.RDD.html#pyspark.RDD),
[R](api/R/reference/index.html))

and pair RDD functions doc
@@ -1207,9 +1207,9 @@ In addition, each persisted RDD can be stored using a different *storage level*,
to persist the dataset on disk, persist it in memory but as serialized Java objects (to save space),
replicate it across nodes.
These levels are set by passing a
`StorageLevel` object ([Scala](api/scala/org/apache/spark/storage/StorageLevel.html),
[Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html),
[Python](api/python/reference/api/pyspark.StorageLevel.html#pyspark.StorageLevel))
`StorageLevel` object ([Python](api/python/reference/api/pyspark.StorageLevel.html#pyspark.StorageLevel),
[Scala](api/scala/org/apache/spark/storage/StorageLevel.html),
[Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html))
to `persist()`. The `cache()` method is a shorthand for using the default storage level,
which is `StorageLevel.MEMORY_ONLY` (store deserialized objects in memory). The full set of
storage levels is:
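
As a quick illustration, a hedged PySpark sketch of passing a `StorageLevel` explicitly to `persist()`; the input path is a placeholder:

{% highlight python %}
from pyspark import SparkContext, StorageLevel

sc = SparkContext("local", "persist-example")

lines = sc.textFile("data.txt")                      # placeholder input file
words = lines.flatMap(lambda line: line.split())

# Pass an explicit storage level instead of relying on cache()'s default.
words.persist(StorageLevel.MEMORY_AND_DISK)

print(words.count())             # first action computes and persists the RDD
print(words.distinct().count())  # later actions reuse the persisted data
{% endhighlight %}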
@@ -1596,9 +1596,9 @@ as Spark does not support two contexts running concurrently in the same program.

You can see some [example Spark programs](https://spark.apache.org/examples.html) on the Spark website.
In addition, Spark includes several samples in the `examples` directory
([Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples),
([Python]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/python),
[Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples),
[Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples),
[Python]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/python),
[R]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/r)).
You can run Java and Scala examples by passing the class name to Spark's `bin/run-example` script; for instance:

@@ -1619,4 +1619,4 @@ For help on deploying, the [cluster mode overview](cluster-overview.html) descri
in distributed operation and supported cluster managers.

Finally, full API documentation is available in
[Scala](api/scala/org/apache/spark/), [Java](api/java/), [Python](api/python/) and [R](api/R/).
[Python](api/python/), [Scala](api/scala/org/apache/spark/), [Java](api/java/) and [R](api/R/).
2 changes: 1 addition & 1 deletion docs/sql-getting-started.md
@@ -108,7 +108,7 @@ As an example, the following creates a DataFrame based on the content of a JSON

## Untyped Dataset Operations (aka DataFrame Operations)

DataFrames provide a domain-specific language for structured data manipulation in [Scala](api/scala/org/apache/spark/sql/Dataset.html), [Java](api/java/index.html?org/apache/spark/sql/Dataset.html), [Python](api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html) and [R](api/R/reference/SparkDataFrame.html).
DataFrames provide a domain-specific language for structured data manipulation in [Python](api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html), [Scala](api/scala/org/apache/spark/sql/Dataset.html), [Java](api/java/index.html?org/apache/spark/sql/Dataset.html) and [R](api/R/reference/SparkDataFrame.html).

As mentioned above, in Spark 2.0, DataFrames are just Dataset of `Row`s in Scala and Java API. These operations are also referred as "untyped transformations" in contrast to "typed transformations" come with strongly typed Scala/Java Datasets.
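
As a brief, non-authoritative sketch of the kind of untyped operations meant here, assuming an existing DataFrame `df` with `name` and `age` columns:

{% highlight python %}
# Assumes `df` is an existing DataFrame, e.g. loaded via spark.read.json(...).
df.printSchema()
df.select("name").show()
df.select(df["name"], df["age"] + 1).show()
df.filter(df["age"] > 21).show()
df.groupBy("age").count().show()
{% endhighlight %}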

5 changes: 3 additions & 2 deletions docs/sql-programming-guide.md
@@ -55,8 +55,9 @@ A DataFrame is a *Dataset* organized into named columns. It is conceptually
equivalent to a table in a relational database or a data frame in R/Python, but with richer
optimizations under the hood. DataFrames can be constructed from a wide array of [sources](sql-data-sources.html) such
as: structured data files, tables in Hive, external databases, or existing RDDs.
The DataFrame API is available in Scala,
Java, [Python](api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
The DataFrame API is available in
[Python](api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame), Scala,
Java and [R](api/R/index.html).
In Scala and Java, a DataFrame is represented by a Dataset of `Row`s.
In [the Scala API][scala-datasets], `DataFrame` is simply a type alias of `Dataset[Row]`.
While, in [Java API][java-datasets], users need to use `Dataset<Row>` to represent a `DataFrame`.
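
To make the construction paths concrete, a hedged PySpark sketch of building DataFrames from a structured data file and from an existing RDD; the file path and sample rows are placeholders:

{% highlight python %}
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("dataframe-sources").getOrCreate()

# From a structured data file (placeholder path).
people = spark.read.json("examples/src/main/resources/people.json")

# From an existing RDD of Rows.
rdd = spark.sparkContext.parallelize(
    [Row(name="Alice", age=34), Row(name="Bob", age=45)])
fromRdd = spark.createDataFrame(rdd)
fromRdd.show()
{% endhighlight %}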
24 changes: 12 additions & 12 deletions docs/streaming-programming-guide.md
@@ -68,7 +68,7 @@ operations on other DStreams. Internally, a DStream is represented as a sequence
[RDDs](api/scala/org/apache/spark/rdd/RDD.html).

This guide shows you how to start writing Spark Streaming programs with DStreams. You can
write Spark Streaming programs in Scala, Java or Python (introduced in Spark 1.2),
write Spark Streaming programs in Python (introduced in Spark 1.2), Scala or Java,
all of which are presented in this guide.
You will find tabs throughout this guide that let you choose between code snippets of
different languages.
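
For orientation, a minimal PySpark DStream word count, given here as a hedged sketch; the host and port are placeholders:

{% highlight python %}
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)                       # 1-second batch interval

lines = ssc.socketTextStream("localhost", 9999)     # placeholder host/port
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
{% endhighlight %}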
@@ -762,9 +762,9 @@ DStreams can be created with data streams received through custom receivers. See
For testing a Spark Streaming application with test data, one can also create a DStream based on a queue of RDDs, using `streamingContext.queueStream(queueOfRDDs)`. Each RDD pushed into the queue will be treated as a batch of data in the DStream, and processed like a stream.
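
A short, hedged sketch of that testing pattern, assuming an already constructed StreamingContext `ssc`:

{% highlight python %}
# Push a few pre-built RDDs into a queue; each one is processed as a batch.
rdd_queue = [ssc.sparkContext.parallelize(range(i * 100, (i + 1) * 100))
             for i in range(5)]
input_stream = ssc.queueStream(rdd_queue)
input_stream.map(lambda x: (x % 10, 1)) \
            .reduceByKey(lambda a, b: a + b) \
            .pprint()
{% endhighlight %}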

For more details on streams from sockets and files, see the API documentations of the relevant functions in
[StreamingContext](api/scala/org/apache/spark/streaming/StreamingContext.html) for
Scala, [JavaStreamingContext](api/java/index.html?org/apache/spark/streaming/api/java/JavaStreamingContext.html)
for Java, and [StreamingContext](api/python/reference/api/pyspark.streaming.StreamingContext.html#pyspark.streaming.StreamingContext) for Python.
[StreamingContext](api/python/reference/api/pyspark.streaming.StreamingContext.html#pyspark.streaming.StreamingContext) for Python,
[StreamingContext](api/scala/org/apache/spark/streaming/StreamingContext.html) for Scala,
and [JavaStreamingContext](api/java/index.html?org/apache/spark/streaming/api/java/JavaStreamingContext.html) for Java.

### Advanced Sources
{:.no_toc}
@@ -1265,12 +1265,12 @@ JavaPairDStream<String, String> joinedStream = windowedStream.transform(rdd -> r

In fact, you can also dynamically change the dataset you want to join against. The function provided to `transform` is evaluated every batch interval and therefore will use the current dataset that `dataset` reference points to.
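
A hedged PySpark counterpart of the Java snippet above; `windowed_stream` and `spam_rdd` are assumed to be existing pair DStream/RDD objects:

{% highlight python %}
# The dict lets the driver swap in a new RDD; transform() re-evaluates the
# function for every batch, so each batch joins against the current RDD.
dataset = {"rdd": spam_rdd}

def join_with_current(rdd):
    return rdd.join(dataset["rdd"])

joined_stream = windowed_stream.transform(join_with_current)
{% endhighlight %}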

The complete list of DStream transformations is available in the API documentation. For the Scala API,
see [DStream](api/scala/org/apache/spark/streaming/dstream/DStream.html)
The complete list of DStream transformations is available in the API documentation. For the Python API,
see [DStream](api/python/reference/api/pyspark.streaming.DStream.html#pyspark.streaming.DStream).
For the Scala API, see [DStream](api/scala/org/apache/spark/streaming/dstream/DStream.html)
and [PairDStreamFunctions](api/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.html).
For the Java API, see [JavaDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaDStream.html)
and [JavaPairDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairDStream.html).
For the Python API, see [DStream](api/python/reference/api/pyspark.streaming.DStream.html#pyspark.streaming.DStream).

***

@@ -2150,7 +2150,7 @@ application left off. Note that this can be done only with input sources that su
(like Kafka) as data needs to be buffered while the previous application was down and
the upgraded application is not yet up. And restarting from earlier checkpoint
information of pre-upgrade code cannot be done. The checkpoint information essentially
contains serialized Scala/Java/Python objects and trying to deserialize objects with new,
contains serialized Python/Scala/Java objects and trying to deserialize objects with new,
modified classes may lead to errors. In this case, either start the upgraded app with a different
checkpoint directory, or delete the previous checkpoint directory.
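
One common way to express the "different checkpoint directory" option in PySpark, sketched with placeholder names (`sc`, the HDFS path, and the DStream setup are assumptions):

{% highlight python %}
from pyspark.streaming import StreamingContext

checkpoint_dir = "hdfs:///checkpoints/myapp-v2"   # fresh directory for the upgraded app

def create_context():
    ssc = StreamingContext(sc, 10)                # assumes an existing SparkContext `sc`
    # ... build the upgraded application's DStream graph here ...
    ssc.checkpoint(checkpoint_dir)
    return ssc

# Recovers from a checkpoint if one exists in the new directory,
# otherwise builds a fresh context via create_context().
ssc = StreamingContext.getOrCreate(checkpoint_dir, create_context)
{% endhighlight %}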

@@ -2564,6 +2564,8 @@ additional effort may be necessary to achieve exactly-once semantics. There are
- [Custom Receiver Guide](streaming-custom-receivers.html)
* Third-party DStream data sources can be found in [Third Party Projects](https://spark.apache.org/third-party-projects.html)
* API documentation
- Python docs
* [StreamingContext](api/python/reference/api/pyspark.streaming.StreamingContext.html#pyspark.streaming.StreamingContext) and [DStream](api/python/reference/api/pyspark.streaming.DStream.html#pyspark.streaming.DStream)
- Scala docs
* [StreamingContext](api/scala/org/apache/spark/streaming/StreamingContext.html) and
[DStream](api/scala/org/apache/spark/streaming/dstream/DStream.html)
@@ -2575,10 +2577,8 @@ additional effort may be necessary to achieve exactly-once semantics. There are
[JavaPairDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairDStream.html)
* [KafkaUtils](api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html),
[KinesisUtils](api/java/index.html?org/apache/spark/streaming/kinesis/KinesisInputDStream.html)
- Python docs
* [StreamingContext](api/python/reference/api/pyspark.streaming.StreamingContext.html#pyspark.streaming.StreamingContext) and [DStream](api/python/reference/api/pyspark.streaming.DStream.html#pyspark.streaming.DStream)

* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming)
* More examples in [Python]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/python/streaming)
and [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming)
and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples/streaming)
and [Python]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/python/streaming)
* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) and [video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.