Skip to content

Commit

Permalink
[SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app without python main…
Browse files Browse the repository at this point in the history
… resource does not honor "spark.jars.packages"

SparkSubmit determines pyspark app by the suffix of primary resource but Livy
uses "spark-internal" as the primary resource when calling spark-submit,
therefore args.isPython is set to false in SparkSubmit.scala.

In Yarn mode, SparkSubmit module is responsible for resolving maven coordinates
and adding them to "spark.submit.pyFiles" so that python's system path can be set correctly.

The fix is to resolve maven coordinates not only when args.isPython is true,
but also when primary resource is spark-internal.

Tested the patch with Livy submitting pyspark app, spark-submit, pyspark with or without packages config.

Signed-off-by: Shanyu Zhao <shzhaomicrosoft.com>

Closes #23009 from shanyu/shanyu-26011.

Authored-by: Shanyu Zhao <shzhao@microsoft.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(cherry picked from commit 9a5fda6)
Signed-off-by: Sean Owen <sean.owen@databricks.com>
  • Loading branch information
shanyu authored and srowen committed Nov 15, 2018
1 parent 0c7d82b commit 7a59618
Showing 1 changed file with 1 addition and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -367,7 +367,7 @@ object SparkSubmit extends CommandLineUtils with Logging {

if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
if (args.isPython) {
if (args.isPython || isInternal(args.primaryResource)) {
args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
}
}
Expand Down

0 comments on commit 7a59618

Please sign in to comment.