[SPARK-23776][DOC] Update instructions for running PySpark after building with SBT #21628

Closed
wants to merge 2 commits into from

bersprockets
Contributor

What changes were proposed in this pull request?

This update tells the reader how to build Spark with SBT such that pyspark-sql tests will succeed.

If you follow the current instructions for building Spark with SBT, pyspark/sql/udf.py fails with:

AnalysisException: u'Can not load class test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath;'
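The error above indicates that the compiled Java test class is missing from the build output. A minimal sketch for checking this before running the PySpark tests follows; the helper name `find_compiled_class` is hypothetical, and the assumption that compiled test classes end up somewhere under a `target` directory reflects the usual SBT/Maven layout rather than anything stated in this PR.

```python
from pathlib import Path


def find_compiled_class(spark_home, class_name):
    """Return sorted paths of compiled .class files for a fully qualified class name.

    Hypothetical helper: it searches below `spark_home` and keeps only files
    that sit under some `target` directory, without assuming the exact
    sub-layout (module name, Scala version directory, and so on).
    """
    # test.org.apache.spark.sql.JavaStringLength
    #   -> test/org/apache/spark/sql/JavaStringLength.class
    relative = class_name.replace(".", "/") + ".class"
    file_name = relative.rsplit("/", 1)[-1]

    matches = []
    for path in Path(spark_home).rglob(file_name):
        in_build_output = "target" in path.parts
        package_matches = path.as_posix().endswith("/" + relative)
        if in_build_output and package_matches:
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Assumes the script is started from the root of a Spark checkout.
    found = find_compiled_class(".", "test.org.apache.spark.sql.JavaStringLength")
    if found:
        for path in found:
            print("found:", path)
    else:
        print("class file not found; the SQL test classes may not be compiled")
```

If the class file is not found after an SBT build, that would be consistent with the test components not having been compiled.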

How was this patch tested?

I ran the doc build command (SKIP_API=1 jekyll build) and eyeballed the result.


SparkQA commented Jun 25, 2018

Test build #92278 has finished for PR 21628 at commit 9fcd05d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


If you are building PySpark and wish to run the PySpark tests you will need to build Spark with Hive support.

./build/mvn -DskipTests clean package -Phive
./python/run-tests

If you are building PySpark with SBT and wish to run the PySpark tests, you will need to build Spark with Hive support and also build the test components:

./build/sbt -Phive clean package
Contributor Author

I noticed that the pyspark tests were recently changed so that -Phive is no longer strictly necessary to run pyspark tests, but I decided not to address that in this update.

Member

Yeah, I think we don't necessarily need to mention it now.

Member

HyukjinKwon left a comment

LGTM except for the comments above.


If you are building PySpark with SBT and wish to run the PySpark tests, you will need to build Spark with Hive support and also build the test components:

./build/sbt -Phive clean package
./build/sbt sql/test:compile
Member

Hm, shouldn't we compile the other tests too?
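The SBT sequence quoted in the diff above can be sketched as a small driver script. This is only an illustration under stated assumptions: the function names are hypothetical, running `./python/run-tests` after the SBT steps mirrors the Maven instructions quoted earlier, and the optional `--modules` argument to `./python/run-tests` is worth confirming with `./python/run-tests --help` on your checkout. The `test_projects` parameter reflects the reviewer's question: `sql` comes from the diff, while any other project name passed in is the caller's assumption.

```python
import subprocess


def sbt_pyspark_test_commands(test_projects=("sql",), modules=()):
    """Build the command list for an SBT build followed by the PySpark tests.

    Hypothetical helper. With the defaults it reproduces the commands quoted
    in this PR: package with Hive support, compile the SQL test classes,
    then run the PySpark test script.
    """
    commands = [["./build/sbt", "-Phive", "clean", "package"]]

    # Compile test classes for each requested SBT project, e.g. sql/test:compile.
    for project in test_projects:
        commands.append(["./build/sbt", f"{project}/test:compile"])

    run_tests = ["./python/run-tests"]
    if modules:
        # Assumed flag: restricts the run to the listed PySpark modules.
        run_tests.append("--modules=" + ",".join(modules))
    commands.append(run_tests)
    return commands


def run_all(commands, spark_home="."):
    """Run each command from the Spark checkout root, stopping on the first failure."""
    for command in commands:
        print("running:", " ".join(command))
        subprocess.run(command, cwd=spark_home, check=True)


if __name__ == "__main__":
    run_all(sbt_pyspark_test_commands())
```

Keeping command construction separate from execution makes it easy to print or review the sequence before launching a long build.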


SparkQA commented Jun 25, 2018

Test build #92311 has finished for PR 21628 at commit be281e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in 4c059eb Jun 26, 2018
bersprockets
Contributor Author

@HyukjinKwon Thanks for your help!

@bersprockets bersprockets deleted the SPARK-23776_doc branch December 30, 2018 17:20