Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pre-installed conda configuration and use to find rlib directory #700

Merged
merged 2 commits into from
Jul 13, 2020

Conversation

jeremyjliu
Copy link

@jeremyjliu jeremyjliu commented Jul 7, 2020

Upstream SPARK-XXXXX ticket and PR link (if not applicable, explain)

Conda env change, not filed in upstream.

What changes were proposed in this pull request?

Executor uses CondaEnvironment.condaEnvPath + /lib/R/library or looks in SPARK_HOME for the sparkR package in order to launch a worker/daemon R process. Thus, when conda is already pre-installed on docker image and CondaEnvironment is not set, RRunner is unable to locate a rLibDir to look for the sparkR package files. This is not an issue for python, as python is able to locate its own installed modules when launched non-interactively, unlike the Rscript command that requires an explicit file path.

This PR introduces a SparkConf variable to note the pre-installed conda env path that can be read by RRunner when appropriate.

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

@jeremyjliu jeremyjliu changed the title Add pre-installed conda configuration and utilize to local rlib directory Add pre-installed conda configuration and use to find rlib directory Jul 7, 2020
@rshkv
Copy link

rshkv commented Jul 8, 2020

Looks reasonable, just don't fully understand the problem. Can you elaborate this bit?

Thus, when conda is already pre-installed on docker image and CondaEnvironment is not set, RRunner is unable to locate a rLibDir to look for the sparkR package files.

rLibDir is referring to /lib/R/library? Why can't we get Rscript from the pre-installed Conda env? I'd assume pre-installed means Conda has added itself to the path, no?

@jeremyjliu
Copy link
Author

What we currently do for pre-installed conda environments, is pass the Rscript path via the R_COMMAND variable: https://github.com/palantir/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/R.scala#L37

R_COMMAND would be "/opt/conda/bin/Rscript", which is where conda installs the Rscript executable (as well as the R executable) normally. Rscript works in that it expects a path of an .R file to execute, so the full command in RRunner.scala needs to be /opt/conda/bin/Rscript /opt/conda/lib/R/library/SparkR/worker/worker.R. This works differently than in python, where the analogous full command is /opt/conda/bin/python -m pyspark.worker and python is able automatically locate the relevant module/file.

  • rLibDir is referring to /lib/R/library?
    yup, specifically /lib/R/library that exists

  • Why can't we get Rscript from the pre-installed Conda env?
    we can, it's passed in as the R_COMMAND config variable

  • I'd assume pre-installed means Conda has added itself to the path, no?
    not relevant, due to how Rscript works. the python executable installed by conda knows where its packages are installed, because conda properly installs them into the right site-packages. likewise, so does the R executable (used for interactive sessions) but sadly not the Rscript executable (used for executing scripts).

@rshkv rshkv merged commit 597d5a2 into palantir:master Jul 13, 2020
@robert3005
Copy link

this needs an entry in https://github.com/palantir/spark/blob/master/FORK.md. We should change pr template to include mention of that

@jeremyjliu
Copy link
Author

@robert3005
Copy link

All good my bad for missing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants