Isolate R library paths to those in container #541

Closed
drpatelh opened this issue Feb 20, 2020 · 9 comments
Labels: template (nf-core pipeline/component template)

Comments

@drpatelh
Member

drpatelh commented Feb 20, 2020

Much like we have already done for Python, it would be great if any processes that require R only looked within the container for their packages. A number of issues have been reported where users hit R library/package conflicts when running nf-core pipelines. Most of the time this has been resolved by renaming or deleting ~/.Rprofile and re-running. However, this isn't a robust solution.

An alternative as tested by @mashehu would be to add an empty R_PROFILE_USER to nextflow.config in the env scope:

// Export these variables to prevent local Python/R libraries from conflicting with those in the container
env {
  PYTHONNOUSERSITE = 1
  R_PROFILE_USER = ""
}

Strictly speaking, this variable has to point to a file, but that will be quite difficult given the way in which we automate our Docker builds. We could add a separate line to touch an empty file in the container, but this may cause issues with -profile conda. If the variable is left empty like this, @mashehu reported that a warning is generated during pipeline execution. Please fill in the blanks and add more info here @mashehu.

See R docs

drpatelh added the template and help wanted labels Feb 20, 2020
@mashehu
Contributor

mashehu commented Feb 20, 2020

Not much to add. The exact warning is:

WARN: Environment variable `R_PROFILE_USER` evaluates to an empty value

and seems to be printed every time an R script is invoked, so it can occur multiple times.
Another solution would be to add --vanilla to every Rscript command, as described here for Snakemake: https://bitbucket.org/snakemake/snakemake/issues/970/ignore-r-profile-when-executing-r-scripts
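
The --vanilla route could be sketched as a small shell wrapper. This is only an illustration: the function name rscript_isolated and the script name in the usage comment are hypothetical. --vanilla itself is a standard R option that combines --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ.

```shell
# Hypothetical wrapper: run an R script without reading user or site startup files.
rscript_isolated() {
    # --vanilla = --no-save --no-restore --no-site-file --no-init-file --no-environ
    Rscript --vanilla "$@"
}

# Usage (script and argument names are illustrative):
#   rscript_isolated plot_results.R input.tsv
```

Note that this only skips startup files. It does not clear variables such as R_LIBS_USER that are already set in the calling environment.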

@ewels
Member

ewels commented Feb 20, 2020

https://rdrr.io/r/base/Startup.html

Then, unless --no-init-file was given, R searches for a user profile, a file of R code. The path of this file can be specified by the R_PROFILE_USER environment variable (and tilde expansion will be performed). If this is unset, a file called .Rprofile is searched for in the current directory or in the user's home directory (in that order). The user profile file is sourced into the workspace.

I wonder if we can just make an empty .Rprofile file in the work directory for processes that run R scripts? It is also a little hacky, but it might avoid the warnings.
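
As a rough sketch of that idea, a task could create an empty .Rprofile in its working directory before calling R. Per the documentation quoted above, the current directory is searched before the home directory when R_PROFILE_USER is unset. The helper name make_empty_rprofile is hypothetical, and the overwrite guard is an added safety assumption, not something proposed in this thread.

```shell
# Hypothetical helper: create an empty .Rprofile in a task directory
# (defaults to the current directory).
make_empty_rprofile() {
    target="${1:-.}/.Rprofile"
    # Never clobber a profile that already has content.
    if [ -s "$target" ]; then
        echo "refusing to overwrite non-empty $target" >&2
        return 1
    fi
    # ':' is a no-op, so the redirect just creates an empty file.
    : > "$target"
}

# Usage inside a task script, before any Rscript call:
#   make_empty_rprofile .
```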

@drpatelh
Member Author

Or we just have an extra line in Dockerfile:

RUN touch .Rprofile

and the following in nextflow.config:

// Export these variables to prevent local Python/R libraries from conflicting with those in the container
env {
  PYTHONNOUSERSITE = 1
  R_PROFILE_USER = "~/.Rprofile"
}

~ should resolve to the top-level directory of the container, no?

Will need to be tested to see if that gets rid of the warnings @mashehu reported.

"Most" people will be using containers anyway so we could just take the hit with conda.

@drpatelh
Member Author

Having 👀 at the docs again, it appears we would have to do the same for .Renviron, so the implementation should actually look like:

Dockerfile

RUN touch .Rprofile
RUN touch .Renviron

nextflow.config

// Export these variables to prevent local Python/R libraries from conflicting with those in the container
env {
  PYTHONNOUSERSITE = 1
  R_PROFILE_USER = "~/.Rprofile"
  R_ENVIRON_USER = "~/.Renviron"
}

I have now added this to the atacseq and chipseq pipelines to test.
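
One way to check this setup from inside a task would be a small shell test that both variables are set and point at files that actually exist. This is a minimal sketch: the function name check_r_startup_files is hypothetical, and the leading ~/ is expanded against $HOME by hand, which only approximates R's own tilde expansion.

```shell
# Hypothetical check: are the R startup variables set and pointing at existing files?
check_r_startup_files() {
    status=0
    for var in R_PROFILE_USER R_ENVIRON_USER; do
        # Read the value of the variable whose name is stored in $var.
        eval "path=\${$var:-}"
        # The shell does not expand "~" stored inside a variable, so do it by hand.
        case "$path" in
            "~/"*) path="$HOME/${path#??}" ;;
        esac
        if [ -n "$path" ] && [ -f "$path" ]; then
            echo "$var -> $path (found)"
        else
            echo "$var -> ${path:-<unset or empty>} (missing)"
            status=1
        fi
    done
    return "$status"
}
```

The function returns non-zero if either file is missing, so it could be used as an early sanity check in a test run.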

@apeltzer
Member

Did that help @drpatelh ? :-)

@drpatelh
Member Author

Yep 👍 This solution works with Conda, Docker and Singularity. For Conda, where these files won't be present in the environment, the worst-case scenario is that the user can touch local versions of these files and provide them via a custom config. We may have similar issues with Biocontainers, but I think R tolerates the files not being present.
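
The Conda fallback described above might look roughly like the following. The helper name, file names and directory are all hypothetical. The env scope and the -c option for passing an extra config file are standard Nextflow features, but this exact layout is only an assumption about how a user could wire it up.

```shell
# Hypothetical helper: create empty R startup files plus a Nextflow config pointing at them.
write_r_isolation_config() {
    dir="$1"   # should be an absolute path so every task resolves it the same way
    mkdir -p "$dir"
    : > "$dir/empty.Rprofile"
    : > "$dir/empty.Renviron"
    cat > "$dir/r_isolation.config" <<EOF
env {
    R_PROFILE_USER = "$dir/empty.Rprofile"
    R_ENVIRON_USER = "$dir/empty.Renviron"
}
EOF
}

# Usage (pipeline name and directory are illustrative):
#   write_r_isolation_config "$HOME/nf-r-isolation"
#   nextflow run <pipeline> -profile conda -c "$HOME/nf-r-isolation/r_isolation.config"
```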

@aydemiro

aydemiro commented Feb 1, 2023

@drpatelh Thank you for this solution. However, I don't understand how ~ resolves to the top level of the container. Is that a Docker behavior? I think ~ will resolve to the user's local home directory, as the local $HOME is automatically bound to $HOME in the container, at least for Singularity. So if the user actually has ~/.Rprofile or ~/.Renviron in their local environment, that will cause a clash. Shouldn't the env scope be like below?

R_PROFILE_USER = "/.Rprofile"
R_ENVIRON_USER = "/.Renviron"

@ssnn-airr

After a lot of trial and error and testing, specifying containerOptions = "-v ~/R" in my config file is what worked for me to avoid my local R libraries being used. I like to add a sessionInfo() call at the end of my scripts, and the output kept showing versions of R packages that were in my local environment but not in the container. I have probably been inadvertently using versions of packages that I should not have been using. Sleepless nights ahead.
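
A quicker check than reading sessionInfo() output could be to list R's library search paths and flag any that sit under the home directory. This is a heuristic sketch only: the function name flag_home_libs is hypothetical, and it assumes that a library under $HOME came from outside the container, which may not hold for every setup.

```shell
# Hypothetical diagnostic: read library paths on stdin, one per line,
# and label those that live under the user's home directory.
flag_home_libs() {
    # The '|| [ -n "$lib" ]' part keeps a final line that lacks a trailing newline.
    while IFS= read -r lib || [ -n "$lib" ]; do
        case "$lib" in
            "$HOME"/*) echo "home: $lib" ;;
            *)         echo "ok: $lib" ;;
        esac
    done
}

# Usage inside a task (writeLines prints one library path per line):
#   Rscript -e 'writeLines(.libPaths())' | flag_home_libs
```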

@ssnn-airr

This also seems to work:

env {
  R_LIBS_SITE="NULL"
  R_LIBS_USER="NULL"
}


6 participants