Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

picard has occasional need for R #3859

Closed
jefferys opened this issue Feb 19, 2017 · 15 comments
Closed

picard has occasional need for R #3859

jefferys opened this issue Feb 19, 2017 · 15 comments

Comments

@jefferys
Copy link

I'm not familiar with bioconda recipes and packaging, so I apologize in advance if I'm wrong about where the error I'm seeing occurs.

I'm trying to use a docker container with picard. The dockerfile builds the container using bioconda (biodckr/picard). It mostly works fine, but at least one picard tools command, "CollectInsertSizeMetrics", uses R *(Rscript). Is R a dependency that will be auto-installed when installing picard as described in the recipe? If not, should it be? It is a big dependency that is only used in a couple of places, I think just to plot graphics/make pdfs.

Any help is appreciated.

@daler
Copy link
Member

daler commented Feb 20, 2017

Using just conda, the fix would be to add an additional conda install r to have R alongside picard, but this doesn't work when it's used as a container. There's not really a good mechanism for auto-installing only when needed, so R should probably be added as a dependency. @bioconda/core, thoughts?

@bgruening
Copy link
Member

Optional dependencies are dependencies and I would declare them as such. For example in secure environments you don't even have network connection to install things after initial deployment.

@rvalieris
Copy link
Member

picard is a collection of different tools, and not every tool needs R.
maybe a compromise would be to have a picard+r recipe to generate a container with both packages.

@bgruening
Copy link
Member

@raphenya does the R dependency hurt? I'm not really thinking about the Docker stuff, but also plain conda environments for sensitive data. I guess you would expect that if you install picard you get the entire package and all tools will just work.

@chapmanb
Copy link
Member

I'm -1 on having R as a dependency of Picard. That's a pretty heavy weight requirement for a small subset of the tools. I agree with Renan's suggestion for a meta-package.

@jefferys
Copy link
Author

I suspect this doesn't come up except in the context of docker-type installs because many people who use picard are doing data analysis and hence are likely to need R. They will never notice the missing dependency. That means the dependency, although heavy, is not likely to matter practically.

How optional tools are treated is a philosophical line-drawing problem. In this case, at least one component throws a messy java error if R does not exist, and R being a dependency is not clearly discussed in the picard documentation. My preference would be for the default picard recipe to include R, but maybe have an optional picard-lite package that leaves out R.

As a user of lots of tools, I generally come down on the side of the default package "just working". Initially, I don't know enough to understand if I need optional dependencies. I have enough problems dealing with errors that are my fault when I use the tool wrong. Avoiding the "environment is not correct/the tool will fail, seemingly randomly" is a big reason for using packages (and docker!).

After using a (working!) tool for a while, I have a deeper understanding. I can then start to care about "performance/resource usage". For my use case, I might provision the tool to hundreds of machines to run thousands of times.

@johanneskoester
Copy link
Contributor

An additional meta package seems reasonable. The usual way to name such a package is *-recommended. I.e. here picard-recommended. This would be similar to e.g. the r metapackage and the way debian handles texlive.

@bgruening
Copy link
Member

We have some request on the container side here: BioContainers/containers#200

@alanhoyle
Copy link

Fixed in the broadinstitute/picard:latest image after this pull request: broadinstitute/picard#1198

Adding r-base to the image increased the size of the image by 214MB or ~ 22%.

@rhpvorderman
Copy link
Contributor

This issue has not moved forward for a long time. I did not know it existed. Seeing the reasons why the bioconda picard package was kept in a broken state make my head spin. I disagree with adding a metapackage for the following reasons:

  • R is not an optional dependency. All the metrics stuff crashes. Adding it to recommends is therefore not a good way to go.

  • Bioconda is aimed at researchers, not at people who care about disk usage. When users do conda install -c bioconda picard the package should work out of the box. No exceptions. Period.

  • I find it very annoying that I cannot use biocontainers and use the same version of picard across all the tasks. I have to use a separate container for tasks that use R, this separate container is an additional maintenance burden, and the version of picard is usually different from the one in biocontainers. This is not best research practices. Alternative: I can use the combo container for all tasks, but updating that continuously is not a sustainable thing to do. Conclusion: biocontainers is not usable for Picard. This is IMO not acceptable.

  • For people who have additional performance requirements (small install size) I suggest that a picard-slim package is added. The scope of that package is limited to people who know what they are doing. So then it is not so bad that the metrics stuff does not work.

TLDR; priority "package works as expected" > "install size"

@rhpvorderman
Copy link
Contributor

Sorry for the above rant. I want to acknowledge that r-base is indeed an annoying and overly large package. But including it is the lesser evil in my opinion. I don't mind adding the picard-slim package myself if that is the route we want to go.

@alanhoyle
Copy link

You could use the non-biocontainers images: broadinstitute/picard or broadinstitute/gatk images since last fall, as they now include R and the Metrics outputs work there.

@rhpvorderman
Copy link
Contributor

@alanhoyle Thank you for your suggestion. This is a good workaround.
Unfortunately, that does not make Picard less broken on bioconda.

There is a PR that fixes this: #16398

Adding r-base to the image increased the size of the image by 214MB or ~ 22%.

22% is not too bad right? We are not talking orders of magnitude here. I hope the dependency can be added by default.

@alanhoyle
Copy link

I agree that this should be fixed for both the gatk4 and picard in bioconda.

I think it's telling that the upstream provider (Broad Institute) now includes R in their default container distributions. In my opinion, that means they should be included here as well.

@rhpvorderman
Copy link
Contributor

This issue was fixed in picard with #16398.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants