picard has occasional need for R #3859
Comments
Using just conda, the fix would be to add an additional |
Optional dependencies are dependencies and I would declare them as such. For example in secure environments you don't even have network connection to install things after initial deployment. |
picard is a collection of different tools, and not every tool needs R. |
@raphenya does the R dependency hurt? I'm not only thinking about the Docker stuff, but also about plain conda environments for sensitive data. I guess you would expect that if you install picard you get the entire package and all tools will just work. |
I'm -1 on having R as a dependency of Picard. That's a pretty heavy weight requirement for a small subset of the tools. I agree with Renan's suggestion for a meta-package. |
I suspect this doesn't come up except in the context of Docker-type installs because many people who use picard are doing data analysis and hence are likely to need R. They will never notice the missing dependency. That means the dependency, although heavy, is not likely to matter practically.

How optional tools are treated is a philosophical line-drawing problem. In this case, at least one component throws a messy Java error if R does not exist, and R being a dependency is not clearly discussed in the picard documentation. My preference would be for the default picard recipe to include R, but maybe have an optional picard-lite package that leaves out R.

As a user of lots of tools, I generally come down on the side of the default package "just working". Initially, I don't know enough to understand if I need optional dependencies. I have enough problems dealing with errors that are my fault when I use the tool wrong. Avoiding the "environment is not correct/the tool will fail, seemingly randomly" is a big reason for using packages (and Docker!).

After using a (working!) tool for a while, I have a deeper understanding. I can then start to care about "performance/resource usage". For my use case, I might provision the tool to hundreds of machines to run thousands of times. |
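Until the packaging question is settled, a wrapper script could at least replace the messy Java error with a readable one. Below is a minimal sketch in Python, not part of picard or bioconda: the function name `missing_r_message` and the `TOOLS_NEEDING_R` set are hypothetical, and the set only lists CollectInsertSizeMetrics because that is the one tool named in this thread.

```python
import shutil

# Tools assumed to call Rscript for plotting. Only CollectInsertSizeMetrics is
# named in this thread; extend the set for your own pipeline (assumption).
TOOLS_NEEDING_R = {"CollectInsertSizeMetrics"}


def missing_r_message(tool, which=shutil.which):
    """Return an error message if `tool` needs Rscript and it is absent, else None.

    `which` is injectable so the check can be exercised without touching PATH.
    """
    if tool not in TOOLS_NEEDING_R:
        return None
    if which("Rscript") is not None:
        return None
    return (
        f"{tool} needs Rscript on PATH, but it was not found; "
        "install R (e.g. r-base) first."
    )


if __name__ == "__main__":
    message = missing_r_message("CollectInsertSizeMetrics")
    print(message or "Rscript check passed")
```

Running the check before invoking picard means a missing R shows up as one clear line instead of a stack trace partway through a job.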
An additional meta package seems reasonable. The usual way to name such a package is *-recommended. I.e. here picard-recommended. This would be similar to e.g. the r metapackage and the way debian handles texlive. |
We have a related request on the container side here: BioContainers/containers#200 |
Fixed in the broadinstitute/picard:latest image after this pull request: broadinstitute/picard#1198 Adding r-base to the image increased the size of the image by 214MB or ~ 22%. |
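For a sense of the absolute sizes behind those numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the ~22% is measured relative to the image size before r-base was added; since that percentage is rounded, the results are only approximate, and the function name is made up for illustration.

```python
def sizes_from_increase(added_mb, increase_fraction):
    """Estimate (size before, size after) in MB from an absolute and a relative increase.

    Assumes `increase_fraction` is relative to the size before the change.
    """
    before_mb = added_mb / increase_fraction
    after_mb = before_mb + added_mb
    return before_mb, after_mb


if __name__ == "__main__":
    # 214 MB and ~22% are the figures quoted in the comment above.
    before_mb, after_mb = sizes_from_increase(214, 0.22)
    print(f"before: ~{before_mb:.0f} MB, after: ~{after_mb:.0f} MB")
```

Under that assumption the image goes from a bit under 1 GB to roughly 1.2 GB.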
This issue has not moved forward for a long time. I did not know it existed. Seeing the reasons why the bioconda picard package was kept in a broken state makes my head spin. I disagree with adding a metapackage for the following reasons:
TL;DR: priority "package works as expected" > "install size" |
Sorry for the above rant. I want to acknowledge that r-base is indeed an annoying and overly large package. But including it is the lesser evil in my opinion. I don't mind adding the picard-slim package myself if that is the route we want to go. |
You could use the non-biocontainers images: broadinstitute/picard or broadinstitute/gatk images since last fall, as they now include R and the Metrics outputs work there. |
@alanhoyle Thank you for your suggestion. This is a good workaround. There is a PR that fixes this: #16398
22% is not too bad, right? We are not talking orders of magnitude here. I hope the dependency can be added by default. |
I agree that this should be fixed for both the gatk4 and picard in bioconda. I think it's telling that the upstream provider (Broad Institute) now includes R in their default container distributions. In my opinion, that means they should be included here as well. |
This issue was fixed in picard with #16398. |
I'm not familiar with bioconda recipes and packaging, so I apologize in advance if I'm wrong about where the error I'm seeing occurs.
I'm trying to use a Docker container with picard. The Dockerfile builds the container using bioconda (biodckr/picard). It mostly works fine, but at least one picard tools command, "CollectInsertSizeMetrics", uses R (Rscript). Is R a dependency that will be auto-installed when installing picard as described in the recipe? If not, should it be? It is a big dependency that is only used in a couple of places, I think just to plot graphics/make PDFs.
Any help is appreciated.