Update SortMeRNA to use SilvaDB 138 (for commercial use) #570
Comments
Hi @nh13! Hope you are well! We do have a parameter that allows you to override the default databases you provide to the pipeline i.e. So based on my deductions I am assuming you mean we change the sentences here and here? |
It'd be great if either So why not just align to the full SilvaDB release 138, which allows for both commercial and non-commercial use by default? It is more comprehensive than the set up there? Perhaps some RNA-Seq analysis experts could weigh in? |
I just saw that you edited the issue @drejom 😂 Fate...hope you are well! |
I am! Just a pandemic and an insurrection between drinks! Looking forward to a UK visit….one day! |
@d4straub is the person to ask - not too much experience on SortMeRNA / SILVA either, sorry :-( |
Updating to v4.3.1 would improve runtime, see https://github.com/biocore/sortmerna/releases/tag/v4.3.1 |
So I made a concerted effort to try and use the latest Biocontainer thinking I could just swap out the container and put my feet up because everything else with the process would just work. No no....a couple of hours later after having experienced The module file is here |
It may be a better solution to just use |
Yup. The newer releases were supposed to address this but it appears that we are now just seeing a different set of issues😅 A metagenomics classifier type approach using Kraken2 would be quite cool too which would bypass the mapping and generate filtered fastqs directly - maybe not as sensitive as mapping if done loosely but would do the trick I think. I used to run RNA-SeQC for the longest time to get rRNA estimates as a QC metric and then to deal with the counts appropriately downstream if required, before the differential analysis. This pipeline also generates a feature biotypes plot with this info in the MultiQC report. Personally, I think that is the best way and bypasses the need to do any FastQ filtering at all. It appears the links are broken on the RNA-SeQC website too - not doing very well. Time to shut the lid! Have a good evening! |
This might work more or less for an isolate but not for environmental samples (i.e. a mixture of organisms with previously unknown rRNA sequences); here SortMeRNA has advantages. That was my intention when adding SortMeRNA: to make this pipeline fit for metatranscriptomics. Your tests @drpatelh suggest that it might be better to just stay with version 4.2.0 (slow, but at least not breaking the pipeline, correct?) and attempt to just change the database to SILVA 138 to allow commercial use. Would that sound fine to you? |
I think this may be the path of least resistance given that the latest release still seems quite buggy and most people aren't using this option when running the pipeline. It would be great if you have some time to confirm this is the case. Bumping the version in the SortMeRNA module and running Yup, if we can't update the software version maybe it is worth updating the SILVA databases which I assume are independent and won't break anything with the current tool version in the pipeline (or make it even sloooooooower)? |
The latest version of SortMeRNA (v4.3.4) is now working smoothly via a simple update of the existing nf-core/module. It now also supports native compression of output files which is nice. I believe the databases have also been updated as of >4.2.0 as mentioned here so will close this issue! |
SilvaDB release 138 is now available for commercial use! See: https://www.arb-silva.de/silva-license-information/