Update SortMeRNA to use SilvaDB 138 (for commercial use) #570

nh13 · 2021-02-16T18:49:25Z

SilvaDB release 138 is now available for commercial use! See: https://www.arb-silva.de/silva-license-information/

drpatelh · 2021-02-17T18:17:40Z

Hi @nh13! Hope you are well! SortMeRNA is one of those tools for which I would like to plead ignorance because I have never used it 😅 How can we accommodate this information into the pipeline? I am aware of issues with run-times as highlighted here but that's off topic.

We do have a parameter that allows you to override the default databases you provide to the pipeline i.e. --ribo_database_manifest but I suspect that's off topic too?
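For anyone who wants to try a newer SILVA release through that parameter, here is a minimal sketch of building such a manifest. It assumes the manifest is a plain text file with one FASTA path per line; the file names below are hypothetical placeholders, not real SILVA download names, and `render_manifest` is an illustrative helper rather than anything the pipeline ships.

```python
def render_manifest(fasta_paths):
    """Return manifest text with one FASTA path per line.

    Assumption: the manifest passed to --ribo_database_manifest is a plain
    text file listing one FASTA path per line.
    """
    # Drop empty entries and surrounding whitespace so the file stays clean.
    lines = [str(path).strip() for path in fasta_paths if str(path).strip()]
    if not lines:
        raise ValueError("manifest needs at least one FASTA path")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical local copies of SILVA 138 FASTA files.
    example_paths = [
        "/data/silva138/silva_138_ssu.fasta",
        "/data/silva138/silva_138_lsu.fasta",
    ]
    with open("silva138_manifest.txt", "w") as handle:
        handle.write(render_manifest(example_paths))
```

The resulting `silva138_manifest.txt` could then be supplied via `--ribo_database_manifest` instead of the default database list.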

So based on my deductions I am assuming you mean we change the sentences here and here?

nh13 · 2021-02-17T23:00:11Z

@drpatelh

It'd be great if SortMeRNA could update them (see this issue), but for nf-core I'd expect to be able to use them for commercial use by default. Also, the SortMeRNA databases are very old (29/11/2014), but like you, I "neither have the time nor the inclination" to update them 😆 !

So why not just align to the full SilvaDB release 138, which allows for both commercial and non-commercial use by default? It is more comprehensive than the set up there? Perhaps some RNA-Seq analysis experts could weigh in?

drpatelh · 2021-02-17T23:15:08Z

I am fairly well versed on the dark side of RNA-seq analysis but I fear this issue falls into the even darker realm of classify my DNA/RNA-type voodoo magic. @apeltzer what do we need to sacrifice here?

@drejom !! Been a while!

drpatelh · 2021-02-17T23:16:34Z

I just saw that you edited the issue @drejom 😂 Fate...hope you are well!

drejom · 2021-02-17T23:20:20Z

I am! Just a pandemic and an insurrection between drinks! Looking forward to a UK visit….one day!

drpatelh · 2021-04-11T08:30:02Z

Ping @d4straub @apeltzer. Any ideas how we can incorporate this information into the pipeline? I am planning on getting a release together over the next couple of weeks. Can include this if it's an easy fix. Thanks!

apeltzer · 2021-04-11T09:50:35Z

@d4straub is the person to ask - not too much experience on SortMeRNA / SILVA either, sorry :-(

d4straub · 2021-04-11T12:29:35Z

Updating to v4.3.1 would improve runtime, see https://github.com/biocore/sortmerna/releases/tag/v4.3.1
The SILVA database might also have been updated to v138 in v4.3.1, since the 4.2 release notes mentioned that the "next release" would come with SILVA v138. Will investigate this next week.

drpatelh · 2021-04-13T21:05:06Z

So I made a concerted effort to try and use the latest Biocontainer, thinking I could just swap out the container and put my feet up because everything else with the process would just work. No no... a couple of hours later, after having experienced segmentation faults and various issues where downstream processes in the pipeline were failing due to corrupt FastQ files being generated, I gave up to do something else. I also tried to get it to generate uncompressed FastQ files that I could zip after the process, using the --zip-out parameter. The inline help comments are here, but the value evaluation takes completely different types of parameters, as defined here. I tried all of those values but had no success. I may be missing something stupendously obvious, but it appears that bumping the version on this is going to be more hassle than it's worth. It would be great if someone else can confirm!

The module file is here

nh13 · 2021-04-13T21:13:14Z

It may be a better solution to just use bowtie/bwa/etc to align to the rRNA sequences directly and remove those that have any valid mappings. SortMeRNA is still quite slow.
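A rough sketch of the filtering half of that idea, assuming the reads have already been aligned to an rRNA reference and the alignments are available as SAM text. It treats any record without the unmapped flag (0x4) as a "valid mapping", assumes single-end reads in 4-line FASTQ records, and uses hypothetical helper names; the aligner invocation itself is deliberately left out.

```python
def mapped_read_names(sam_lines):
    """Collect names of reads with at least one mapped SAM record."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed or empty lines
        flag = int(fields[1])
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            names.add(fields[0])
    return names


def filter_fastq(fastq_lines, drop_names):
    """Return FASTQ lines for reads whose name is not in drop_names.

    Assumes strict 4-line FASTQ records (header, sequence, '+', quality).
    """
    kept = []
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Read name is the header up to the first whitespace, minus '@'.
            name = record[0][1:].split()[0]
            if name not in drop_names:
                kept.extend(record)
            record = []
    return kept
```

Usage would look like `filter_fastq(open("reads.fastq"), mapped_read_names(open("rrna_hits.sam")))`, where both file names are placeholders. Paired-end data would need both mates dropped together, which this sketch does not handle.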

drpatelh · 2021-04-13T21:51:32Z

Yup. The newer releases were supposed to address this but it appears that we are now just seeing a different set of issues😅

A metagenomics classifier type approach using Kraken2 would be quite cool too which would bypass the mapping and generate filtered fastqs directly - maybe not as sensitive as mapping if done loosely but would do the trick I think.

I used to run RNA-SeQC for the longest time to get rRNA estimates as a QC metric and then to deal with the counts appropriately downstream if required, before the differential analysis. This pipeline also generates a feature biotypes plot with this info in the MultiQC report. Personally, I think that is the best way and bypasses the need to do any FastQ filtering at all. It appears the links are broken on the RNA-SeQC website too - not doing very well. Time to shut the lid!

Have a good evening!

d4straub · 2021-04-15T07:49:57Z

> It may be a better solution to just use bowtie/bwa/etc to align to the rRNA sequences directly and remove those that have any valid mappings. SortMeRNA is still quite slow.

This might work more or less for an isolate but not for environmental samples (i.e. a mixture of organisms with previously unknown rRNA sequences), here SortMeRNA has advantages. But this was my intention, to make this pipeline fit for metatranscriptomics when adding SortMeRNA.

Your tests @drpatelh suggest that it might be better to just stay with version 4.2.0 (despite being slow, but at least not breaking the pipeline, correct?) and attempt to just change the database to silva 138 to allow commercial use. Would that sound fine to you?

drpatelh · 2021-04-15T11:36:19Z

> Your tests @drpatelh suggest that it might be better to just stay with version 4.2.0 (despite being slow, but at least not breaking the pipeline, correct?)

I think this may be the path of least resistance given that the latest release still seems quite buggy and most people aren't using this option when running the pipeline. It would be great if you have some time to confirm this is the case. Bumping the version in the SortMeRNA module and running nextflow run nf-core/rnaseq ..... -r dev should reproduce the errors. Don't worry if you don't have time.

Yup, if we can't update the software version maybe it is worth updating the SILVA databases which I assume are independent and won't break anything with the current tool version in the pipeline (or make it even sloooooooower)?

drpatelh · 2021-10-05T09:49:28Z

The latest version of SortMeRNA (v4.3.4) is now working smoothly via a simple update of the existing nf-core/module. It now also supports native compression of output files which is nice. I believe the databases have also been updated as of >4.2.0 as mentioned here so will close this issue!

nh13 added the enhancement label Feb 16, 2021

nh13 changed the title ~~Update SortMeRNA to use SilvaDB 138 (for commercial use)000000000~~ Update SortMeRNA to use SilvaDB 138 (for commercial use) Feb 16, 2021

nh13 mentioned this issue Feb 16, 2021

Update SortMeRNA to use SilvaDB 138 sortmerna/sortmerna#282

Closed

drpatelh added this to the 3.1 milestone Apr 11, 2021

drpatelh removed this from the 3.1 milestone Apr 14, 2021

This was referenced Oct 5, 2021

Update SortMeRNA to 4.3.4 nf-core/modules#790

Merged

Update SortMeRNA to 4.3.4 #708

Merged

drpatelh closed this as completed Oct 5, 2021
