I have encountered an issue while running the pipeline in star_salmon mode with multiple batches and different multiqc titles.
I ran the pipeline in star_salmon mode in 2 batches (I split my samplesheet in half because of memory constraints) and specified the same output directory but two different multiqc titles. Many of the pipeline's outputs have sample-specific file names, but some do not, including the merged count files from salmon and the deseq2 pca files. Consequently, the second batch overwrites these files from the first.
To mitigate this issue, I propose a small enhancement. It would be helpful if the multiqc title could be added to the names of the summary files that would otherwise be overwritten. For example, instead of being named salmon.merged.gene_tpm.tsv and deseq2.pca.vals.txt, they could be named salmon.merged.MULTIQC_TITLE.gene_tpm.tsv and deseq2.MULTIQC_TITLE.pca.vals.txt, respectively. This would ensure that the files from each batch remain distinct.
A workaround, for now, is to specify a separate output directory for the second batch.
Adding proper prefix usage to the relevant local modules will allow users to configure them as they wish, including for the batched analysis described here.
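As a rough illustration of what that could look like from the user's side, here is a minimal sketch of a custom config, assuming the local modules were changed to read `task.ext.prefix` when naming their outputs (this issue reports that the salmon name is currently hard-coded, so this would not work as-is). The process selectors and the file name `batch.config` are hypothetical; `params.multiqc_title` is the title already passed to the pipeline.

```groovy
// batch.config (hypothetical file name), supplied with `-c batch.config`.
// ASSUMPTION: the local modules honor `task.ext.prefix` for their output names.
// Per this issue, that is not yet the case for the merged salmon files.
process {
    // Selector names are guesses based on the module file names
    // (salmon_tximport.nf, deseq2_qc.nf); adjust them to the actual process names.
    withName: 'SALMON_TXIMPORT' {
        // Intended result: salmon.merged.<title>.gene_tpm.tsv
        ext.prefix = { "salmon.merged.${params.multiqc_title}" }
    }
    withName: 'DESEQ2_QC.*' {
        // Intended result: deseq2.<title>.pca.vals.txt
        ext.prefix = { "deseq2.${params.multiqc_title}" }
    }
}
```

With something like this in place, each batch run with a different multiqc title would write its summary files under distinct names, even when both runs share one output directory.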
For salmon, the name is hard-coded here:
rnaseq/modules/local/salmon_tximport.nf (line 30 in 3bec233)
For deseq2, an output prefix can be passed to the R script with "-p" or "--outprefix"
https://github.com/nf-core/rnaseq/blob/3bec2331cac2b5ff88a1dc71a21fab6529b57a0f/modules/local/deseq2_qc.nf#LL40C11-L40C11
Thank you for considering this enhancement suggestion!