RNAseq pipeline changing my sample names #1077

AnnaDvdH · 2023-09-22T13:48:42Z

Description of the bug

Hello,

I've been having some issues running the rnaseq pipeline on UPPMAX. The pipeline seems to be automatically changing the names of my samples, which makes it really difficult to trace which sample is which.

I am working with tumour data, so my sample names contain a T for tumour followed by a number, for example 07TDOXTAK_NC_T3. However, in my results the samples are all renamed to something like 07TDOXTAK_NC_T1. Since I have several tumours from the same individual, I really need to keep the number after the T.

I already got some help through the Slack channel, where it was recommended that I change my sample names. The reason is that the pipeline uses _T<DIGIT> to name technical replicates, so using that suffix in my sample names breaks the pipeline's assumptions. I did this, so my samples now end with NC instead of a T followed by a number (something like 07TDOXTAK_T3_NC). However, I am getting the same problem again: all my samples are being switched to 07TDOXTAK_NC_T1, with _T1 at the end rather than NC. I am a bit lost and not sure what to test next.
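For anyone wanting to check their own samplesheet against the _T<DIGIT> convention mentioned above, here is a minimal sketch. It assumes the sample name is the first comma-separated column, that the first row is a header, and that the convention is simply "name ends in _T followed by digits". The exact pattern the pipeline applies is not shown in this thread, and the function name flag_replicate_suffix is made up for illustration.

```shell
# Hypothetical helper (not part of nf-core/rnaseq): print the sample names
# in column 1 of a CSV samplesheet that end in _T<digits>.
# Assumptions: comma-separated, header on line 1, no quoted fields.
flag_replicate_suffix() {
    awk -F',' 'NR > 1 && $1 ~ /_T[0-9]+$/ { print $1 }' "$1" | sort -u
}
```

Calling `flag_replicate_suffix inputfile.csv` would list names such as 07TDOXTAK_NC_T3. It only catches the suffix case, so a renamed sample like 07TDOXTAK_T3_NC is not flagged, and it cannot explain why those are still being renamed.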

Thank you in advance for all the help!

Command used and terminal output

# Script (.sh file) used to run the pipeline; it is called with the name of my input file (inputfile.csv) as its first argument

module purge
module load uppmax bioinfo-tools
module load Nextflow/22.10.1
module load nf-core-pipelines/latest

# Don't let Java get carried away and use huge amounts of memory
export NXF_OPTS='-Xms1g -Xmx4g'

# Don't fill up your home directory with cache files
export NXF_HOME=/absolutepath/5_RNAseq/arm5_6_rnaseq/
export NXF_TEMP=$SNIC_TMP
export NXF_SINGULARITY_CACHEDIR=/absolutepath/5_RNAseq/cache_rnaseq

# Run RNAseq pipeline

nextflow run $NF_CORE_PIPELINES/rnaseq/3.12.0/workflow \
    --project snic2022-5-620 -profile uppmax \
    --email anna.vd.heiden@imbim.uu.se \
    --fasta genome/cf4.b6.14.fa \
    --gff GCF_011100685.1_UU_Cfam_GSD_1.0_genomic.NameB614.gff \
    --skip_biotype_qc \
    --input $1 \
    --outdir results

Relevant files

No response

System information

No response


pinin4fjords · 2023-11-08T11:45:46Z

@AnnaDvdH having done some testing, I believe this issue is now resolved in dev. If I change the samples in the test profile to have _T2, _T3, etc. as suffixes, they retain those suffixes in the results in the output directory (see multiqc report attached).

@drpatelh may be able to explain the detail of the fix, but I believe it's a consequence of #1058.

Closing the issue for now. If you feel I've misunderstood, or determine that the fix does not apply to what you're doing, feel free to reopen.

multiqc_report (4).html.zip

AnnaDvdH added the bug (Something isn't working) label Sep 22, 2023

drpatelh added this to the 3.12.1 milestone Oct 15, 2023

pinin4fjords closed this as completed Nov 8, 2023
