RNAseq pipeline changing my sample names #1077

Closed
AnnaDvdH opened this issue Sep 22, 2023 · 1 comment
Labels
bug Something isn't working
Comments

@AnnaDvdH

Description of the bug

Hello,

I've been having some issues when running the rnaseq pipeline on UPPMAX. The pipeline seems to be changing my sample names automatically, which makes it really difficult to trace which sample is which.

I am working with tumour data, so my sample names contain a T for tumour followed by a number, for example 07TDOXTAK_NC_T3. However, in my results the samples are all renamed to something like 07TDOXTAK_NC_T1. Since I have several tumours from the same individual, I really need to keep the number after the T.

I already got some help through the Slack channel, where I was advised to change my sample names. The reason is that the pipeline uses _T<DIGIT> to name technical replicates, so having that pattern in my sample names breaks the pipeline's assumptions. I did this, so my sample names no longer end with T and a number but with NC (something like 07TDOXTAK_T3_NC). However, I am getting the same problem again: all my samples are renamed to 07TDOXTAK_NC_T1, with _T1 at the end rather than NC. I am a bit lost and not sure what to test next.
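The renaming described above can be reproduced with a small shell sketch. It rests on two assumptions that are not confirmed anywhere in this thread: that the pipeline appends a replicate suffix such as _T1 to every sample name, and that it later removes the first _T<DIGIT> match to get back to the sample name. The function name simulate_id is hypothetical and only serves the illustration; it is not part of the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical illustration only: simulate_id is NOT a pipeline function.
# Assumption 1: a technical-replicate suffix (_T1 here) is appended to the sample name.
# Assumption 2: the FIRST _T<DIGITS> match is then stripped from the combined name.
simulate_id() {
    local with_suffix="${1}_T1"
    # sed without the g flag replaces only the first match on the line
    printf '%s\n' "$with_suffix" | sed -E 's/_T[0-9]+//'
}

simulate_id 07TDOXTAK_NC_T3   # prints 07TDOXTAK_NC_T1
simulate_id 07TDOXTAK_T3_NC   # prints 07TDOXTAK_NC_T1
```

Under these assumptions both naming schemes collapse to the same ID, because the _T3 inside the sample name is matched before the appended suffix. That would explain why moving NC to the end did not help, and it suggests avoiding _T<DIGIT> anywhere in the sample name, not only at the end.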

Thank you in advance for all the help!

Command used and terminal output

# Code used to run the pipeline as a .sh file followed by the name of my inputfile.csv

module purge
module load uppmax bioinfo-tools
module load Nextflow/22.10.1
module load nf-core-pipelines/latest

# Don't let Java get carried away and use huge amounts of memory
export NXF_OPTS='-Xms1g -Xmx4g'

# Don't fill up your home directory with cache files
export NXF_HOME=/absolutepath/5_RNAseq/arm5_6_rnaseq/
export NXF_TEMP="$SNIC_TMP"
export NXF_SINGULARITY_CACHEDIR=/absolutepath/5_RNAseq/cache_rnaseq

# Run RNAseq pipeline

nextflow run $NF_CORE_PIPELINES/rnaseq/3.12.0/workflow \
    --project snic2022-5-620 -profile uppmax \
    --email anna.vd.heiden@imbim.uu.se \
    --fasta genome/cf4.b6.14.fa \
    --gff GCF_011100685.1_UU_Cfam_GSD_1.0_genomic.NameB614.gff \
    --skip_biotype_qc \
    --input "$1" \
    --outdir results
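
Before launching the run, the input file can be checked for sample names that contain the _T<DIGIT> pattern. This is only a sketch: it assumes the sample name is the first comma-separated column and that the file has a header row. The function name flag_replicate_like_names and the demo rows are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list unique sample names that contain _T<DIGITS>.
# Assumes column 1 of the CSV is the sample name and line 1 is the header.
flag_replicate_like_names() {
    awk -F, 'NR > 1 && $1 ~ /_T[0-9]+/ { print $1 }' "$1" | sort -u
}

# Demo on a throwaway samplesheet; the FASTQ file names are made up.
samplesheet="$(mktemp)"
cat > "$samplesheet" <<'EOF'
sample,fastq_1,fastq_2,strandedness
07TDOXTAK_NC_T3,a_1.fastq.gz,a_2.fastq.gz,unstranded
07TDOXTAK_NC,b_1.fastq.gz,b_2.fastq.gz,unstranded
EOF
flag_replicate_like_names "$samplesheet"   # prints 07TDOXTAK_NC_T3
rm -f "$samplesheet"
```

Any name this prints is a candidate for being rewritten in the results, so it is worth renaming before the run or comparing against the output afterwards.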

Relevant files

No response

System information

No response

@AnnaDvdH AnnaDvdH added the bug Something isn't working label Sep 22, 2023
@drpatelh drpatelh added this to the 3.12.1 milestone Oct 15, 2023
@pinin4fjords
Member

@AnnaDvdH having done some testing I believe this issue is now resolved in dev. If I change the samples in the test profile to have _T2, _T3 etc as suffixes, they retain those into the results in the output directory (see multiqc report attached).

@drpatelh may be able to explain the detail of the fix, but I believe it's a consequence of #1058.

Closing the issue for now. If you feel I've misunderstood, or determine that the fix does not apply to what you're doing, feel free to reopen.

multiqc_report (4).html.zip
