There are multiple input files for each of the following file names: null, null.bai. #2

Closed
dakomura opened this issue Aug 7, 2022 · 3 comments

Comments

@dakomura

dakomura commented Aug 7, 2022

Hi there,

I tried to run purple-nf on my WGS BAM files in tumor-only mode,
but the command below failed with an error.

#!/bin/bash

BAMDIR=/data/komura/project/NGS/MENI/WGS/results/fq2bam
ref=/data/share/resources/ref/Homo_sapiens_assembly38

export TMPDIR=/tmp
export NXF_DEFAULT_DSL=1

nextflow run iarcbioinfo/purple-nf -r v1.1 \
-profile docker --tn_file t.txt \
--cohort_dir $BAMDIR \
--ref ${ref}.fasta --ref_dict ${ref}.dict \
--tumor_only \
--bam \
--cpu 6 \
--mem 64 \
--output_folder PURPLE_out

The error message was as follows.

-------------------Calling PARAMETERS---------------------
output_folder     : PURPLE_out
ref               : /data/share/resources/ref/Homo_sapiens_assembly38.fasta
ref_dict          : /data/share/resources/ref/Homo_sapiens_assembly38.dict
tn_file           : t.txt
help              : false
debug             : false
cohort_dir        : /data/komura/project/NGS/MENI/WGS/results/fq2bam
tumor_only        : true
bam               : true
somatic_vcfs      : null
max_memory        : 128 GB
max_cpus          : 8
max_time          : 10d
cpu               : 6
mem               : 64
----------------------------------------------------------


-------------------Software versions---------------------
hmftools-cobalt   : 1.11
hmftools-amber    : 2.52
hmftools-purple   : 3.5
----------------------------------------------------------


[-        ] process > HQ_VCF -
[-        ] process > COBALT -
[-        ] process > AMBER  -
[-        ] process > PURPLE -
WARN: Operator `spread` is deprecated -- it will be removed in a future release
[-        ] process > HQ_VCF -
[-        ] process > COBALT -
[-        ] process > AMBER  -
[-        ] process > PURPLE -
WARN: Operator `spread` is deprecated -- it will be removed in a future release
Error executing process > 'AMBER (7)'

Caused by:
  Process `AMBER` input file name collision -- There are multiple input files for each of the following file names: null, null.bai

The input file (t.txt) looked like this:

tumor_id    sample  tumor
W115868-T1_QT_1.fq.gz  W115868-T1_QT_1.fq.gz  W115868-T1_QT.bam
W185949-T3_QT_1.fq.gz  W185949-T3_QT_1.fq.gz  W185949-T3_QT.bam
W124995-T4_QT_1.fq.gz  W124995-T4_QT_1.fq.gz  W124995-T4_QT.bam
W180651-T4_QT_1.fq.gz  W180651-T4_QT_1.fq.gz  W180651-T4_QT.bam
W185949-T1_QT_1.fq.gz  W185949-T1_QT_1.fq.gz  W185949-T1_QT.bam
W115868-T4_QT_1.fq.gz  W115868-T4_QT_1.fq.gz  W115868-T4_QT.bam
W180651-T1_QT_1.fq.gz  W180651-T1_QT_1.fq.gz  W180651-T1_QT.bam
W124995-T1_QT_1.fq.gz  W124995-T1_QT_1.fq.gz  W124995-T1_QT.bam
W153058-T1_QT_1.fq.gz  W153058-T1_QT_1.fq.gz  W153058-T1_QT.bam
W153058-T3_QT_1.fq.gz  W153058-T3_QT_1.fq.gz  W153058-T3_QT.bam

It seems the BAM file names were not processed properly.
How can I fix this?

@nalcala
Member

nalcala commented Sep 9, 2022

Hi @dakomura ,

Thanks for using our pipeline, and sorry for the delay in answering. Could it be that the input file is not tab-separated (e.g., the sample and tumor columns are separated by two spaces instead of a tab)? That is the only way I could reproduce this error.
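A quick way to test for this kind of separator problem before launching the pipeline is to check the sample sheet with awk. The sketch below is only an illustration: `check_tn_file` is a hypothetical helper, not part of purple-nf, and the column names passed to it are simply the ones visible in the header posted above.

```shell
#!/bin/bash
# Hypothetical helper, not part of purple-nf: verify that a sample sheet is
# tab-separated and that its header contains the expected column names.
# Usage: check_tn_file FILE COLUMN [COLUMN...]
check_tn_file() {
    local file=$1; shift
    awk -F'\t' -v want="$*" '
        BEGIN { bad = 0 }
        NR == 1 {
            # Record every header field exactly as split on tabs.
            ncol = NF
            for (i = 1; i <= NF; i++) seen[$i] = 1
            n = split(want, cols, " ")
            for (i = 1; i <= n; i++)
                if (!(cols[i] in seen)) {
                    printf "header: column \"%s\" not found\n", cols[i]
                    bad = 1
                }
            next
        }
        # Every data row should have as many fields as the header.
        NF != ncol {
            printf "line %d: %d field(s), header has %d\n", NR, NF, ncol
            bad = 1
        }
        END { exit bad }
    ' "$file"
}

# Example call with the column names shown in this issue:
#   check_tn_file t.txt tumor_id sample tumor
# To see the separators directly (tabs are printed as \t):
#   sed -n l t.txt | head -n 3
```

The function prints one line per problem and returns a non-zero status if any is found, so it can be used as a guard in front of the `nextflow run` command.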

The error says there are too many input files named "null" and "null.bai". Having one of each is expected: because there is no normal column, Nextflow creates dead symlinks named "null" and "null.bai" in its place. That alone should not raise an error unless one of the other inputs (probably the tumor column) is also not recognized.

Can you try adding a "normal" column to your input file, with any entry except "null" (for example, none in each row)? This should force Nextflow to name the nonexistent normal files none and none.bai, which lifts the collision error. You will then probably get a different error because AMBER will be missing one input file. When that happens, can you look at the Nextflow work directory of the failed job and see what is in there?

Normally, you should have symbolic links to the FASTA reference, the fai index, the tumor BAM and BAI, and then the invalid links none and none.bai. Whatever is not present and is replaced by a symlink named "null" should be your missing input. My guess is that the tumor column is the one not being recognized. Unless there is a formatting issue, in which case the code will not find the tumor column and will just put null for all rows, the reason escapes me...
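The two diagnostic steps suggested here could be sketched roughly as follows. Both functions are hypothetical helpers written for illustration and are not part of purple-nf or Nextflow. The first assumes the sample sheet is already tab-separated; the second expects the path of the failed task's directory, which Nextflow typically prints on the "Work dir:" line of its error report.

```shell
#!/bin/bash
# Hypothetical helper: append a "normal" column filled with a placeholder
# value (any value except "null") to a tab-separated sample sheet.
# Usage: add_normal_column FILE [PLACEHOLDER]
add_normal_column() {
    awk -F'\t' -v OFS='\t' -v value="${2:-none}" '
        NR == 1 { print $0, "normal"; next }   # extend the header
        { print $0, value }                    # extend every data row
    ' "$1"
}

# Hypothetical helper: list symlinks in a task directory whose target does
# not exist (test -e follows the link, so it fails for dangling links).
# Usage: list_dead_links TASK_DIR
list_dead_links() {
    find "$1" -maxdepth 1 -type l ! -exec test -e {} \; -print
}

# Example calls (the work directory path is a placeholder):
#   add_normal_column t.txt > t_with_normal.txt
#   ls -l /path/to/work/dir
#   list_dead_links /path/to/work/dir
```

With the placeholder column in place, a dead link named none or none.bai is expected; any dead link named null would point at the input that was not recognized.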

Hope that helps, and good luck!

Best,

Nicolas

@dakomura
Author

Hi @nalcala ,

Thank you for your kind reply.
The tool worked once I replaced the two spaces with a tab in the input file.
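For a file with many rows, the same fix can be applied in one pass. This is a minimal sketch under the assumption that no field legitimately contains two consecutive spaces; `spaces_to_tabs` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: rewrite every run of two or more spaces as a single
# tab and print the result to standard output. Existing tabs are untouched.
# Usage: spaces_to_tabs FILE
spaces_to_tabs() {
    awk '{ gsub(/  +/, "\t"); print }' "$1"
}

# Example call (writes a corrected copy, leaving the original unchanged):
#   spaces_to_tabs t.txt > t_fixed.txt
```

Writing to a new file rather than editing in place makes it easy to compare the result with the original before rerunning the pipeline.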

I'm sorry for bothering you about such a careless mistake.
I also appreciate your great pipeline!

Best,

Daisuke

@nalcala
Member

nalcala commented Sep 12, 2022

Perfect!

Don't mention it, it could totally have happened to me too (it probably has, actually...). I will add this to the "common errors" section, since others will certainly run into the same issue.

Best,

Nicolas
