Samplesheet of sarek #1605

Poocee · 2024-07-29T13:13:37Z

Description of the bug

I am getting a samplesheet error when running the sarek pipeline. I have read many posts about this kind of error with other pipelines, but I didn't find a solution for my issue.
Specifically, the error reports this message:
"The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : mutect2"
Unfortunately, I don't understand what I missed, since my samplesheet has both normal and tumor samples (and, for some patients, relapse tumor samples too), as specified on the usage page.

Command used and terminal output

source /share/data/apps/anaconda3/bin/activate pipelines

nextflow run nf-core/sarek -profile singularity \
--input /share/project3/home/perciostefano/sts/Sarcomics/WES/samplesheet_prova.csv \
--step mapping \
--outdir /share/project3/home/perciostefano/sts/Sarcomics/WES/results/ \
--wes \
--intervals /share/project3/home/perciostefano/reference/genome/hg38_Twist_ILMN_Exome_2.5_Panel_annotated.bed \
--tools cnvkit,mutect2,strelka,merge \
--trim_fastq \
--aligner bwa-mem \
--save_mapped \
--save_output_as_bam \
--only_paired_variant_calling \
--joint_mutect2 \
--genome hg38 \
--save_reference \
--multiqc_title "sts_Sarcomics_wes_report"

Relevant files

samplesheet_prova.csv
nextflow.log

System information

N E X T F L O W ~ version 24.04.3
HPC
Slurm
Singularity
nf-core/sarek v3.4.2-gb5b766d

FriederikeHanssen · 2024-07-29T13:27:30Z

Hey! I believe the samplesheet error is a bit of a false flag, and the actual error is reported further down the log:

nextflow.exception.WorkflowScriptErrorException: Base quality score recalibration requires at least one resource file. Please provide at least one of `--dbsnp` or `--known_indels`
You can skip this step in the workflow by adding `--skip_tools baserecalibrator` to the command.

The genome you picked (hg38) does not have all the reference files configured; in this case, dbsnp and known_indels are missing. Can you try either providing them or skipping the base recalibration step?

Poocee · 2024-07-29T14:13:09Z

I tried the skip option, but the problem still remains, with this message:
Pipeline completed with errors-
"The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : mutect2"

And also with this one:
"Missing or unknown field in csv file header. Please check your samplesheet"

FriederikeHanssen · 2024-07-29T14:20:20Z

Ah, I think you are missing the `lane` column. If you only have one lane per sample, you can put 1 for each. I also noticed that your delimiter is `;`. This may work, but I haven't tested it. If the above doesn't work, I would try changing the delimiter in your samplesheet to `,`.
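A minimal sketch of the two samplesheet fixes described above, assuming a ";"-delimited file that has a `sample` column but no `lane` column, and one lane per sample. `fix_samplesheet` is a hypothetical helper, not part of sarek. It rewrites the file with "," as the delimiter and inserts a `lane` column (value 1) right after `sample`, so a header such as `patient;sex;status;sample;fastq_1;fastq_2` becomes `patient,sex,status,sample,lane,fastq_1,fastq_2` (in that layout, `status` is 0 for normal and 1 for tumor samples).

```bash
# Hypothetical helper (not part of sarek): fix_samplesheet IN.csv OUT.csv
# Assumes IN is ";"-delimited, has a "sample" column but no "lane" column,
# and that every sample was sequenced on a single lane.
# Writes a ","-delimited copy with a "lane" column (value 1) after "sample".
fix_samplesheet() {
    awk -F';' '
        NR == 1 {
            # Locate the "sample" column and refuse to add a second "lane"
            for (i = 1; i <= NF; i++) {
                if ($i == "sample") s = i
                if ($i == "lane") { print "lane column already present" > "/dev/stderr"; exit 1 }
            }
            if (!s) { print "no sample column in header" > "/dev/stderr"; exit 1 }
        }
        {
            # Rebuild the row with "," and insert the lane field after "sample"
            line = ""
            for (i = 1; i <= NF; i++) {
                line = line (i > 1 ? "," : "") $i
                if (i == s) line = line "," (NR == 1 ? "lane" : "1")
            }
            print line
        }' "$1" > "$2"
}
```

For example, `fix_samplesheet samplesheet_prova.csv samplesheet_fixed.csv`, then pass the new file to `--input`. If some samples were sequenced on several lanes, the lane values have to be filled in by hand instead.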

Poocee · 2024-07-29T16:22:10Z

Thank you. It seems the problem was the lane column (I also replaced ";" with "," using vim before running the pipeline).
Unfortunately, I still get an error with the --intervals parameter for my WES analysis. I used the Illumina BED file hg38_Twist_ILMN_Exome_2.5_Panel_annotated.bed.

This file gives an error, probably because of the 4th column (I attached a preview of the file; can you please confirm this?), so I removed that column and saved the result as a new file, "prova.bed".
However, I got this error:
ERROR ~ Error executing process > 'NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (prova)'

Caused by:
Process NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (prova) terminated with an error exit status (1)

Command executed:

bgzip --threads 1 -c prova.bed > prova.bed.gz
tabix prova.bed.gz

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED":
tabix: $(echo $(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*$//')
END_VERSIONS

Command exit status:
1

Command output:
(empty)

Command error:
INFO: Converting SIF file to temporary sandbox...
[E::hts_idx_push] Unsorted positions on sequence #23: 24364365 followed by 24362683
tbx_index_build failed: prova.bed.gz
INFO: Cleaning up image...

Work dir:
/share/project3/home/perciostefano/sts/Sarcomics/WES/sequenced/work/a1/44c13a5e6af6f42ef5153394a61255

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

Poocee · 2024-07-29T16:23:46Z

Sorry, I closed my issue by mistake. I have reopened it; I hope this is not a problem.

FriederikeHanssen · 2024-07-29T17:21:58Z

[E::hts_idx_push] Unsorted positions on sequence #23: 24364365 followed by 24362683

Yes, you will need to sort the BED file. You can use bedtools for this. Depending on your analysis, another thing I have done in the past is padding the BED file on each side (by 50-100 bp or so, depending on your read length). This can cause adjacent regions to overlap, in which case you also need to merge them.

Here are the commands I used in the past to prepare the bed file:

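    #!/bin/bash
    # Usage: prepare_bed.sh <targets.bed> <genome>.fasta.fai
    set -euo pipefail
    bed="$1"
    fai="$2"   # FASTA index of the reference genome, used as the bedtools genome file

    # Sort by chromosome (natural order), then numerically by start coordinate
    sort -k1,1V -k2,2n "$bed" > "${bed/.bed/.sorted.bed}"

    # GATK recommends padding by 100bp: https://gatk.broadinstitute.org/hc/en-us/articles/360035889551-When-should-I-restrict-my-analysis-to-specific-intervals-
    bedtools slop -i "${bed/.bed/.sorted.bed}" -b 100 -g "$fai" > "${bed/.bed/.sorted.padded.bed}"

    # Merge overlapping or book-ended regions
    bedtools merge -i "${bed/.bed/.sorted.padded.bed}" > "${bed/.bed/.sorted.padded.merged.bed}"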
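The tabix failure above can also be caught before launching the pipeline. As a rough sketch (the function `check_bed_sorted` is hypothetical), the awk check below tests the ordering that the `hts_idx_push` error complains about: each chromosome must form one contiguous block, and start positions (column 2) must not decrease within it. It skips `#` comments, UCSC `track`/`browser` lines and blank lines; it does not check that the chromosome order matches the reference `.fai`.

```bash
# Hypothetical helper: check_bed_sorted FILE.bed
# Prints the first ordering problem and returns 1, otherwise returns 0.
check_bed_sorted() {
    awk -F'\t' '
        # Skip comment lines, UCSC "track"/"browser" lines and blank lines
        /^(#|track|browser)/ || NF == 0 { next }
        {
            if ($1 != chrom) {
                # Returning to a chromosome we already left means it is split up
                if ($1 in seen) {
                    printf "chromosome %s is not contiguous (line %d)\n", $1, NR
                    bad = 1; exit
                }
                seen[$1] = 1; chrom = $1; last = -1
            }
            # Start coordinates must be non-decreasing within a chromosome
            if ($2 + 0 < last) {
                printf "unsorted positions on %s: %d followed by %d (line %d)\n", $1, last, $2, NR
                bad = 1; exit
            }
            last = $2 + 0
        }
        END { exit (bad ? 1 : 0) }' "$1"
}
```

For example, `check_bed_sorted prova.bed` should flag the unsorted file, while the `.sorted.padded.merged.bed` output of the script above should pass.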

Poocee · 2024-07-30T13:01:56Z

Thank you very much! Following your suggestion, I also changed the genome reference to the standard GRCh38, and now all the warnings about the panel of normals, dbsnp and known_indels are resolved.
I also replaced --intervals with --target_bed, since I want to analyze SNVs and CNVs in my WES data, and I modified my BED file as you suggested. I will let you know if everything works!

FriederikeHanssen · 2024-07-30T13:56:56Z

--target_bed is not a valid parameter in 3.4.2. You need to supply the bed file through the parameter --intervals

FriederikeHanssen · 2024-08-08T10:20:08Z

@Poocee, has your issue been solved? If so, I would close the issue.

Poocee · 2024-08-08T12:26:21Z

Dear Friederike,
Yes, my code is running without any problems (at least for the moment ☺). Thank you again for your prompt replies and your availability.
Best regards,
Stefano

Poocee · 2024-08-12T13:13:28Z

Dear Friederike,
unfortunately, sarek stops with an error and I don't understand what to do. Can you please help me?

Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:BWAMEM1_MEM (KQ91-1)'

Caused by:
Process NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:BWAMEM1_MEM (KQ91-1) terminated with an error exit status (1)

[mem_sam_pe] paired reads have different names: "LH00193:5:22CNKWLT3:7:1101:24587:1048", "LH00193:16:222TJ2LT4:7:1101208:1056"

I attached the log file:
file_log.txt

FriederikeHanssen · 2024-08-12T13:15:44Z

Looks like an issue with your FASTQ file. I found this, maybe it helps: https://www.biostars.org/p/254155/
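bwa mem expects the i-th read in the R1 file and the i-th read in the R2 file to carry the same name; `[mem_sam_pe] paired reads have different names` means that this does not hold somewhere in the pair. The two names in the log also carry different run and flowcell fields (`5:22CNKWLT3` vs `16:222TJ2LT4`), so one possible explanation (an assumption, not verified here) is that reads from different sequencing runs ended up at the same position in R1 and R2, for example after per-run files were concatenated in a different order. A minimal sketch to locate the first mismatch, using a hypothetical `check_pairs` helper that assumes 4-line FASTQ records and read names without tab characters:

```bash
# Hypothetical helper: check_pairs R1.fastq[.gz] R2.fastq[.gz]
# Compares the read name of each R1 record with the R2 record at the same
# position, ignoring the leading "@", anything after the first space, and a
# trailing /1 or /2. Prints the first mismatch and returns 1, otherwise 0.
check_pairs() {
    [ -r "$1" ] && [ -r "$2" ] || { echo "cannot read input files" >&2; return 2; }
    local tmpdir rc
    tmpdir=$(mktemp -d) || return 2
    # gzip -cdf decompresses .gz input and passes plain text through unchanged;
    # every 4th line starting at line 1 is a record header
    gzip -cdf "$1" | awk 'NR % 4 == 1' > "$tmpdir/r1.names"
    gzip -cdf "$2" | awk 'NR % 4 == 1' > "$tmpdir/r2.names"
    paste "$tmpdir/r1.names" "$tmpdir/r2.names" | awk -F'\t' '
        function id(h) { sub(/^@/, "", h); sub(/ .*/, "", h); sub(/\/[12]$/, "", h); return h }
        {
            if (id($1) != id($2)) {
                printf "record %d: \"%s\" vs \"%s\"\n", NR, id($1), id($2)
                bad = 1; exit
            }
        }
        END { exit (bad ? 1 : 0) }'
    rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}
```

Running it on the R1/R2 files of the failing sample reports the record number where the two files diverge; a missing partner (one file shorter than the other) shows up as an empty name.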

Poocee · 2024-08-12T14:01:15Z

Unfortunately, I received these files as they are, so I cannot go back to the source. I checked your link but didn't find any solution to my problem. In addition, I don't understand why the problem affects only one sample and not all my FASTQ files.
Thanks in advance

FriederikeHanssen · 2024-08-12T14:42:00Z

> Unfortunately, I received these files as they are, so I cannot go back to the source. I checked your link but didn't find any solution to my problem.

In that case, I would try to inspect the files with a tool like seqkit to make sure they are valid FASTQ files.
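seqkit is the more thorough option. As a quick first pass that needs only gzip and awk, the sketch below (a hypothetical `check_fastq` helper, assuming unwrapped 4-line FASTQ records) checks that headers start with `@`, separators start with `+`, quality strings are as long as their sequences, and the file does not end in the middle of a record.

```bash
# Hypothetical helper: check_fastq FILE.fastq[.gz]
# Minimal structural checks for 4-line FASTQ records; a sketch, not a
# replacement for a dedicated validator such as seqkit.
check_fastq() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    gzip -cdf "$1" | awk '
        NR % 4 == 1 && !/^@/   { printf "line %d: header does not start with @\n", NR; bad = 1; exit }
        NR % 4 == 2            { seqlen = length($0) }
        NR % 4 == 3 && !/^[+]/ { printf "line %d: separator does not start with +\n", NR; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length %d differs from sequence length %d\n", NR, length($0), seqlen
            bad = 1; exit
        }
        END {
            # A file cut short (e.g. by an interrupted transfer) leaves a partial record
            if (!bad && NR % 4 != 0) { printf "truncated: %d lines is not a multiple of 4\n", NR; bad = 1 }
            exit (bad ? 1 : 0)
        }'
}
```

Looping over all inputs, e.g. `for f in *.fastq.gz; do check_fastq "$f" || echo "problem in $f"; done`, checks every sample up front instead of discovering problems one pipeline job at a time.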

> In addition, I don't understand why the problem affects only one sample and not all my FASTQ files.

All samples are run in parallel, as distinct jobs. The pipeline discovered the issue for one sample (KQ25-1) and reported it.

Poocee · 2024-10-30T11:12:23Z

Dear Friederike,
I followed your suggestion and used seqkit to modify my data, which resolved the problem. Unfortunately, I now have another problem with a single sample that I don't understand. I attached the report describing the error. In addition, I don't understand why the pipeline doesn't proceed when only one sample fails (only the fastp and FastQC reports are present). Could you please help me? Thanks in advance.
Best,
Stefano

stw8127 · 2024-11-12T17:20:24Z

I am having a similar issue, where the error is reported as being related to sample_sheet.csv, although this sample sheet has been accepted before. There is also a "Cause: Missing process or function Channel.empty([[]])" error, and then further down:
"Caused by: groovy.lang.MissingMethodException: No signature of method: java.lang.Object.Channel.empty() is applicable for argument types: (ArrayList) values: [[]]"
I can't find any previous posts about these specific errors, so any help would be much appreciated!

I've attached my .sh, .json and .nextflow.log files below:

I'm using:
SLURM HPC
miniconda environment:
(nf_env) bash-4.2$ nextflow info
Version: 24.10.0 build 5928
Created: 27-10-2024 18:36 UTC (18:36 BST)
System: Linux 3.10.0-1160.125.1.el7.x86_64
Runtime: Groovy 4.0.23 on OpenJDK 64-Bit Server VM 17.0.13-internal+0-adhoc..src
nf-core/sarek version 3.4.4

nextflow.log.txt
nf_sarek_241112.sh.txt
nf-params.json
sample_sheet.csv

Many thanks in advance!

asp8200 · 2024-11-13T09:07:49Z

@stw8127: I think you have run into a different issue. Using an older version of Nextflow might help: #1622

stw8127 · 2024-11-13T10:09:07Z

Thank you very much, it looks like it's running now with v. 3.2.2. I'll read the other thread!

Poocee added the bug (Something isn't working) label Jul 29, 2024

Poocee closed this as completed Jul 29, 2024

Poocee reopened this Jul 29, 2024

FriederikeHanssen closed this as completed Aug 8, 2024
