Samplesheet of sarek #1605

Closed
Poocee opened this issue Jul 29, 2024 · 18 comments
Labels
bug Something isn't working

Comments

Poocee commented Jul 29, 2024

Description of the bug

I am getting a samplesheet error when running the sarek pipeline. I have read many posts about this kind of error with other pipelines, but I didn't find a solution for my issue.
Specifically, the error reports this message:
"The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : mutect2"
Unfortunately, I don't understand what I missed, since my samplesheet has both normal and tumor samples (plus relapse tumors for some samples), as specified on the usage page.

Command used and terminal output

source /share/data/apps/anaconda3/bin/activate pipelines

nextflow run nf-core/sarek -profile singularity \
--input /share/project3/home/perciostefano/sts/Sarcomics/WES/samplesheet_prova.csv \
--step mapping \
--outdir /share/project3/home/perciostefano/sts/Sarcomics/WES/results/ \
--wes \
--intervals /share/project3/home/perciostefano/reference/genome/hg38_Twist_ILMN_Exome_2.5_Panel_annotated.bed \
--tools cnvkit,mutect2,strelka,merge \
--trim_fastq \
--aligner bwa-mem \
--save_mapped \
--save_output_as_bam \
--only_paired_variant_calling \
--joint_mutect2 \
--genome hg38 \
--save_reference \
--multiqc_title "sts_Sarcomics_wes_report"

Relevant files

samplesheet_prova.csv
nextflow.log

System information

N E X T F L O W ~ version 24.04.3
HPC
Slurm
Singularity
nf-core/sarek v3.4.2-gb5b766d

Poocee added the bug label Jul 29, 2024

FriederikeHanssen (Contributor) commented

Hey! I believe the samplesheet error is a bit of a false flag, and the actual error is reported further down the log:

nextflow.exception.WorkflowScriptErrorException: Base quality score recalibration requires at least one resource file. Please provide at least one of `--dbsnp` or `--known_indels`
You can skip this step in the workflow by adding `--skip_tools baserecalibrator` to the command.

The genome you picked - hg38 - does not have all the reference files configured and in this case dbsnp and known_indels are missing. Can you try to either provide them or skip the baserecalibration step?
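
Since the last message printed is not always the root cause, it can help to list every exception recorded in the log. A minimal sketch, assuming the log is the `.nextflow.log` in the launch directory; the function name `list_exceptions` is made up for illustration:

```shell
#!/bin/sh
# Sketch only: list every log line that mentions an exception, with its
# line number, so the root cause is easier to spot than the final message.
# The function name is hypothetical, not part of Nextflow or sarek.
list_exceptions() {
    grep -n 'Exception' "$1"
}

# Example (run from the launch directory):
#   list_exceptions .nextflow.log | head
```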

Poocee (Author) commented Jul 29, 2024

I tried the skip option, but the problem still remains, with this message:
Pipeline completed with errors-
"The sample-sheet only contains normal-samples, but the following tools, which were requested with "--tools", expect at least one tumor-sample : mutect2"

And also with this one:
"Missing or unknown field in csv file header. Please check your samplesheet"

FriederikeHanssen (Contributor) commented

Ah, I think you are missing the `lane` column. If you only have one lane per sample, you can put 1 for each. I also noticed that your delimiter is `;`. This may work, but I haven't tested it. If the above doesn't work, I would try changing the delimiter in your samplesheet to `,`.
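
Both fixes can be applied in one pass. The sketch below assumes the column names from the sarek usage page (`patient`, `sex`, `status`, `sample`, `lane`, `fastq_1`, `fastq_2`) and that no field contains `;` or `,` itself. The function name `fix_samplesheet` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: convert a ';'-delimited samplesheet to ',' and, when the
# header has no 'lane' field, add a 'lane' column (value 1) right after
# 'sample'. Assumes no field contains ';' or ',' itself.
fix_samplesheet() {
    awk '
        BEGIN { FS = "[;,]"; OFS = "," }
        NR == 1 {
            # Inspect the header once to learn which columns exist
            for (i = 1; i <= NF; i++) {
                if ($i == "lane")   { has_lane = 1 }
                if ($i == "sample") { sample_col = i }
            }
        }
        {
            line = ""
            for (i = 1; i <= NF; i++) {
                sep = (i > 1) ? OFS : ""
                line = line sep $i
                if (!has_lane && i == sample_col) {
                    extra = (NR == 1) ? "lane" : "1"
                    line = line OFS extra
                }
            }
            print line
        }
    ' "$1"
}

# Example: fix_samplesheet samplesheet_prova.csv > samplesheet_fixed.csv
```

A sheet that already has a `lane` column only gets its delimiter changed.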

Poocee (Author) commented Jul 29, 2024

Thank you. It seems the problem was the lane column (I also changed ";" to "," using vim before running the pipeline).
Unfortunately, I still get an error with the --intervals parameter in my WES analysis. I used the Illumina BED file named hg38_Twist_ILMN_Exome_2.5_Panel_annotated.bed.

[Image: preview of the BED file]

This file probably fails because of its 4th column (I attached a preview of the file; can you please confirm this?), so I removed that column and saved the result as "prova.bed".
However, I then got this error:
ERROR ~ Error executing process > 'NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (prova)'

Caused by:
Process NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (prova) terminated with an error exit status (1)

Command executed:

bgzip --threads 1 -c prova.bed > prova.bed.gz
tabix prova.bed.gz

cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED":
tabix: $(echo $(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*$//')
END_VERSIONS

Command exit status:
1

Command output:
(empty)

Command error:
INFO: Converting SIF file to temporary sandbox...
[E::hts_idx_push] Unsorted positions on sequence #23: 24364365 followed by 24362683
tbx_index_build failed: prova.bed.gz
INFO: Cleaning up image...

Work dir:
/share/project3/home/perciostefano/sts/Sarcomics/WES/sequenced/work/a1/44c13a5e6af6f42ef5153394a61255

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

Poocee closed this as completed Jul 29, 2024
Poocee reopened this Jul 29, 2024

Poocee (Author) commented Jul 29, 2024

Sorry, I closed the issue by mistake and have reopened it. I hope this is not a problem.

FriederikeHanssen (Contributor) commented

[E::hts_idx_push] Unsorted positions on sequence #23: 24364365 followed by 24362683

Yes, you will need to sort the BED file. You can use bedtools for this. Something else I have done in the past, depending on the analysis, is padding the BED file on each side (by 50-100 bp or so, depending on your read length). This can cause adjacent regions to overlap, in which case you also need to merge them.

Here are the commands I used in the past to prepare the bed file:

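    #!/bin/bash
    # Usage (script name is only an example): prepare_bed.sh <intervals.bed> <genome.fasta.fai>
    # The FASTA index is passed as the second argument in place of the <genome>.fasta.fai placeholder.
    set -euo pipefail
    bed="$1"
    fai="$2"
    prefix="${bed%.bed}"

    # Sort by chromosome (version order), then numerically by start position
    sort -k1,1V -k2,2n "$bed" > "${prefix}.sorted.bed"

    # GATK recommends padding by 100bp: https://gatk.broadinstitute.org/hc/en-us/articles/360035889551-When-should-I-restrict-my-analysis-to-specific-intervals-
    bedtools slop -i "${prefix}.sorted.bed" -b 100 -g "$fai" > "${prefix}.sorted.padded.bed"

    # Merge overlapping or neighboring regions
    bedtools merge -i "${prefix}.sorted.padded.bed" > "${prefix}.sorted.padded.merged.bed"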
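
Before re-running the pipeline, a quick check can confirm that the prepared file no longer triggers the "Unsorted positions" error. This is a minimal sketch using only awk, assuming a tab-separated BED file. The function name `first_unsorted_bed_record` is made up for illustration. It only checks start order within consecutive records of the same chromosome, not whether each chromosome forms one contiguous block.

```shell
#!/bin/sh
# Sketch only: report the first BED record whose start position is smaller
# than the previous start on the same chromosome, which is the condition
# the tabix error message describes. Exit status is 1 if such a record
# is found, 0 otherwise. The function name is hypothetical.
first_unsorted_bed_record() {
    awk -F'\t' '
        /^(#|track|browser)/ { next }    # skip comment and header lines
        $1 == chrom && $2 + 0 < prev {
            printf "line %d: %s %s follows %s\n", NR, $1, $2, prev
            bad = 1
            exit
        }
        { chrom = $1; prev = $2 + 0 }
        END { exit bad + 0 }
    ' "$1"
}

# Example: first_unsorted_bed_record prova.sorted.padded.merged.bed && echo "looks sorted"
```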


Poocee (Author) commented Jul 30, 2024

Thank you very much! Following your suggestions, I have also changed the genome reference to the standard GRCh38, and now all the warnings about the panel of normals, dbsnp and known_indels are resolved.
I also replaced --intervals with --target_bed, since I want to analyze SNVs and CNVs in my WES data, and I have modified my BED file as you suggested. I will let you know if everything works!

FriederikeHanssen (Contributor) commented

--target_bed is not a valid parameter in 3.4.2. You need to supply the bed file through the parameter --intervals

FriederikeHanssen (Contributor) commented

@Poocee has your issue been solved? In that case I would close the issue

Poocee (Author) commented Aug 8, 2024 via email

Poocee (Author) commented Aug 12, 2024

Dear Friederike,
unfortunately the sarek pipeline now fails with an error that I don't know how to handle. Can you help me, please?

Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:BWAMEM1_MEM (KQ91-1)'

Caused by:
Process NFCORE_SAREK:SAREK:FASTQ_ALIGN_BWAMEM_MEM2_DRAGMAP_SENTIEON:BWAMEM1_MEM (KQ91-1) terminated with an error exit status (1)

[mem_sam_pe] paired reads have different names: "LH00193:5:22CNKWLT3:7:1101:24587:1048", "LH00193:16:222TJ2LT4:7:1101208:1056"

I have attached the log file:
file_log.txt

FriederikeHanssen (Contributor) commented

Looks like an issue with your FASTQ files. I found this: https://www.biostars.org/p/254155/ Maybe it helps.

Poocee (Author) commented Aug 12, 2024

Unfortunately, I received these files as they are, so I cannot go back to their source. I checked your link, but I didn't find a solution to my problem. In addition, I don't understand why the problem affects only one sample and not all my FASTQ files.
Thanks in advance

FriederikeHanssen (Contributor) commented Aug 12, 2024

> Unfortunately, I received these files as they are, so I cannot go back to their source. I checked your link, but I didn't find a solution to my problem.

In that case I would probably try to inspect the files with tools like seqkit to make sure they are valid fastq files.

> In addition, I don't understand why the problem affects only one sample and not all my FASTQ files.

All samples are run in parallel, each in its own job. The pipeline hit the issue in one sample (KQ25-1) and reported it.
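
If seqkit is not at hand, the mismatch that bwa reports ("paired reads have different names") can be located with standard tools. This is a minimal sketch, assuming uncompressed four-line FASTQ records (decompress `.fastq.gz` files first). The function names are made up for illustration. A read name is taken as the first whitespace-delimited token of the header, with any trailing `/1` or `/2` removed.

```shell
#!/bin/sh
# Sketch only: print the first read pair whose names differ between the
# R1 and R2 files. Assumes plain-text FASTQ with exactly four lines per
# record. Exit status is 1 if a mismatch is found, 0 otherwise.
# Function names are hypothetical.
fastq_names() {
    awk 'NR % 4 == 1 { n = $1; sub(/^@/, "", n); sub(/\/[12]$/, "", n); print n }' "$1"
}

first_name_mismatch() {
    names1=$(mktemp)
    names2=$(mktemp)
    fastq_names "$1" > "$names1"
    fastq_names "$2" > "$names2"
    # paste pads the shorter file with empty fields, so a missing mate
    # is reported as a mismatch as well
    if paste "$names1" "$names2" | awk -F'\t' '
        $1 != $2 { printf "pair %d: %s vs %s\n", NR, $1, $2; bad = 1; exit }
        END { exit bad + 0 }
    '; then status=0; else status=1; fi
    rm -f "$names1" "$names2"
    return "$status"
}

# Example: first_name_mismatch sample_R1.fastq sample_R2.fastq
```

A mismatch early in the files often points to R1 and R2 coming from different runs, while one deep into the files suggests a truncated or filtered mate file.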

Poocee (Author) commented Oct 30, 2024 via email

stw8127 commented Nov 12, 2024

I am having a similar issue: the error is reported as being related to sample_sheet.csv, but this samplesheet has been accepted before. There is also a "Cause: Missing process or function Channel.empty([[]])" error, and then further down:
"Caused by: groovy.lang.MissingMethodException: No signature of method: java.lang.Object.Channel.empty() is applicable for argument types: (ArrayList) values: [[]]"
I can't find any earlier posts about these errors, so any help would be much appreciated!

I've attached my .sh, .json and .nextflow.log files below.

I'm using:
SLURM HPC
miniconda environment:
(nf_env) bash-4.2$ nextflow info
Version: 24.10.0 build 5928
Created: 27-10-2024 18:36 UTC (18:36 BST)
System: Linux 3.10.0-1160.125.1.el7.x86_64
Runtime: Groovy 4.0.23 on OpenJDK 64-Bit Server VM 17.0.13-internal+0-adhoc..src
nf-core/sarek version 3.4.4

nextflow.log.txt
nf_sarek_241112.sh.txt
nf-params.json
sample_sheet.csv

Many thanks in advance!

asp8200 (Contributor) commented Nov 13, 2024

@stw8127: I think you have run into a different issue. Using an older version of Nextflow might help: #1622

stw8127 commented Nov 13, 2024

Thank you very much, looks like it's going now on v. 3.2.2. I'll have a read on the other thread!
