Join mismatch/DataflowVariable assignment error #1790

Open
ellisjj opened this issue Feb 3, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@ellisjj

ellisjj commented Feb 3, 2025

Description of the bug

When running a WGS human sample on Azure Batch, the GATK base recalibration steps keep failing with the following error:

Join mismatch for the following entries: key=[patient:HG002, sample:FS11403089, sex:NA, status:0, n_fastq:48, data_type:cram, id:FS11403089] values=

Further down in the output there is also this error:

ERROR ~ A DataflowVariable can only be assigned once. Use bind() to allow for equal values to be passed into already-bound variables.

In the log file I can see many NoSuchFileExceptions relating to the recalibration steps. These refer to the .command.err and .command.out files which do not exist. The log file .command.log in these work directories has content such as "mkdir: can't create directory 'FS11403089.md.cram': File exists".
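To get an overview of how many of these exceptions there are and which task files they point at, something like the following could be run against the log. This is only a sketch: the function names are made up for illustration, and the log path passed in (for example the attached nextflow.log) is an assumption about where the log lives.

```shell
# Count the lines in a Nextflow log that mention NoSuchFileException.
# $1 is the path to the log file (hypothetical helper, not part of Nextflow).
count_missing_file_errors() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function usable under "set -e".
    grep -c 'NoSuchFileException' "$1" || true
}

# Print the distinct log lines that mention the exception, which shows
# which .command.err / .command.out paths are reported as missing.
list_missing_file_errors() {
    grep 'NoSuchFileException' "$1" | sort -u
}
```

For example, `count_missing_file_errors nextflow.log` prints the number of matching lines, and `list_missing_file_errors nextflow.log` prints the de-duplicated lines themselves.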

Some of the recalibration steps are succeeding, so I'm stuck trying to figure out what's going wrong. Does anyone know what the join mismatch or DataflowVariable errors might mean? This is my first time running the sarek workflow, so I'm unsure whether the problem is a configuration option I haven't set correctly or a real bug in the workflow.

Thanks

nextflow.log

Command used and terminal output

$ nextflow run -resume -params-file params.yaml -profile azure -w az://scratch/ellisjj nf-core/sarek -r 3.5.0 --aligner bwa-mem --trim_fastq true --genome GATK.GRCh38 --tools haplotypecaller,manta --save_mapped true --save_output_as_bam true

Relevant files

No response

System information

  • Nextflow version 24.10.4.5934
  • Hardware: cloud
  • Executor: azure batch
  • Container engine: docker
  • Version of nf-core/sarek: 3.5.0
@ellisjj ellisjj added the bug Something isn't working label Feb 3, 2025
@adefelicibus

I'm having the same issue in several tools running nf 24.04.4 on aws batch. The same pipeline was working a few days ago.

@asp8200
Contributor

asp8200 commented Feb 6, 2025

Hi guys. Thanks for reporting the issue. It is often easier and faster to get help by reporting that kind of problem on the nf-core sarek Slack channel. (Perhaps you already did, but I didn't notice.)

Judging from the log file, I guess you ran baserecalibrator as part of the pipeline. In a possibly related issue report on Slack, Rikke suggested skipping baserecalibrator, that is, adding --skip_tools baserecalibrator. I suggest you try that and report back.
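As a rough sketch, the command from the report with that option appended might look like the following. Every argument except --skip_tools baserecalibrator is copied unchanged from the original post; the wrapper function and its name are only illustrative, and the optional prefix argument is an assumption added so the command can be printed instead of executed.

```shell
# Hypothetical wrapper around the command from the issue report.
# Any arguments given to the function are used as a prefix, so
# "run_sarek_skip_bqsr echo" prints the command instead of running it.
run_sarek_skip_bqsr() {
    # The last line is the only change from the original command.
    # With this option the base recalibration steps are skipped.
    "$@" nextflow run -resume \
        -params-file params.yaml \
        -profile azure \
        -w az://scratch/ellisjj \
        nf-core/sarek -r 3.5.0 \
        --aligner bwa-mem \
        --trim_fastq true \
        --genome GATK.GRCh38 \
        --tools haplotypecaller,manta \
        --save_mapped true \
        --save_output_as_bam true \
        --skip_tools baserecalibrator
}
```

Calling `run_sarek_skip_bqsr echo` first is a cheap way to check the assembled command line before launching the real run with `run_sarek_skip_bqsr`.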

@FriederikeHanssen
Contributor

Which issue are you referring to? This combination (sarek 3.5.0 with Nextflow 24.10) should work. I am not sure what is going on.

@asp8200
Contributor

asp8200 commented Feb 6, 2025

Jonas reported the error "ERROR ~ A DataflowVariable can only be assigned once." a long time ago:

https://nfcore.slack.com/archives/CGFUX04HZ/p1687778930306609

@ellisjj
Author

ellisjj commented Feb 9, 2025

I can confirm that when adding --skip_tools baserecalibrator the workflow finishes successfully. However, I want to run BQSR so this isn't a solution. Would this indicate it's an error in the workflow and not something to do with my configuration or samples?

4 participants