VarDict-java produces malformed VCF #42
Comments
Hey Arya, I'm Michaela who wrote you the mail. I finally registered here at GitHub.
`*********************************************************************** A USER ERROR has occurred: Cannot read file:///scratch/mhoehne/Gisela/CUTTag/fastq_trimmed/bam_all/bams/varCA/out_WT/callers/WT/vardict/vardict.vcf.gz because no sui$ Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace. A USER ERROR has occurred: Cannot read file:///scratch/mhoehne/Gisela/CUTTag/fastq_trimmed/bam_all/bams/varCA/out_WT/callers/WT/vardict/vardict.vcf.gz because no sui$`
even though there is a vardict.vcf file in that folder. Thank you very much for your help!
Oops, I meant to delete the
Sorry about that! I've edited the original post to reflect this corrected code.
Thank you very much! `scripts/2vcf.py:267: UserWarning: Ignored 32116410 classification sites that didn't have a variant.` But I guess that makes sense!?
Yes, that is a standard warning message that will happen regardless of the VarDict issue. When generating a VCF, VarCA will keep track of every position in the genome, regardless of whether there's a variant there. The 2vcf.py script will then discard these sites when it converts the final output to VCF.
Now it worked for all replicates as well as for the merged fastq files!
I'm creating this issue to record a problem encountered by a user (through personal correspondence). They received the following error message:
Based on the error message, it sounds like the VarDict-java tool is creating a malformed VCF allele:
The `<dup-8>` part of that allele is not valid in the VCF format, so GATK flags it and raises an exception.
It appears that someone else has already reported the issue in the VarDict repo. In the meantime, if anyone else encounters this while we wait for the issue to be resolved, I would recommend just discarding those alleles manually using `awk`, just like we did in #25. For example, you could edit line 17 of the `callers/vardict` file from this
to this
This will simply remove any lines in the VCF where the fifth column (for the ALT alleles) contains `<dup`. Ideally, we would keep those lines in the file and fix those alleles so that they are valid, since they potentially represent real structural variants that should be reported in VarCA's output. But without further information, I can't know what the correct allele should be, so I don't know how to properly change it using `awk`.
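For anyone who wants to try the filtering idea in isolation first, here is a minimal sketch of that kind of `awk` filter. It is not the actual line from the `callers/vardict` file: the function name `filter_dup_alleles` and the tiny example VCF are made up for illustration, and it assumes a tab-delimited, uncompressed VCF on standard input.

```shell
# Hypothetical helper (not part of VarCA): drop VCF records whose
# ALT column (the fifth tab-separated field) contains "<dup".
# Header lines, which start with "#", are passed through unchanged.
filter_dup_alleles() {
    awk -F '\t' '/^#/ || $5 !~ /<dup/'
}

# Made-up two-record VCF: one ordinary SNV and one record with a
# "<dup-8>" allele of the kind described above.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tT\t.\tPASS\t.\nchr1\t200\t.\tG\t<dup-8>\t.\tPASS\t.\n' \
    | filter_dup_alleles
```

Only the header lines and the `chr1 100` record should survive. As noted above, this throws the affected records away rather than repairing them, so any real structural variants behind those alleles are lost.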