Sniffles v2.0.7 can produce malformed VCF output containing R nucleotides in the REF column. These are not allowed by the VCF v4.2 specification: "REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive)." The VCF v4.3 specification additionally states: "IUPAC ambiguity codes should be converted to a concrete base." Downstream tools such as HTSJDK throw an error, correctly stating that the VCF is malformed.
For our use case this results in an analysis that cannot complete.
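For anyone who wants to check their own output for the same problem, here is a minimal sketch of the REF rule quoted above. It is not part of Sniffles, HTSJDK, or bcftools. The function names `invalid_ref_records` and `scan_vcf` are hypothetical, and the example records are made up for illustration rather than taken from the attached VCF.

```python
import gzip
import re
from typing import Iterable, Iterator, List, Tuple

# Bases the quoted VCF v4.2 text allows in REF (case insensitive).
VALID_REF = re.compile(r"[ACGTN]+", re.IGNORECASE)


def invalid_ref_records(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, pos, ref) for records whose REF contains other characters."""
    for line in lines:
        # Skip blank lines and meta/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a complete VCF record
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        if not VALID_REF.fullmatch(ref):
            yield chrom, pos, ref


def scan_vcf(path: str) -> List[Tuple[str, int, str]]:
    """Hypothetical helper: scan a plain or gzip/bgzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(invalid_ref_records(handle))


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\tsv1\tA\t<DEL>\t60\tPASS\tSVTYPE=DEL\n",
    "chr1\t200\tsv2\tR\t<INS>\t60\tPASS\tSVTYPE=INS\n",
    "chr2\t300\tsv3\tacgtn\t<DEL>\t60\tPASS\tSVTYPE=DEL\n",
]
bad = list(invalid_ref_records(example))
print(bad)  # [('chr1', 200, 'R')]
```

This only tests the character set of REF. It does not compare REF against the reference FASTA, so it would not report mismatches of the kind bcftools norm --check-ref finds.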
Running bcftools norm --output out.vcf --fasta-ref GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz --check-ref wx vip_9_long_read_sv.vcf.gz shows many more issues in the REF column:
...
REF_MISMATCH chr10 133765397 N A
REF_MISMATCH chr10 133779600 N A
REF_MISMATCH chr10 133785181 N A
REF_MISMATCH chr10 133786526 N C
Lines total/split/realigned/skipped: 2251/0/3/1055
The commands to generate the .snf files are similar to:
Hello @fritzsedlazeck and Sniffles developers,
Example:
We were able to reproduce the issue with the GIAB HG002 trio (see malformed_vcf_issue.zip) for the snf resources:
Greetings,
@dennishendriksen