Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcftools annotate fails when contigs are not defined in header #766

Closed
brentp opened this issue Mar 30, 2018 · 6 comments
Closed

bcftools annotate fails when contigs are not defined in header #766

brentp opened this issue Mar 30, 2018 · 6 comments

Comments

@brentp
Copy link

brentp commented Mar 30, 2018

given t.vcf of:

##fileformat=VCFv4.1
##INFO=<ID=SU,Number=.,Type=Integer,Description="Number of pieces of evidence supporting the variant across all samples">
##INFO=<ID=PE,Number=.,Type=Integer,Description="Number of paired-end reads supporting the variant across all samples">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       74911683        A1      A       T       53.79   PASS    SU=6;PE=6

the command:

 bcftools annotate t.vcf
[W::vcf_parse] Contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
Encountered error, cannot proceed. Please check the error output above.

if I bgzip and index, then this command works. This is suprising since normally that is a warning.

output of bcftools --version is:

bcftools 1.7-25-ge12f43d
Using htslib 1.7-41-g816a220
Copyright (C) 2016 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
@pd3
Copy link
Member

pd3 commented Mar 31, 2018

When the contig lines are not present in the header, bcftools can use the index to get the information it needs - hence the suggested "Quick workaround".
This is not a bug. What exactly is the question here?

@brentp
Copy link
Author

brentp commented Mar 31, 2018

the first command fails with an error and exits before doing any annotation.

in other bcftools commands, this is just a warning. It's not clear to me why it's an error in bcftools annotate, but a warning elsewhere. I'm so used to seeing the Contig XX not defined in header ... that I thought that could not be the actual reason for the failure.

@pd3
Copy link
Member

pd3 commented Jun 30, 2018

Sorry for a late response. Commands which only read VCFs can get away with the missing contigs, but writers need to know all the sequences beforehand, at the time the header is printed. This is because chromosome names in the BCF format are stored as pointers to the dictionary of names in the header. Only BCF writing has this problem and it is possible to make this work when VCF is output. However, it is not worth the effort as simple indexing solves the problem and missing contig names are discouraged anyway.

@pd3 pd3 closed this as completed Jun 30, 2018
@tommycarstensen
Copy link
Contributor

tommycarstensen commented Mar 4, 2019

How come bcftools norm works, but bcftools annotate doesn't work without reheadering and adding information on the contigs. The "quick workaround" is not so quick in my opinion and bcftools reheader --fai does not work on a file stream. This seems more like a bug than a feature to me. I hope the user feedback from myself and @brentp is useful @pd3.

My workaround by the way is to do:

| $bcftools reheader --header header.txt \

With header.txt having the following content:

cat \
 <($bcftools view -h $vcf | grep ^##) \
 <(cat header_lines.txt) \
 <($bcftools view -h $vcf | grep -v ^##) \
> header.txt

And header_lines.txt having the following content:

##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##contig=<ID=MT,length=16569>

@Titorat
Copy link

Titorat commented Nov 17, 2020

One issue though, is that it silently fail in my case and does not return a non-zero response in the snakemake pipeline

@pd3
Copy link
Member

pd3 commented Nov 18, 2020

@Titorat Please open a new issue with the exact description of the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants