Add support for non-standard chromosome names containing [:-] characters #1630
Note hts_parse_region() cannot be used because it requires the header and without the header the caller does not learn the contig name. Resolves samtools#1620
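To make the headerless case concrete, here is a minimal sketch of parsing a region string when no header is available to say which contig names exist. It is not the code from this pull request: `parse_region_noheader()` and `parse_range()` are hypothetical helpers, and the accepted forms (`chr`, `chr:pos`, `chr:beg-end`, `chr:beg-`, plus a curly-brace quoted name such as `{name}:beg-end`) are assumptions chosen for illustration. Without the braces the helper can only guess by splitting at the last colon.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an htslib function): parse "pos", "beg-" or
 * "beg-end" into 1-based inclusive coordinates. Returns 0 on success. */
static int parse_range(const char *s, int64_t *beg, int64_t *end)
{
    char *ep;
    long long b, e;

    if (*s < '0' || *s > '9') return -1;      /* must start with a digit */
    errno = 0;
    b = strtoll(s, &ep, 10);
    if (errno) return -1;
    e = b;                                    /* "pos": a single position */
    if (*ep == '-') {
        ep++;
        if (*ep == '\0') {
            e = INT64_MAX;                    /* "beg-": open-ended */
        } else {
            if (*ep < '0' || *ep > '9') return -1;
            errno = 0;
            e = strtoll(ep, &ep, 10);
            if (errno) return -1;
        }
    }
    if (*ep != '\0' || e < b) return -1;      /* trailing junk or reversed */
    *beg = b;
    *end = e;
    return 0;
}

/* Hypothetical helper (not an htslib function): split a region string into
 * contig name and coordinates without consulting a header.
 *   "{name}" or "{name}:range"  -> name is taken verbatim from the braces
 *   anything else               -> split at the LAST ':' if a valid range
 *                                  follows it, else the whole string is
 *                                  treated as the name
 * Returns 0 on success, -1 on malformed input or if name does not fit. */
int parse_region_noheader(const char *reg, char *name, size_t name_sz,
                          int64_t *beg, int64_t *end)
{
    const char *name_end, *range = NULL;
    size_t len;

    *beg = 1;
    *end = INT64_MAX;                         /* default: whole contig */

    if (*reg == '{') {
        reg++;
        name_end = strchr(reg, '}');
        if (!name_end) return -1;             /* unterminated quote */
        if (name_end[1] == ':') range = name_end + 2;
        else if (name_end[1] != '\0') return -1;
    } else {
        const char *colon = strrchr(reg, ':');
        int64_t b, e;
        if (colon && parse_range(colon + 1, &b, &e) == 0) {
            name_end = colon;
            range = colon + 1;
        } else {
            name_end = reg + strlen(reg);     /* no usable range suffix */
        }
    }

    len = (size_t)(name_end - reg);
    if (len == 0 || len >= name_sz) return -1;
    memcpy(name, reg, len);
    name[len] = '\0';

    if (range && parse_range(range, beg, end) != 0) return -1;
    return 0;
}
```

With this sketch, `{HLA-A*01:01}:5-10` yields the name `HLA-A*01:01` with the range 5 to 10, whereas the unquoted `HLA-A*01:01` is split at its last colon into the name `HLA-A*01` and position 1. That second result shows the ambiguity a headerless parser cannot resolve on its own.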
Apparently I had misremembered the public I'm inclined to think that if we have a requirement to parse regions without a header being present, then perhaps this API should be modified to cope with it. My preference would be to avoid code duplication where possible, but I haven't really looked in detail to know how much work is involved and whether we would just replace duplication in one place with duplication in another (e.g. a separate code path for headerless parsing).
Can you give me an example of how to trigger this? I'm unable to do so, so cannot test the fix. I tried modifying test/test-bcf-sr.pl as follows:
So my files look like this:
Running [Edit: yes, I see it now in "Failed to detect $badness: $cmd". It's specifically checking on duff input to make sure it doesn't pass. That's good. :-) ] Is it simply that test-bcf-sr.c doesn't test the region query part of the API, so colons in names never crop up as an issue? I don't particularly like having the test for this function in bcftools, as it means we can't do CI for this code.
I just checked the code and can confirm that
Add -O,--output-fmt option so it can write vcf or bcf as well as its original summary format.

Add -o,--output option so it's possible to write to a file without shell redirection.

Add --args option so input files can be listed directly on the command line instead of via a fofn, to make basic tests easier.

Add -r,--regions and -t,--targets options, which behave the same as the equivalents in `bcftools view`.

Add the --no-index option to the usage text.

Simplify writing the original format. Everything can be sent directly to the output file without going via a kstring. The output writing parts are also moved into separate functions to keep main() from getting too big.

Add a few extra error checks.

Call exit(EXIT_FAILURE) on failure, not exit(-1).

Make the -h option return success.
Add some tests to exercise the --regions / --targets synced reader options. Currently this only includes tests for the chromosomes with [:-] characters in the name, but it could be expanded easily to do others. Test files have been borrowed from pull request samtools/bcftools#1938.

Move the synced reader no-index tests from test-bcf-sr.pl to test.pl. The former isn't a good place for them as it gets called 10 times, but the no-index test only needs to run once. It also allows the code running the test to be simplified a bit.

Also fix the exit code on test-bcf-sr.pl failure from -1 to 1.

Co-authored-by: Petr Danecek <pd3@sanger.ac.uk>
Merge commit removed...