Cannot merge tabix files with different sequence names #108

Merged
tomwhite merged 4 commits into disq-bio:master from tbi-merge-seq-name-bug
Jun 24, 2019

Conversation

tomwhite
Member

Fixes broadinstitute/gatk#5997.

Each TabixIndex stores sequence names only for the sequences it sees, which is a problem when merging the indexes for the part files. If one part only sees chr1, its list contains only chr1, while a part that sees chr2 and chr3 has just those two in its list. The issue occurs because sequences are looked up by reference index, so index 0 is chr1 for the first part but chr2 for the second. The solution is to store content for all sequences from the header (even if the content for a given sequence is null in that part), which is what this change does.
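
To make the index mismatch concrete, here is a minimal sketch in plain Java. It does not use the htsjdk or disq APIs: `PartIndex`, `padToHeader`, and `merge` are hypothetical names invented for illustration, and the string "contents" stand in for the real per-sequence index data. The exception message simply echoes this PR's title. It only shows the idea of padding every part to the full header sequence list before merging by position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TabixMergeSketch {

    /** Hypothetical stand-in for one part's index (not the htsjdk TabixIndex class). */
    static final class PartIndex {
        final List<String> sequenceNames;
        final List<String> contents; // contents.get(i) belongs to sequenceNames.get(i); may be null

        PartIndex(List<String> sequenceNames, List<String> contents) {
            if (sequenceNames.size() != contents.size()) {
                throw new IllegalArgumentException("names and contents must have the same length");
            }
            this.sequenceNames = sequenceNames;
            this.contents = contents;
        }
    }

    /** Re-expresses a part over every header sequence, using null for sequences it never saw. */
    static PartIndex padToHeader(List<String> headerNames, PartIndex part) {
        List<String> padded = new ArrayList<>();
        for (String name : headerNames) {
            int i = part.sequenceNames.indexOf(name);
            padded.add(i < 0 ? null : part.contents.get(i));
        }
        return new PartIndex(headerNames, padded);
    }

    /** Merges parts position by position; this is only meaningful if all name lists are identical. */
    static PartIndex merge(List<PartIndex> parts) {
        List<String> names = parts.get(0).sequenceNames;
        List<String> merged = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            merged.add(null);
        }
        for (PartIndex part : parts) {
            if (!part.sequenceNames.equals(names)) {
                throw new IllegalArgumentException(
                    "Cannot merge tabix files with different sequence names");
            }
            for (int i = 0; i < names.size(); i++) {
                String content = part.contents.get(i);
                if (content != null) {
                    String existing = merged.get(i);
                    merged.set(i, existing == null ? content : existing + "+" + content);
                }
            }
        }
        return new PartIndex(names, merged);
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("chr1", "chr2", "chr3");
        // The first part saw only chr1; the second saw chr2 and chr3.
        PartIndex first = new PartIndex(Arrays.asList("chr1"), Arrays.asList("A"));
        PartIndex second = new PartIndex(Arrays.asList("chr2", "chr3"), Arrays.asList("B2", "B3"));

        // Unpadded, index 0 means chr1 in one part and chr2 in the other, so merge(...) rejects them.
        // Padded to the header, every part agrees on what each index means.
        PartIndex merged = merge(Arrays.asList(
            padToHeader(header, first), padToHeader(header, second)));
        System.out.println(merged.sequenceNames + " -> " + merged.contents);
    }
}
```

In this sketch the padding is done at merge time for brevity; the change described above instead has each part store an entry for every header sequence when its index is written.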

@tomwhite tomwhite added the bug Something isn't working label Jun 11, 2019
@tomwhite
Member Author

It would be good to merge this soon as it fixes a bug that comes up a lot in practice. There are follow-on changes for htsjdk (as usual) that I'll propose upstream.

@tomwhite
Member Author

I've incorporated the htsjdk parts of this in samtools/htsjdk#1263.

I'd like to merge this soon unless there are any objections.

@heuermh
Contributor

heuermh commented Jun 24, 2019

I suggest either rebasing, squashing, and force-pushing, or using the Squash and merge button to combine the commits.

@tomwhite tomwhite merged commit 948ed61 into disq-bio:master Jun 24, 2019
@tomwhite tomwhite deleted the tbi-merge-seq-name-bug branch June 24, 2019 17:05
@tomwhite
Member Author

Thanks for approving @heuermh

@heuermh heuermh added this to the 0.4.0 milestone Jul 2, 2019
Development

Successfully merging this pull request may close these issues.

HaplotypeCallerSpark: Cannot merge tabix files with different sequence names