Better traversal size caps in vg call #4252

Merged
merged 6 commits into from
Mar 25, 2024

glennhickey
Contributor

Changelog Entry

To be copied to the draft changelog by merger:

  • vg call -c and -C options changed to limit the search based on all alleles, not just the reference allele. This means these options work much better in practice at preventing vg call from getting lost in giant snarls.
  • --progress option added to vg call

Description

It has come up a number of times that people try to run vg call on complex graphs and it runs forever. The reason is that it gets lost trying to find traversals through enormous snarls, and there is not enough signal in the read mappings to narrow the search down to something manageable. The min/max traversal cutoffs -c/-C were added to address this, but since they only filtered on reference allele length, they only helped sometimes: a single giant insertion is enough to get around them. This PR changes these options to take alt alleles into account as well. So if you run -c 50 -C 1000, it will only try to genotype sites where at least one traversal is >=50bp, and it will give up on any site as soon as a single traversal >1000bp is found.
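To make the new per-site rule concrete, here is a minimal Python sketch. It is not vg's implementation (vg is written in C++). The function `screen_site` and its parameters are hypothetical: each traversal is reduced to its length in bp, and `min_len` and `max_len` stand in for -c and -C. Traversals are consumed lazily, so the search stops at the first one that is too long instead of enumerating the whole snarl.

```python
from typing import Iterable, List, Optional


def screen_site(traversal_lengths: Iterable[int],
                min_len: int,
                max_len: int) -> Optional[List[int]]:
    """Hypothetical model of the -c/-C behaviour described above.

    Returns the traversal lengths seen if the site should be genotyped,
    or None if the site is skipped.
    """
    seen: List[int] = []
    for length in traversal_lengths:
        # -C: give up on the whole site as soon as any traversal
        # (reference or alt) is longer than max_len.
        if length > max_len:
            return None
        seen.append(length)
    # -c: only genotype if at least one traversal reaches min_len.
    if not any(length >= min_len for length in seen):
        return None
    return seen


if __name__ == "__main__":
    print(screen_site([12, 75, 30], 50, 1000))    # [12, 75, 30]: a 75bp traversal passes -c 50
    print(screen_site([12, 30], 50, 1000))        # None: nothing reaches 50bp
    print(screen_site([12, 1500, 75], 50, 1000))  # None: 1500bp exceeds -C 1000
```

The early `return` inside the loop models "give up as soon as a single traversal >1000bp is found". The -c check can only be decided once every traversal of the site has been seen.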

@glennhickey glennhickey merged commit c3cbdbf into master Mar 25, 2024
2 checks passed