Option to call without chunking #738

Merged
merged 5 commits into from
Mar 20, 2019

Conversation

glennhickey
Collaborator

As discussed in #737, chunking along path ranges looks to be more trouble than it's worth on the cactus yeast graph:

  • It is brutally slow with a recall-context size of 2500.
  • The nodes are so small that insertions may still be missed.
  • The resulting chunks are very slow to compute snarls on, I think due to all the dangling ends (I measured 2 hours for one chunk vs. 6 minutes for the entire input graph).

So the idea is to add an option to disable chunking altogether and just run vg call on the input graph.

@ghost ghost assigned glennhickey Mar 19, 2019
@ghost ghost added the in progress label Mar 19, 2019
chunk_info = {
'chrom' : chunk_bed_chrom,
'chunk_i' : chunk_i,
'chunk_n' : chunk_counts[chunk_bed_chrom],
Collaborator Author

@cmarkello replacing len(bed_lines) with chunk_counts[chunk_bed_chrom] here fixes a bug where the last overlap/2 bases of every chromosome except the last were clipped out when the input GAM wasn't split by chromosome. Mentioning it because it may affect your WDL script.
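
For anyone following along, here is a rough, self-contained sketch of why the count matters. It is not the toil-vg code: the BED intervals, the `clipped_ends` helper, and the rule "trim overlap/2 from the end of every chunk that is not the last one" are simplifying assumptions made for illustration, as is the way chunks are indexed.

```python
from collections import Counter

# Hypothetical chunk intervals (chrom, start, end); neighbours overlap by 100 bases.
bed_lines = [
    ("chrI", 0, 1000),
    ("chrI", 900, 2000),
    ("chrII", 0, 1000),
    ("chrII", 900, 1800),
]
overlap = 100

# Number of chunks on each chromosome, mirroring chunk_counts[chunk_bed_chrom].
chunk_counts = Counter(chrom for chrom, _, _ in bed_lines)

def clipped_ends(per_chrom):
    """Return each chunk's end after trimming overlap/2 from every non-final chunk.

    per_chrom=False models the old behaviour (compare against len(bed_lines));
    per_chrom=True models the fix (compare against the per-chromosome count).
    """
    seen = Counter()  # chunks already visited on each chromosome
    ends = []
    for global_i, (chrom, _start, end) in enumerate(bed_lines):
        if per_chrom:
            chunk_i, chunk_n = seen[chrom], chunk_counts[chrom]
        else:
            chunk_i, chunk_n = global_i, len(bed_lines)
        seen[chrom] += 1
        is_last = chunk_i == chunk_n - 1
        ends.append(end if is_last else end - overlap // 2)
    return ends

buggy_ends = clipped_ends(per_chrom=False)
fixed_ends = clipped_ends(per_chrom=True)
```

In this toy model the total count only recognises the very last BED line as a final chunk, so the end of chrI is trimmed by overlap/2 even though nothing follows it; with per-chromosome counts the final chunk of each chromosome keeps its full extent.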
