easy ways to speed up large parallel runs #46

Open
ekg opened this issue Apr 24, 2020 · 0 comments

ekg commented Apr 24, 2020

There is still some low-hanging fruit.

The log below is from a merge of 25 assemblies of the same genome.

It seems that the overlap collection is now working well. But:

  • Unnecessary time is spent in a single-threaded overlap merge. This could be done in parallel, during the overlap collection.
  • Graph emission is single-threaded, but could be kicked off in a separate thread. We'd need to manage this to make sure it doesn't fall too far behind the closure computation. Or, it might be run in semi-parallel.
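
A rough sketch of the second idea, just to make the scheduling concrete. This is Python rather than seqwish's C++, and it is not the seqwish code: `run_pipeline`, `compute_closure`, `emit_graph` and `max_pending` are all hypothetical names. Emission runs in its own thread, and a bounded queue keeps it from falling too far behind, because the producer blocks once `max_pending` finished chunks are waiting.

```python
import queue
import threading


def run_pipeline(chunks, compute_closure, emit_graph, max_pending=2):
    """Compute closures on the calling thread, emit on a background thread.

    All names here are hypothetical. The bounded queue provides the
    backpressure: put() blocks while max_pending results are waiting.
    """
    pending = queue.Queue(maxsize=max_pending)
    done = object()  # sentinel marking the end of the stream
    errors = []

    def emitter():
        while True:
            item = pending.get()
            if item is done:
                return
            if errors:
                continue  # keep draining so the producer never blocks forever
            try:
                emit_graph(item)
            except Exception as exc:
                errors.append(exc)

    worker = threading.Thread(target=emitter)
    worker.start()
    try:
        for chunk in chunks:
            # Blocks here when emission is max_pending chunks behind.
            pending.put(compute_closure(chunk))
    finally:
        pending.put(done)
        worker.join()
    if errors:
        raise errors[0]


if __name__ == "__main__":
    emitted = []
    # Stand-ins for the real work: "closure" squares the chunk id.
    run_pipeline(range(5), lambda c: c * c, emitted.append, max_pending=1)
    print(emitted)
```

With a single emitter thread and a FIFO queue, chunks are emitted in the order they were computed, which matters if the output has to be written sequentially.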
[seqwish::seqidx] 0.000 indexing sequences                                       
[seqwish::seqidx] 1117.704 index built                                           
[seqwish::alignments] 1117.704 processing alignments                             
[seqwish::alignments] 1632.284 indexing                                          
[seqwish::alignments] 7420.665 index built                                       
[seqwish::transclosure] 7420.690 computing transitive closures                   
[seqwish::transclosure] 7430.425 0.00% 0-100000000 overlap_collect               
[seqwish::transclosure] 7487.728 0.00% 0-100000000 overlaps_vector_merge         
[seqwish::transclosure] 7582.184 0.00% 0-100000000 rank_build                    
[seqwish::transclosure] 7765.759 0.00% 0-100000000 parallel_union_find           
[seqwish::transclosure] 7862.661 0.00% 0-100000000 dset_write                    
[seqwish::transclosure] 7890.029 0.00% 0-100000000 dset_compression              
[seqwish::transclosure] 7908.197 0.00% 0-100000000 dset_sort                     
[seqwish::transclosure] 7917.390 0.00% 0-100000000 dset_invert                   
[seqwish::transclosure] 7933.798 0.00% 0-100000000 graph_emission                
[seqwish::transclosure] 8967.516 3.44% 100000000-200517826 overlap_collect       
[seqwish::transclosure] 9040.887 3.44% 100000000-200517826 overlaps_vector_merge 
[seqwish::transclosure] 9140.550 3.44% 100000000-200517826 rank_build  
...          
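
To see where the time goes in the first chunk, the timestamps in the log above can be differenced. This small script assumes that each timestamp marks the start of the named phase (an assumption about the log format, though it fits the remarks above), so a phase lasts until the next line is printed. `phase_durations` is a hypothetical helper name.

```python
import re

# First chunk of the log above, plus the first line of the next chunk.
LOG = """\
[seqwish::transclosure] 7430.425 0.00% 0-100000000 overlap_collect
[seqwish::transclosure] 7487.728 0.00% 0-100000000 overlaps_vector_merge
[seqwish::transclosure] 7582.184 0.00% 0-100000000 rank_build
[seqwish::transclosure] 7765.759 0.00% 0-100000000 parallel_union_find
[seqwish::transclosure] 7862.661 0.00% 0-100000000 dset_write
[seqwish::transclosure] 7890.029 0.00% 0-100000000 dset_compression
[seqwish::transclosure] 7908.197 0.00% 0-100000000 dset_sort
[seqwish::transclosure] 7917.390 0.00% 0-100000000 dset_invert
[seqwish::transclosure] 7933.798 0.00% 0-100000000 graph_emission
[seqwish::transclosure] 8967.516 3.44% 100000000-200517826 overlap_collect
"""

LINE = re.compile(r"\[seqwish::transclosure\]\s+(\d+\.\d+)\s+\S+%\s+\S+\s+(\w+)")


def phase_durations(log):
    """Return (phase, seconds) pairs, assuming timestamps mark phase starts."""
    events = []
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            events.append((m.group(2), float(m.group(1))))
    # Each phase runs until the next log line; the last line has no duration.
    return [
        (name, round(nxt - start, 3))
        for (name, start), (_, nxt) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for name, secs in phase_durations(LOG):
        print(f"{name:24s} {secs:10.3f} s")
```

Under that reading, overlaps_vector_merge takes about 94 s and graph_emission about 1034 s of the roughly 1537 s spent on the first chunk, so graph emission is by far the larger of the two costs here.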