[MRG] speed up `SeqToHashes` `translate` #1946

mr-eyes · 2022-04-12T10:02:19Z

Vectors are very good at insertion but not at deletion. In #1938 I replaced `Vec` with `VecDeque` because I was popping out one hash per iteration. That deletion is very slow on a `Vec` because it shifts all the remaining elements in memory, while it is very fast on a `VecDeque`.

In this PR I am reverting to `Vec` and replacing every part that required deletion or popping with code that reads the buffer by index instead. There is no longer any need for `VecDeque`.
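To make the tradeoff concrete, here is a minimal sketch of the three ways of consuming a buffer of hashes discussed above. It assumes the per-iteration pop on the `Vec` was a front removal such as `remove(0)`; the function names are hypothetical and this is not the sourmash code.

```rust
use std::collections::VecDeque;

/// Drain a Vec from the front with `remove(0)`. Each call shifts the
/// remaining elements left, so draining n items costs O(n^2) overall.
fn drain_front_vec(mut v: Vec<u64>) -> Vec<u64> {
    let mut out = Vec::with_capacity(v.len());
    while !v.is_empty() {
        out.push(v.remove(0));
    }
    out
}

/// Drain a VecDeque from the front with `pop_front`, which is O(1) per call.
fn drain_front_deque(mut d: VecDeque<u64>) -> Vec<u64> {
    let mut out = Vec::with_capacity(d.len());
    while let Some(x) = d.pop_front() {
        out.push(x);
    }
    out
}

/// Read a Vec front to back through an index. Nothing is removed or moved.
fn read_by_index(v: &[u64]) -> Vec<u64> {
    let mut out = Vec::with_capacity(v.len());
    let mut i = 0;
    while i < v.len() {
        out.push(v[i]);
        i += 1;
    }
    out
}

fn main() {
    let hashes: Vec<u64> = vec![11, 22, 33, 44];
    let a = drain_front_vec(hashes.clone());
    let b = drain_front_deque(hashes.iter().copied().collect());
    let c = read_by_index(&hashes);
    // All three strategies visit the hashes in the same order.
    assert_eq!(a, b);
    assert_eq!(b, c);
}
```

All three produce the same sequence; only the cost per element differs, which is why the index-based read lets the buffer stay a plain `Vec`.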

Resolves #1945

codecov · 2022-04-12T10:09:43Z

Codecov Report

Merging #1946 (671382b) into latest (01119a2) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           latest    #1946   +/-   ##
=======================================
  Coverage   82.93%   82.94%           
=======================================
  Files         125      125           
  Lines       13755    13759    +4     
  Branches     1877     1877           
=======================================
+ Hits        11408    11412    +4     
  Misses       2075     2075           
  Partials      272      272

| Flag | Coverage Δ | |
| --- | --- | --- |
| python | `90.96% <ø> (ø)` | |
| rust | `65.16% <100.00%> (+0.03%)` | ⬆️ |


| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| src/core/src/signature.rs | `68.34% <100.00%> (+0.35%)` | ⬆️ |


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

mr-eyes · 2022-04-12T10:18:50Z

Ready for review @ctb @luizirber
I believe it's better now :)

before and after timing in #1945

ctb

This is great! But I have to admit I don't understand why the new code is so much faster 😭 . Could you maybe add some comments?

mr-eyes · 2022-04-12T20:43:51Z

@ctb sure!

Here I am explaining the tradeoff of using Vec vs VecDeque and why I reverted back to Vec.

Vectors are very good at insertion but not at deletion. In #1938 I replaced `Vec` with `VecDeque` because I was popping out one hash per iteration. That deletion is very slow on a `Vec` because it shifts all the remaining elements in memory, while it is very fast on a `VecDeque`.

In this PR I am reverting to `Vec` and replacing every part that required deletion or popping with code that reads the buffer by index instead. There is no longer any need for `VecDeque`.

`SeqToHashes` is mainly an iterator (like a Python generator): in every iteration it hashes a kmer and returns the result. Only in translate mode do I hash all the kmers for all frames at once and then consume one hash per iteration. The consuming/yielding part was the problem: I was popping an item on every yield until the vector of hashes was empty, which is very slow for vectors. In this PR I replaced the popping with an index into the vector that advances on every yield until it passes the last element. I hope that explains it :)

sourmash/src/core/src/signature.rs

Line 226 in 6f5245b

translate_iter_step: 0,

sourmash/src/core/src/signature.rs

Lines 341 to 348 in 6f5245b

    
    if self.translate_iter_step == self.hashes_buffer.len() {
        self.hashes_buffer.clear();
        self.kmer_index = self.max_index;
        return Some(Ok(0));
    }
    let curr_idx = self.translate_iter_step;
    self.translate_iter_step += 1;
    Some(Ok(self.hashes_buffer[curr_idx]))
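For readers who want to try the pattern in isolation, the following is a simplified, self-contained sketch of the same index-cursor idea. `BufferedHashes` is a hypothetical stand-in, not the real `SeqToHashes`; it reuses the field names `hashes_buffer` and `translate_iter_step` from the snippet above for readability, and unlike the real code it simply ends with `None` instead of returning `Some(Ok(0))` and updating `kmer_index`.

```rust
/// Hypothetical stand-in for the translate branch of `SeqToHashes`:
/// all hashes are computed up front, then yielded one per call to `next`.
struct BufferedHashes {
    hashes_buffer: Vec<u64>,
    translate_iter_step: usize,
}

impl BufferedHashes {
    fn new(hashes: Vec<u64>) -> Self {
        BufferedHashes {
            hashes_buffer: hashes,
            translate_iter_step: 0,
        }
    }
}

impl Iterator for BufferedHashes {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        // Copy out the hash under the cursor; `get` returns None once the
        // cursor has moved past the last element, which ends the iteration.
        let hash = self.hashes_buffer.get(self.translate_iter_step).copied()?;
        // Advance the cursor instead of removing anything from the Vec.
        self.translate_iter_step += 1;
        Some(hash)
    }
}

fn main() {
    let collected: Vec<u64> = BufferedHashes::new(vec![7, 8, 9]).collect();
    assert_eq!(collected, vec![7, 8, 9]);
}
```

Because the buffer is never mutated while it is being consumed, each yield is a constant-time read regardless of how many hashes were buffered.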

ctb · 2022-04-12T22:57:39Z

ahh, got it! but - is this something that is worth adding in comments in the actual code, do you think?

ctb · 2022-04-12T22:58:02Z

(I tend to think that something that required the amount of discussion this issue had is worth documenting in the code.)

mr-eyes · 2022-04-12T23:08:31Z

I spent some time working out what I was doing in the code, so yes, I will add some comments.

mr-eyes · 2022-04-13T20:48:34Z

@ctb I think it's ready now.

ctb · 2022-04-13T21:05:38Z

cool! I mean, it's been approved for a while, but I can do the button push ;)

mr-eyes · 2022-04-13T21:07:47Z

@ctb oh, ok :D next time I will merge when approved :) Thank you!

lightning fast translate ⚡

335149f

mr-eyes added the rust label Apr 12, 2022

mr-eyes mentioned this pull request Apr 12, 2022

SeqToHashes needs more optimization for super long sequences #1945

Closed

rust clippy

6f5245b

ctb approved these changes Apr 12, 2022


in-line comments for translate

671382b

ctb approved these changes Apr 13, 2022


ctb merged commit 3fcf2db into latest Apr 13, 2022

ctb deleted the mo/speedup_seqToHashes branch April 13, 2022 21:06

ctb mentioned this pull request Apr 21, 2022

Draft release notes for sourmash v4.4.0 #1968

Closed
