[MRG] Avoid calling node.data #516

Merged
merged 1 commit into from
Jul 21, 2018

@luizirber (Member) commented Jul 20, 2018

While profiling `sourmash gather`, I found that this line is where we spend almost all of the runtime:

```python
matches = sum(1 for value in hashes if node.data.get(value))
```

This makes sense, since it's the part where we check whether the hashes are in the internal node's Bloom filter.
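To make the membership check concrete, here is a minimal sketch of counting query hashes against a Bloom filter. `ToyBloomFilter` is a hypothetical stand-in, not sourmash's internal-node filter: it keeps a byte array, sets or tests `num_hashes` positions per value, and its `get` returns 1 when every position is set (which may be a false positive) and 0 otherwise.

```python
import hashlib


class ToyBloomFilter:
    """Hypothetical minimal Bloom filter (not sourmash's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def get(self, value):
        # 1 = "probably present", 0 = "definitely absent".
        return int(all(self.bits[pos] for pos in self._positions(value)))


bf = ToyBloomFilter()
for h in [11, 22, 33]:
    bf.add(h)

query_hashes = [11, 22, 33, 44, 55]
# Same counting pattern as the profiled line: one get() per query hash.
matches = sum(1 for value in query_hashes if bf.get(value))
```

Inserted hashes always match; absent ones can occasionally match too, so `matches` is at least 3 here.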

It turns out that evaluating `node.data` takes some time, and the generator re-evaluates it once per hash, so replacing that line with

```python
get = node.data.get
matches = sum(1 for value in hashes if get(value))
```

reduces the runtime by ~40%. Yay!
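A small, self-contained benchmark of the two forms, assuming a hypothetical `ToyNode` whose `data` is a Python property backed by a plain dict (standing in for the Bloom filter). The hoisted version evaluates `node.data.get` once instead of once per hash; the real speedup depends on how expensive the actual `data` access is, so this sketch will not necessarily reproduce the ~40% figure.

```python
import timeit


class ToyNode:
    """Hypothetical node whose `data` is a property (an assumption,
    not sourmash's actual class)."""

    def __init__(self, hashes):
        self._data = None
        self._hashes = hashes

    @property
    def data(self):
        # The property body runs on every access, even after caching.
        if self._data is None:
            self._data = dict.fromkeys(self._hashes, 1)
        return self._data


def count_slow(node, hashes):
    # Property call + attribute lookup repeated for every hash.
    return sum(1 for value in hashes if node.data.get(value))


def count_fast(node, hashes):
    # Look up the bound method once, outside the loop.
    get = node.data.get
    return sum(1 for value in hashes if get(value))


node = ToyNode(range(0, 100_000, 2))  # even numbers are "present"
query = list(range(100_000))

slow = timeit.timeit(lambda: count_slow(node, query), number=10)
fast = timeit.timeit(lambda: count_fast(node, query), number=10)
print(f"slow: {slow:.3f}s  fast: {fast:.3f}s  ratio: {fast / slow:.2f}")
```

The trade-off is that `get` stays bound to whatever `node.data` was when it was looked up, which is fine here because the node's data does not change while matches are being counted.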

Checklist

  • Is it mergeable?
  • `make test` Did it pass the tests?
  • `make coverage` Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

Ready for review and merge @ctb

@luizirber requested a review from @ctb July 20, 2018 23:14

@codecov-io commented Jul 20, 2018

Codecov Report

Merging #516 into master will increase coverage by <.01%.
The diff coverage is 100%.


```diff
@@            Coverage Diff             @@
##           master     #516      +/-   ##
==========================================
+ Coverage   90.75%   90.75%   +<.01%     
==========================================
  Files          33       33              
  Lines        5007     5010       +3     
  Branches       36       36              
==========================================
+ Hits         4544     4547       +3     
  Misses        463      463
```

| Impacted Files | Coverage Δ |
|---|---|
| sourmash/sbtmh.py | 86.86% <100%> (+0.29%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab67c0b...521ab66.

@ctb merged commit 8f3e4cc into master Jul 21, 2018
@ctb deleted the impr/node_data branch July 21, 2018 02:35
luizirber added a commit that referenced this pull request Jul 23, 2018
- avoid calling node.data (#516)
  This makes `sourmash gather` ~40% faster...
- make sourmash compatible with khmer 3 (#508)
  Even though khmer 3 is not released yet, there is an alpha version on bioconda
- manylinux1 wheels and travis build improvements (#507)
- PyPI fixes (#504)