[MRG] unload data on iteration over SBT leaves #1534

Merged
ctb merged 1 commit into latest from sbt/unload_data_on_iter on May 18, 2021

@ctb ctb commented May 18, 2021

In #1530, we observed high memory usage when running sourmash gather --linear against indexed SBTs. The cause was that the SBT.signatures() method, which --linear uses, retained every leaf node it loaded. This PR calls unload_data after each leaf's signature is yielded by SBT.leaves(), the method underlying SBT.signatures(), so leaf data is released as iteration proceeds.
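The pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the sourmash code: `Leaf`, `Tree`, `loaded_count`, and the `unload_data` keyword are hypothetical names chosen for the example, and the "signature" is just a string standing in for data read from disk.

```python
class Leaf:
    """Hypothetical stand-in for an SBT leaf whose data is loaded lazily."""

    def __init__(self, name):
        self.name = name
        self._data = None  # nothing is loaded until .data is first accessed

    @property
    def data(self):
        if self._data is None:
            # Stand-in for reading a signature from disk or a zip file.
            self._data = f"signature:{self.name}"
        return self._data

    @property
    def loaded(self):
        return self._data is not None

    def unload(self):
        # Drop the cached payload so it can be garbage collected.
        self._data = None


class Tree:
    """Hypothetical container; only the iteration pattern matters here."""

    def __init__(self, names):
        self._leaves = [Leaf(n) for n in names]

    def leaves(self, unload_data=True):
        for leaf in self._leaves:
            yield leaf
            # Execution resumes here when the caller requests the next leaf,
            # that is, once the caller has finished with this one.
            if unload_data:
                leaf.unload()

    def signatures(self):
        for leaf in self.leaves():
            yield leaf.data

    def loaded_count(self):
        return sum(leaf.loaded for leaf in self._leaves)


tree = Tree(["a", "b", "c"])
peak = 0
for sig in tree.signatures():
    # Track how many leaves hold loaded data at any point in the loop.
    peak = max(peak, tree.loaded_count())

print("peak loaded leaves:", peak)
print("loaded after iteration:", tree.loaded_count())
```

Because the unload happens after the `yield`, at most one leaf holds its data at a time in this sketch. The caller can still keep a signature alive by holding its own reference to it; unloading only drops the copy cached on the leaf.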

Updated benchmarks for 45k indexed zipfile (SBT) with this PR:

| | Time | Memory |
| --- | --- | --- |
| no-linear/prefetch | 10s | 215 MB |
| no-linear/no-prefetch | 22s | 214 MB |
| old linear/prefetch | 177s | 1502 MB |
| new linear/prefetch | 245s | 81 MB |

Updated benchmarks for 280k indexed zipfile (SBT) with this PR:

| | Time | Memory |
| --- | --- | --- |
| no-linear/prefetch | 4m 56s | 1 GB |
| no-linear/no-prefetch | 1h 16m | 1 GB |
| old linear/prefetch | 20m | 8.8 GB |
| new linear/prefetch | 18m 46s | 1 GB |
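For a quick read of the old versus new linear/prefetch rows, the ratios can be computed directly from the figures above. The helper name `ratio` is just for this example, and the 280k inputs ("20m", "1 GB") are rounded in the tables, so those ratios are approximate.

```python
def ratio(old, new):
    """Return old / new; values above 1 mean the new run is smaller or faster."""
    return old / new


# 45k database: memory in MB, time in seconds.
mem_45k = ratio(1502, 81)
time_45k = ratio(177, 245)

# 280k database: memory in GB, time converted to seconds.
mem_280k = ratio(8.8, 1)
time_280k = ratio(20 * 60, 18 * 60 + 46)

print(f"45k:  memory {mem_45k:.1f}x lower, time ratio {time_45k:.2f}")
print(f"280k: memory {mem_280k:.1f}x lower, time ratio {time_280k:.2f}")
```

A time ratio below 1 means the new run was slower, as in the 45k case, where the memory saving comes at some cost in runtime.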

Ready for review and merge @luizirber @bluegenes

codecov bot commented May 18, 2021

Codecov Report

Merging #1534 (52710d5) into latest (5234bf1) will increase coverage by 4.98%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           latest    #1534      +/-   ##
==========================================
+ Coverage   90.20%   95.19%   +4.98%     
==========================================
  Files         126       99      -27     
  Lines       21192    17494    -3698     
  Branches     1594     1595       +1     
==========================================
- Hits        19117    16654    -2463     
+ Misses       1844      608    -1236     
- Partials      231      232       +1     
| Flag | Coverage Δ |
| --- | --- |
| python | 95.19% <66.66%> (-0.01%) ⬇️ |
| rust | ? |

| Impacted Files | Coverage Δ |
| --- | --- |
| src/sourmash/sbt.py | 80.79% <66.66%> (-0.08%) ⬇️ |
| src/core/src/signature.rs | |
| src/core/src/index/sbt/mod.rs | |
| src/core/src/from.rs | |
| src/core/src/sketch/nodegraph.rs | |
| src/core/src/errors.rs | |
| src/core/src/cmd.rs | |
| src/core/tests/minhash.rs | |
| src/core/src/sketch/hyperloglog/estimators.rs | |
| src/core/src/index/mod.rs | |
| ... and 18 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@bluegenes bluegenes left a comment

wow, this looks great to me 🚀

@ctb ctb merged commit a782b1d into latest May 18, 2021
@ctb ctb deleted the sbt/unload_data_on_iter branch May 18, 2021 17:52

ctb commented May 18, 2021

thanks!
