[MRG] unload data on iteration over SBT leaves #1534
Merged
In #1530, we observed high memory usage for `sourmash gather --linear` against indexed SBTs. This was caused by retention of loaded leaf nodes during the `SBT.signatures()` method call used by `--linear`. This PR calls `unload_data` after each leaf's signature is yielded by the `SBT.leaves()` method underlying `SBT.signatures()`.
Updated benchmarks for 45k indexed zipfile (SBT) with this PR:
Updated benchmarks for 280k indexed zipfile (SBT) with this PR:
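The unload-after-yield pattern described in this PR can be illustrated with a small, self-contained sketch. This is not the sourmash code: `ToyLeaf`, `iter_signatures`, and the `is_loaded` property are hypothetical names made up for the illustration, and only the `unload_data` name is taken from the description above. It assumes each leaf loads its payload lazily on first access and caches it until told to drop it.

```python
class ToyLeaf:
    """Hypothetical stand-in for a tree leaf whose payload is loaded lazily."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # callable that reads the payload from storage
        self._data = None      # cached payload; None means "not loaded"

    @property
    def data(self):
        # Load on first access, then serve the cached copy.
        if self._data is None:
            self._data = self._loader()
        return self._data

    def unload_data(self):
        # Drop the cached payload so it can be garbage collected.
        self._data = None

    @property
    def is_loaded(self):
        return self._data is not None


def iter_signatures(leaves, unload=True):
    """Yield each leaf's payload, optionally unloading it afterwards."""
    for leaf in leaves:
        yield leaf.data
        # Runs when the consumer asks for the next item (or exhausts the loop).
        if unload:
            leaf.unload_data()


# Toy data: three leaves whose "signatures" are plain strings.
leaves = [ToyLeaf(f"leaf{i}", lambda i=i: f"signature-{i}") for i in range(3)]

sigs = list(iter_signatures(leaves))
still_loaded = sum(leaf.is_loaded for leaf in leaves)

# For comparison: without unloading, every visited leaf stays cached.
retained = [ToyLeaf(f"leaf{i}", lambda i=i: f"signature-{i}") for i in range(3)]
list(iter_signatures(retained, unload=False))
retained_loaded = sum(leaf.is_loaded for leaf in retained)
```

In this sketch, a full pass with unloading leaves no payload cached, while the pass without it keeps all of them in memory. One caveat of the simple form shown here: if the consumer stops iterating early, the leaf yielded last stays loaded, since the unload step only runs when the generator is resumed.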
Ready for review and merge @luizirber @bluegenes