MRG: Add threaded multigather to pyo3_branchwater code; add fastmultigather CLI plugin. #3

Merged: 24 commits merged on Aug 12, 2023

ctb (Collaborator) commented Oct 22, 2022

Add fastmultigather to the plugin.

This PR refactors the code into more reusable chunks and adds fastmultigather, which takes multiple queries and runs gather on each of them. Baseline memory usage will be higher than with fastgather, because the multigather code loads multiple metagenomes into memory; in exchange, the database is loaded only once (at the beginning) and then reused to process every metagenome. 🎉
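The tradeoff above (load the database once, then reuse it for every query metagenome) can be illustrated with a minimal Python sketch. This is not the PR's implementation: sketches are modeled as frozensets of integer hashes rather than real sourmash sketches, and load_database, prefetch, multigather_matches, the 10-shared-hash threshold, and the toy data are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: a "sketch" is a frozenset of integer hashes.
LOAD_CALLS = 0  # counts database loads, to show it happens once per run


def load_database():
    """Hypothetical stand-in for loading all database sketches into memory."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {
        "genome_A": frozenset(range(0, 100)),
        "genome_B": frozenset(range(50, 150)),
        "genome_C": frozenset(range(500, 600)),
    }


def prefetch(query, database, threshold=10):
    """Names of database sketches sharing at least `threshold` hashes with the query."""
    return sorted(name for name, sketch in database.items()
                  if len(query & sketch) >= threshold)


def multigather_matches(queries, n_threads=4):
    """Load the database once, then prefetch every query against it on a thread pool."""
    database = load_database()  # shared read-only by all worker threads
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        matches = pool.map(lambda query: prefetch(query, database), queries.values())
        return dict(zip(queries.keys(), matches))


# All query "metagenomes" are held in memory together (the extra memory cost).
queries = {
    "metagenome_1": frozenset(range(0, 60)),
    "metagenome_2": frozenset(range(540, 560)),
}
result = multigather_matches(queries)
print(result)  # {'metagenome_1': ['genome_A', 'genome_B'], 'metagenome_2': ['genome_C']}
```

Only the structure matters here: one database load per batch regardless of how many queries there are. In CPython the GIL serializes this pure-Python set arithmetic, so the sketch shows the pattern, not the speedup.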

Specifically, the multigather code does the following:

  • load all database sketches into memory;
  • then, in a threaded loop,
    • load each query metagenome
    • do a threaded prefetch to find matching sketches
    • follow that with a gather (threaded prefetch + pick best match + subtract, repeat)
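The per-query steps in the list above can be sketched as a greedy gather loop. Again, this is a hypothetical Python illustration, not the code merged in this PR: sketches are plain sets of integer hashes, overlap is a raw shared-hash count, and gather, threaded_prefetch, the threshold, and the tie-breaking rule are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def threaded_prefetch(query, database, threshold, pool):
    """Overlap of `query` with each sketch, computed on the pool; keep those >= threshold."""
    names = list(database)
    overlaps = pool.map(lambda name: len(query & database[name]), names)
    return {name: ov for name, ov in zip(names, overlaps) if ov >= threshold}


def gather(query, database, threshold=10, n_threads=4):
    """Greedy gather (hypothetical): prefetch, pick the best match, subtract, repeat."""
    remaining = set(query)
    results = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Initial prefetch narrows the database to plausible matches.
        hits = threaded_prefetch(remaining, database, threshold, pool)
        candidates = {name: database[name] for name in hits}
        while candidates:
            # Re-run the threaded prefetch against what is left of the query.
            overlaps = threaded_prefetch(remaining, candidates, threshold, pool)
            if not overlaps:
                break
            # Best match = largest remaining overlap; ties go to the alphabetically first name.
            best = min(overlaps, key=lambda name: (-overlaps[name], name))
            results.append((best, overlaps[best]))
            remaining -= candidates.pop(best)  # subtract the matched hashes from the query
    return results


database = {
    "genome_A": frozenset(range(0, 100)),
    "genome_B": frozenset(range(50, 150)),
    "genome_C": frozenset(range(500, 600)),
}
query = set(range(0, 150)) | set(range(500, 520))
matches = gather(query, database)
print(matches)  # [('genome_A', 100), ('genome_B', 50), ('genome_C', 20)]
```

Because each round subtracts the chosen match's hashes, genome_B is credited only with the 50 hashes that genome_A did not already account for.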

ctb marked this pull request as draft October 22, 2022 13:39
ctb marked this pull request as ready for review August 12, 2023 16:30
ctb changed the title from "Add threaded multigather to pymagsearch code." to "MRG: Add threaded multigather to pyo3_branchwater code; add fastmultigather CLI plugin." Aug 12, 2023
ctb merged commit 386b93c into main Aug 12, 2023
12 checks passed
ctb deleted the multigather branch August 12, 2023 16:36