Implement lightweight SBT combining/adding for large SBTs #229
Comments
On top of … And I actually think implementing all three options is useful; they cover many different use cases.
#240 adds the second and third options - you can now do …
In response to @meren,
We can actually do this in a few different ways:
the heaviest-weight way right now is to combine or update the database, which is not that time- or resource-intensive but is still inconvenient. (The database can be updated mostly incrementally; it’s a Sequence Bloom Tree underneath.) We have a command-line way to do this with ‘sourmash sbt_combine’.
the medium-weight way (mostly just frustrating) is to have sbt_gather output the unknown bits of the signature, i.e. whatever is left unmatched. Then you could do an iterative search (run sbt_gather on database A, take what remains, run it on database B, etc.). There are many reasons to support this and it’s very easy, so we will probably add it next time I need it myself.
the lightest-weight way is not yet supported but is about an hour of hacking away: let the sbt_gather and sbt_search commands take multiple SBTs. The SBT search is very lightweight in terms of memory and resources (searching all of GenBank takes seconds and < 500 MB of RAM), so simply running 2 or 3 of them on multiple databases and then massaging the results is not difficult. But I am trying to be a bit careful about complexifying the command line, so I am hesitant to blindly add it. Easy to do once we need it, tho.
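To make the medium- and lightest-weight options concrete, here is a rough sketch of both ideas. It is not sourmash code: plain Python sets of integers stand in for signatures, plain dicts stand in for SBTs, and every name in it (`gather`, `iterative_gather`, `search_many`, the toy genomes) is hypothetical and only there for illustration.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-in for an SBT: entry name -> set of hash values (an assumption,
# not how sourmash stores anything).
Database = Dict[str, Set[int]]


def gather(query: Set[int], db: Database) -> Tuple[List[Tuple[str, int]], Set[int]]:
    """Greedy gather: repeatedly take the entry with the largest overlap,
    subtract it from the query, and return (matches, unmatched remainder)."""
    remaining = set(query)
    matches: List[Tuple[str, int]] = []
    while remaining:
        best_name, best_overlap = None, 0
        for name in sorted(db):  # sorted for deterministic tie-breaking
            overlap = len(remaining & db[name])
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is None:  # nothing in this database overlaps any more
            break
        matches.append((best_name, best_overlap))
        remaining -= db[best_name]
    return matches, remaining


def iterative_gather(query: Set[int], dbs: List[Database]):
    """Medium-weight option: gather on database A, take what remains,
    gather on database B, and so on."""
    remaining = set(query)
    all_matches: List[Tuple[str, int]] = []
    for db in dbs:
        matches, remaining = gather(remaining, db)
        all_matches.extend(matches)
    return all_matches, remaining


def search_many(query: Set[int], dbs: List[Database], threshold: float = 0.1):
    """Lightest-weight option: search each database independently, then
    merge ("massage") the results into one list sorted by Jaccard similarity."""
    results: List[Tuple[float, str]] = []
    for db in dbs:
        for name, hashes in db.items():
            union = query | hashes
            jaccard = len(query & hashes) / len(union) if union else 0.0
            if jaccard >= threshold:
                results.append((jaccard, name))
    results.sort(key=lambda r: (-r[0], r[1]))
    return results


# Made-up example data.
db_a: Database = {"genome_A1": {1, 2, 3, 4}, "genome_A2": {5, 6}}
db_b: Database = {"genome_B1": {7, 8, 9}}
query = {1, 2, 3, 7, 8, 10}

gather_matches, unknown = iterative_gather(query, [db_a, db_b])
search_hits = search_many(query, [db_a, db_b])
```

In this toy run the iterative gather finds `genome_A1` in the first database and `genome_B1` in the second, and hash `10` is left over as the "unknown bits". Note that the iterative version depends on database order, while the multi-database search does not; that difference may matter when deciding which option to expose on the command line.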