Basic usability problems with genbank SBT #716
One note is that `sourmash gather -n` could stop the search once that many results have been found. Right now it does not, which makes running SBT gather on many files kind of annoying. (CTB note: fixed in #1042)
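A toy sketch of the early-stopping idea, assuming plain Python sets of hashes stand in for signatures and a dict stands in for the SBT. `greedy_gather` and its `num_results` parameter are hypothetical names for illustration, not sourmash's actual API or implementation. It shows the greedy loop (pick the best-overlapping match, subtract its hashes, repeat) and where a result limit lets the search end early instead of running to exhaustion.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_gather(
    query: Set[int],
    database: Dict[str, Set[int]],
    num_results: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Toy greedy gather over plain hash sets (hypothetical helper).

    Repeatedly picks the database entry sharing the most hashes with the
    remaining query, removes those hashes, and stops early once
    num_results matches have been reported.
    """
    remaining = set(query)
    results: List[Tuple[str, int]] = []
    while remaining:
        # Truncate the search as soon as the requested number of results
        # is reached, instead of continuing until nothing overlaps.
        if num_results is not None and len(results) >= num_results:
            break
        best_name, best_overlap = None, 0
        for name, hashes in database.items():
            overlap = len(remaining & hashes)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is None:
            break  # no database entry overlaps what is left of the query
        results.append((best_name, best_overlap))
        # Remove the matched hashes so later matches only count new content.
        remaining -= database[best_name]
    return results
```

For example, with `db = {"A": {1, 2, 3, 4}, "B": {4, 5, 6}, "C": {7}}` and query `{1, ..., 7}`, the full search reports A, B and C, while `num_results=1` returns after the first match. In a real SBT the savings come from skipping further tree traversals, not just a linear scan.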
Supporting compressed sbt.json archives (.zip or .tar.gz?) directly would be a big step. See #648.
Biggest issue with #648 is that... it's kind of slow (but I think there are some ZIP tricks that can be played to make it faster). The size of the SBT is due to changes in internal node sizes, but we can revert that. Clustering the SBT will also help (#710 (comment)), because we can compress the internal nodes better. Oh, and I don't think we are compressing the internal nodes, which is also relevant to #648 (store the nodes compressed inside the ZIP, and use the ZIP format just for single-file distribution).
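A minimal sketch of "store the nodes compressed inside the ZIP" using only Python's standard `zipfile` module. The archive layout (`sbt.json` at the top level, node files under `nodes/`) and the helper names `write_sbt_zip` and `read_node` are hypothetical, chosen for illustration; this is not sourmash's actual storage format. Each member is DEFLATE-compressed individually, so a single node can be read without unpacking the rest.

```python
import io
import json
import zipfile
from typing import BinaryIO, Dict, Union

PathOrFile = Union[str, BinaryIO]


def write_sbt_zip(path: PathOrFile, tree_description: dict, nodes: Dict[str, bytes]) -> None:
    """Pack a (hypothetical) tree description plus node files into one ZIP.

    ZIP_DEFLATED is set on the archive, so every member written with an
    archive name is compressed individually.
    """
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("sbt.json", json.dumps(tree_description))
        for name, data in nodes.items():
            zf.writestr(f"nodes/{name}", data)


def read_node(path: PathOrFile, name: str) -> bytes:
    """Read and decompress a single node, leaving the other members untouched."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(f"nodes/{name}")


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of a file on disk.
    buf = io.BytesIO()
    write_sbt_zip(buf, {"version": "hypothetical"}, {"internal.0": b"\x00" * 1000})
    print(len(read_node(buf, "internal.0")))
```

One reason to prefer .zip over .tar.gz here: a ZIP has a central directory that allows jumping straight to one compressed member, while a .tar.gz is a single gzip stream that has to be decompressed from the start to reach a given file.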
ref #646
The remaining usability problems are mostly around NCBI taxonomy issues... see sourmash-bio/databases#8.
This is going to be a bit of a meta issue, but it is also a separate UX issue.
With the genbank-d2 update, it's become quite hard to actually use the genbank SBT. A few issues:

- sbt gather

I know there are various things being worked on that could help with these issues, and I want to collect them here.