should we store signatures in SQLite databases? #1930
Comments
some early benchmarks - a 5k random subset of gtdb
numbers
file sizes: about a factor of 8 larger.
against zip:
against sqldb:
against sbt.zip:
Now that I've written all of the above out... ...a big selling point for Once implemented, |
preliminary implementation that actually seems to work just fine is here: #1933 🎉
If you care for my 2c, I think having an sqlite index is a great idea. Besides performance, users can query it easily using SQL for unthought-of use cases. Would it be possible/desirable to replace .zip indices, thereby removing the need to support both?
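To make the "query it with raw SQL" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (sketches, hashes, hashval, sketch_id) are made up for illustration and are not the schema from #1933; the query just counts how many hashes each stored sketch shares with a query set.

```python
import sqlite3

# Hypothetical schema, for illustration only; NOT the schema used in #1933.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sketches (
    id INTEGER PRIMARY KEY,
    name TEXT,
    ksize INTEGER,
    scaled INTEGER
);
CREATE TABLE hashes (
    hashval INTEGER NOT NULL,
    sketch_id INTEGER NOT NULL REFERENCES sketches (id)
);
CREATE INDEX hashes_hashval ON hashes (hashval);
""")

# Toy data: three sketches with a handful of made-up hash values.
conn.executemany(
    "INSERT INTO sketches VALUES (?, ?, ?, ?)",
    [(1, "genome_a", 31, 1000), (2, "genome_b", 31, 1000), (3, "genome_c", 31, 1000)],
)
conn.executemany(
    "INSERT INTO hashes VALUES (?, ?)",
    [(10, 1), (20, 1), (30, 1), (20, 2), (30, 2), (40, 2), (50, 3)],
)


def shared_hash_counts(conn, query_hashes):
    """Return (name, n_shared) for each sketch sharing a hash with the query."""
    placeholders = ",".join("?" for _ in query_hashes)
    sql = f"""
        SELECT s.name, COUNT(*) AS n_shared
        FROM hashes h JOIN sketches s ON s.id = h.sketch_id
        WHERE h.hashval IN ({placeholders})
        GROUP BY s.id
        ORDER BY n_shared DESC, s.name
    """
    return conn.execute(sql, list(query_hashes)).fetchall()


result = shared_hash_counts(conn, [20, 30, 99])
print(result)
```

Note that this toy query ignores ksize and scaled entirely, so it only makes sense when every sketch was built with the same parameters.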
good point in re flexibility of raw SQL queries! Although I hope that users don't try to figure out the scaled stuff themselves; I think I have a good handle on it but it's taken 5+ years :)
zip indices are parallelizable and also flexible in ways that SQLite indices are not; they're probably going to remain our default on-disk storage format. Plus they're a legacy format that we have to support anyway for a while. In general I'm just wary of introducing big new complicated pieces of code without clear justification of some sort :). I think the LCA database version of |
while this is something we should test more robustly, based on the Stack Overflow post "SQLite Concurrent Access", we should be fine for multithreaded/multiprocess reads from SQLite databases. We could even be good for multithreaded writes if we configure things carefully, maybe using a write-ahead log.
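A rough sketch of what that could look like with Python's built-in sqlite3 module: turn on write-ahead logging once, then let each reader thread open its own connection. The file name, table, and data are invented for the example, and this is not a benchmark or a claim about how sourmash does it.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical database file and table, for illustration only.
db_path = os.path.join(tempfile.mkdtemp(), "example.sqldb")

writer = sqlite3.connect(db_path)
# Switch to write-ahead logging; the pragma returns the journal mode in effect.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE hashes (hashval INTEGER, sketch_id INTEGER)")
writer.executemany(
    "INSERT INTO hashes VALUES (?, ?)",
    [(i, i % 10) for i in range(1000)],
)
writer.commit()

counts = []
counts_lock = threading.Lock()


def count_rows(sketch_id):
    # Each thread opens its own connection rather than sharing one.
    conn = sqlite3.connect(db_path)
    try:
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM hashes WHERE sketch_id = ?", (sketch_id,)
        ).fetchone()
    finally:
        conn.close()
    with counts_lock:
        counts.append(n)


threads = [threading.Thread(target=count_rows, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

writer.close()
print(mode, sorted(counts))
```

One connection per thread sidesteps sqlite3's default same-thread check on connections; whether WAL actually helps for our write patterns is exactly the kind of thing that would need testing.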
I'm OK with merging I don't think we should remove |
Agree on pulling With I think I also need to invest in better/more uniform tests. We have an awful lot of copy-pasta and it might be useful to build a uniform set of functional tests to apply to I'm mostly just struggling with where to cut off the SQLite PRs. If I do all the refactoring in one PR it's going to be a monster. |
Only somewhat random additional thought: the additional tests, as well as the detailed |
lca summarize is really nice and fast with the sqlite database implementation! #1958 |
In PR #1808, I'm nearing a decision point.
This PR is an experimental PR that adds SqliteCollectionManifest and SqliteIndex, for storing manifests and signatures in a SQLite database. SqliteCollectionManifest is independently useful and can be wrapped in a StandaloneManifestIndex, while SqliteIndex builds on SqliteCollectionManifest.

My conundrum is that the manifest class is almost certainly worth merging, but I'm not sure about the index class. This issue is to track this discussion independently of the PR.
Questions:
Two big questions arise:
should we merge SqliteIndex?

SqliteCollectionManifest is really useful, but it's not clear to me that SqliteIndex should be merged.

arguments for SqliteIndex:

arguments against SqliteIndex:

should we support scaled=1 in SqliteIndex?

It adds a dependency, makes the code more complicated, and is not that fast. On the other hand it's kind of neat and it seems to work ok :).