parallelizing SRA search via snakemake #1664
(current size of wort-sra directory: 9.4T)
Nice! snakemake was the first version of mag search, so everything old is new again =] (back then we didn't have the good support for parallel searches that we have now, so I'm curious to see where greymake goes)
😆
It's meant to be a code-lite proof of concept on top of other, deeper infrastructure, just like 99% of everything I do :). Looking forward to making it more robust and performant by refactoring sourmash underneath!
misc thought: we could pretty easily use the abspath manifests in #1891 to build sra_search siglist files, which would allow picklists etc., and do all of this in Snakemake.
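A minimal sketch of the manifest-to-siglist idea, assuming a sourmash-style manifest CSV (a `# SOURMASH-MANIFEST-VERSION` comment line, then a header) whose `internal_location` column holds an absolute signature path, and assuming an sra_search siglist is simply one path per line. The picklist match uses the first space-delimited token of `name`, similar to sourmash's `ident` picklist type. The function names, paths, and accessions are hypothetical; this is not the #1891 implementation.

```python
import csv
import io


def build_siglist(manifest_file, picklist_idents):
    """Return unique absolute signature paths whose record ident is picklisted."""
    # Skip comment lines such as the '# SOURMASH-MANIFEST-VERSION: 1.0' header.
    rows = csv.DictReader(line for line in manifest_file if not line.startswith("#"))
    paths = []
    seen = set()
    for row in rows:
        # 'ident' here means the first space-delimited token of the record name.
        ident = row["name"].split(" ")[0]
        location = row["internal_location"]  # assumed to be an absolute path
        # One signature file can hold several sketches (e.g. different ksizes),
        # so list each path only once.
        if ident in picklist_idents and location not in seen:
            seen.add(location)
            paths.append(location)
    return paths


def write_siglist(paths, out):
    # Assumed siglist format: one signature path per line.
    for path in paths:
        out.write(path + "\n")


# Made-up manifest rows for illustration.
manifest = io.StringIO(
    "# SOURMASH-MANIFEST-VERSION: 1.0\n"
    "internal_location,md5,ksize,moltype,name\n"
    "/data/wort-sra/SRR0000001.sig,md5a,31,DNA,SRR0000001 soil metagenome\n"
    "/data/wort-sra/SRR0000002.sig,md5b,31,DNA,SRR0000002 gut metagenome\n"
    "/data/wort-sra/SRR0000001.sig,md5c,51,DNA,SRR0000001 soil metagenome\n"
)
siglist = io.StringIO()
paths = build_siglist(manifest, {"SRR0000001"})
write_siglist(paths, siglist)
print(siglist.getvalue(), end="")
```

In a Snakemake workflow, a step like this could be a small rule that turns a manifest plus picklist into the siglist file consumed by the search rule.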
This mostly provides a way to interconnect our management of all these millions of files with the same underlying set of catalogs/manifests, which is nice but not game-changing. One nice feature, though, would be to be able to subselect a set of SRA records based on their metadata, as we are beginning to explore over in https://github.com/dib-lab/2022-sra-gather.
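Subselecting SRA records by metadata could end in a plain picklist. The sketch below assumes a metadata CSV with hypothetical `acc` and `librarysource` columns (the real schema used in 2022-sra-gather may differ) and writes a one-column `ident` picklist of the accessions that pass a filter.

```python
import csv
import io


def metadata_to_picklist(metadata_file, out, keep):
    """Write an 'ident' picklist of accessions whose metadata row passes keep(row)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ident"])
    n_kept = 0
    for row in csv.DictReader(metadata_file):
        if keep(row):
            # 'acc' is an assumed column name for the SRA run accession.
            writer.writerow([row["acc"]])
            n_kept += 1
    return n_kept


# Made-up metadata rows for illustration.
metadata = io.StringIO(
    "acc,organism,librarysource\n"
    "SRR0000001,soil metagenome,METAGENOMIC\n"
    "SRR0000002,Homo sapiens,GENOMIC\n"
    "SRR0000003,marine metagenome,METAGENOMIC\n"
)
picklist = io.StringIO()
n = metadata_to_picklist(
    metadata, picklist, lambda row: row["librarysource"] == "METAGENOMIC"
)
print(n)
print(picklist.getvalue(), end="")
```

A file like this should be usable with sourmash's `--picklist picklist.csv:ident:ident` syntax, so the metadata query and the search stay decoupled.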
Food for thought: snakemake supports Rust rules, so it can also be used to wire the Rust parts of https://github.com/sourmash-bio/sra_search while using all the other nice snakemake features too.
TIL about
This could be a nice thing to provide for sourmash more generally. Neat stuff!
So I did a thing... https://github.com/ctb/2021-sourmash-greymake
Building off some conversations with @bluegenes, and inspired a bit by the work on manifests of manifests #1652, I roughed out a simple system for search parallelism in Python. Basically a (very) poor version of greyhound #1226 :).
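To make the batch-and-merge pattern concrete, here is a toy sketch: the database is split into batches, each batch is searched independently, and the per-batch hits are merged and sorted. Plain Python sets of hash values stand in for FracMinHash sketches, and threads stand in for the separate jobs or processes a real system would use. All names are hypothetical; this is not the greymake code.

```python
from concurrent.futures import ThreadPoolExecutor


def containment(query, subject):
    # Fraction of the query's hashes that also appear in the subject.
    return len(query & subject) / len(query) if query else 0.0


def search_batch(query, batch, threshold):
    # Search one batch of (name, hashes) records; stands in for one job.
    hits = []
    for name, hashes in batch:
        c = containment(query, hashes)
        if c >= threshold:
            hits.append((name, c))
    return hits


def parallel_search(query, database, threshold=0.1, batch_size=2, workers=4):
    records = sorted(database.items())
    batches = [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each batch is searched independently; map() returns results in batch order.
        for hits in pool.map(lambda batch: search_batch(query, batch, threshold), batches):
            results.extend(hits)
    # Merge step: best containment first, ties broken by name.
    return sorted(results, key=lambda hit: (-hit[1], hit[0]))


query = {1, 2, 3, 4}
database = {
    "SRR0000001": {1, 2, 9},
    "SRR0000002": {5, 6},
    "SRR0000003": {1, 2, 3, 4, 7},
}
print(parallel_search(query, database))
```

Because batches share nothing, the same split could be handed to a workflow manager instead of a thread pool, with the final sort done in a separate gather step.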
The basic idea is:
mom-select-to-picklists.py
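The issue does not show what `mom-select-to-picklists.py` does internally. Going by its name, one plausible shape is sketched below: select rows from a manifest of manifests (MoM) and group the matching records by the collection that holds them, producing one md5 picklist per collection so each collection can be searched as its own job. The `location` column and all function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def select_to_picklists(mom_file, ksize, moltype):
    """Group matching records' md5s by the collection ('location') holding them."""
    groups = defaultdict(list)
    for row in csv.DictReader(mom_file):
        if int(row["ksize"]) == ksize and row["moltype"] == moltype:
            groups[row["location"]].append(row["md5"])
    return dict(groups)


def write_picklist(md5s, out):
    # An md5 picklist: one header line, then one md5 per line.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["md5"])
    for md5 in md5s:
        writer.writerow([md5])


# Made-up MoM rows for illustration.
mom = io.StringIO(
    "location,md5,ksize,moltype\n"
    "batch1.zip,md5a,31,DNA\n"
    "batch1.zip,md5b,51,DNA\n"
    "batch2.zip,md5c,31,DNA\n"
    "batch1.zip,md5d,31,DNA\n"
)
picklists = select_to_picklists(mom, ksize=31, moltype="DNA")
for location, md5s in picklists.items():
    out = io.StringIO()
    write_picklist(md5s, out)
    print(location, out.getvalue().split())
```

Each (collection, picklist) pair could then become one independent search, e.g. with sourmash's `--picklist picks.csv:md5:md5`, which maps naturally onto one Snakemake job per collection.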
Thoughts and questions: