Discussion in #2178 reminded me of some things @ctb and I talked about a while back, and that seem like far less of a leap now.
With the selection and subsetting functionality (`sig grep`, `tax grep`, `sig extract`, `tax extract`, etc.) becoming increasingly fleshed out and useful, we could more generally enable a manifest-style metadata file (`METADATA.csv`/`sql`?) for signatures, and support subsetting across it (e.g. by generating picklists from it).
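To make "generating picklists" from a metadata file a bit more concrete, here is a minimal sketch. It assumes a hypothetical `METADATA.csv` with an `ident` column plus arbitrary metadata columns; `metadata_to_picklist` is an illustrative helper, not an existing sourmash function. The output is a one-column CSV of identifiers, i.e. the kind of file that could be handed to an existing picklist argument such as `--picklist picks.csv:ident:ident`.

```python
import csv
import io


def metadata_to_picklist(metadata_file, column, value, out, ident_col="ident"):
    """Write a one-column picklist CSV of identifiers whose `column` == `value`.

    Hypothetical format: METADATA.csv has an identifier column (`ident` by
    default) plus arbitrary metadata columns. Returns the number of rows picked.
    """
    reader = csv.DictReader(metadata_file)
    writer = csv.writer(out)
    writer.writerow([ident_col])  # header, so the column can be named in a picklist
    picked = 0
    for row in reader:
        if row.get(column) == value:
            writer.writerow([row[ident_col]])
            picked += 1
    return picked


if __name__ == "__main__":
    metadata = io.StringIO(
        "ident,biome,source\n"
        "SRR1,seawater metagenome,SRA\n"
        "SRR2,soil metagenome,SRA\n"
    )
    picks = io.StringIO()
    metadata_to_picklist(metadata, "biome", "seawater metagenome", picks)
    print(picks.getvalue())
```

Exact-match filtering keeps the sketch simple; substring or regex matching (closer to `sig grep`) would be a small change to the `if` condition.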
Current use case example:
When we run MAGsearch, we postprocess the results to link matches with their SRA metadata. We could instead (or in addition) build a lineages-style sqldb for SRA runinfo metadata as a complementary manifest.
This would allow us to do:
- `tax annotate`-style annotation (or perhaps `metadata annotate`?)
- metadata selection, e.g. "seawater metagenome", to enable SRA search/MAGsearch on just the samples with that metadata. This could be really handy when we don't want to search the entire database (assuming picklists make it into SRA search, I guess). It would be extra neat if metadata categories were hierarchical, so that we could use `extract` to scale up, but afaik that's not how the info is organized, so this is more of a dream than a concrete use case.

As with the current functions, we use the metadata to select the identifiers we want, which we then use to select signatures for output/search/etc.
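A rough sketch of how a lineages-style SRA metadata database could support both annotation and two-step selection. Everything here is an assumption for illustration: the `runinfo` table and its three columns are a hypothetical, heavily trimmed schema (real SRA runinfo has many more fields), the helper names are made up, and signature identifiers are assumed to be the first whitespace-delimited token of the signature name.

```python
import sqlite3


def build_runinfo_db(rows):
    """Build an in-memory, lineages-style table: one row per SRA run accession.

    Hypothetical schema; real SRA runinfo exports have many more columns.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE runinfo (run TEXT PRIMARY KEY, scientific_name TEXT, bioproject TEXT)"
    )
    db.executemany("INSERT INTO runinfo VALUES (?, ?, ?)", rows)
    return db


def annotate(db, matches):
    """tax annotate-style: attach runinfo metadata to each search result."""
    annotated = []
    for match in matches:
        hit = db.execute(
            "SELECT scientific_name, bioproject FROM runinfo WHERE run = ?",
            (match["run"],),
        ).fetchone()
        # Keep matches with unknown accessions, but with empty metadata.
        name, project = hit if hit else ("", "")
        annotated.append({**match, "scientific_name": name, "bioproject": project})
    return annotated


def select_runs(db, term):
    """Selection step 1: run accessions whose scientific_name contains `term`.

    SQLite's LIKE is case-insensitive for ASCII; % and _ in `term` act as wildcards.
    """
    rows = db.execute(
        "SELECT run FROM runinfo WHERE scientific_name LIKE ?", (f"%{term}%",)
    )
    return {run for (run,) in rows}


def select_signatures(manifest, runs):
    """Selection step 2: keep manifest rows whose identifier was selected.

    Assumes the identifier is the first whitespace-delimited token of the name.
    """
    selected = []
    for row in manifest:
        parts = row["name"].split()
        if parts and parts[0] in runs:
            selected.append(row)
    return selected
```

Keeping the metadata in its own table, keyed on the same identifier used in signature names, mirrors how the lineages database is kept separate from the signatures themselves.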
The most immediate use case is MAGsearch, but I think this could also be really useful for reference databases if there were additional metadata worth subsetting on, e.g. quality, completeness, contamination, or database source.
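For reference databases, subsetting would often be on numeric quality fields rather than text. A minimal sketch, assuming hypothetical per-genome metadata with completeness and contamination in percent plus a source label; the default thresholds are illustrative only and not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class RefMetadata:
    ident: str
    completeness: float   # percent
    contamination: float  # percent
    source: str           # which reference database the genome came from


def select_references(records, min_completeness=90.0, max_contamination=5.0, sources=None):
    """Return identifiers of reference genomes passing quality/source filters.

    Thresholds are illustrative; `sources=None` means any source is accepted.
    """
    return [
        r.ident
        for r in records
        if r.completeness >= min_completeness
        and r.contamination <= max_contamination
        and (sources is None or r.source in sources)
    ]
```

The returned identifiers could then be written out as a picklist, exactly as with text-based selection.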
Ok this part is far less well-defined:
I've been thinking a bit about LIN groups and taxonomies that don't fit our current standard hierarchy. I wonder if we could allow these in the metadata file, with a corresponding JSON (or similar) file that defines any (optional) hierarchical structure of the categories.
I guess the way I'm thinking about it is that taxonomy is a specific case, but metadata could be more flexible. @ctb, there was a specific sort of tagging you suggested we could tie into when we talked about this (...last year??), but I can't remember the details.
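One way the optional-hierarchy idea could look: a companion JSON file declares which metadata categories are hierarchical and lists their levels from broadest to most specific, so taxonomy becomes just one such category next to, e.g., LIN positions. This is a sketch under assumptions; the JSON layout, the `lin0`…`lin3` column names, and the helpers are all hypothetical. `within` shows the extract-style "scale up" selection: a record matches if its path starts with the given prefix.

```python
import json

# Hypothetical companion file: which metadata columns form a hierarchy,
# ordered from broadest to most specific. Taxonomy is one case among several.
SPEC = json.loads("""
{
  "taxonomy": {"hierarchical": true,
               "levels": ["superkingdom", "phylum", "class", "order",
                          "family", "genus", "species"]},
  "lin": {"hierarchical": true, "levels": ["lin0", "lin1", "lin2", "lin3"]},
  "biome": {"hierarchical": false}
}
""")


def path(record, category, spec=SPEC):
    """Return the record's values along a hierarchical category, broadest first."""
    info = spec[category]
    if not info.get("hierarchical"):
        raise ValueError(f"{category!r} is not declared hierarchical")
    return tuple(record[level] for level in info["levels"])


def within(record, category, prefix, spec=SPEC):
    """True if the record falls under `prefix` (a truncated path), extract-style."""
    return path(record, category, spec)[: len(prefix)] == tuple(prefix)
```

Selecting with a shorter prefix scales the selection up to a broader group, which is what would make `extract`-style selection work on non-taxonomic categories.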
Continuing that thought: `sig grep` seems like the place to do this, or perhaps something specific to manifests, where we can link signature identifiers/names to generic metadata.
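A `sig grep`-like search over linked metadata could be sketched as below. The linkage rule is an assumption: each signature name starts with an identifier that keys into a metadata dict (`ident -> {field: value}`). `metadata_grep` is a hypothetical helper whose `invert` and `ignore_case` options mirror grep's `-v` and `-i`; it is not the existing `sig grep` implementation.

```python
import re


def metadata_grep(pattern, signatures, metadata, invert=False, ignore_case=False):
    """Return signature names whose linked metadata matches `pattern`.

    Assumes each signature name starts with an identifier that keys into
    `metadata` (ident -> {field: value}). Signatures without metadata never
    match, so they are returned only when invert=True.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    selected = []
    for name in signatures:
        parts = name.split()
        meta = metadata.get(parts[0], {}) if parts else {}
        # Search every metadata field, like grep over a whole record.
        hit = any(regex.search(str(value)) for value in meta.values())
        if hit != invert:
            selected.append(name)
    return selected
```

Searching all fields keeps the interface generic; restricting the search to named columns would be a natural extension if metadata files get wide.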