Feature request: Reformat gather output to include taxonomy #384

Closed
brooksph opened this issue Jan 12, 2018 · 5 comments


@brooksph
Contributor

Can we modify the gather output to separate matches into columns based on taxonomic rank? It's sometimes difficult to do this at the command line when the species or strain ID spans multiple fields, for example "KQ235715.1 Fusobacterium nucleatum subsp. animalis D11 genomic scaffold adfWA-supercont2.1". I think @ctb may have already done this for lca.

@ctb
Contributor

ctb commented Jan 12, 2018 via email

@brooksph
Contributor Author

That's tricky but "undefined" might work. Not sure if there are standards but I'll look around.

@ctb
Contributor

ctb commented Feb 3, 2018

Just revisiting this with some more thoughts -

gather works by finding the signature among the search subjects that best matches the hashes in the query, subtracting the matched hashes from the query, and then repeating with the remaining hashes. The name output by gather comes from the name of each matched signature. So to fix this, we would have to update signatures to carry taxonomic information (which is a big burden on the user; it's one reason I frequently don't use the lca search!).
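
For readers who want to see the loop described above spelled out, here is a minimal sketch in Python. It is not sourmash's implementation: it assumes each signature is just a plain set of integer hashes, ignores thresholds and abundance, and the names `greedy_gather`, `genome_A`, and so on are made up for illustration.

```python
def greedy_gather(query_hashes, subjects):
    """Greedily decompose a query into best-matching subjects.

    query_hashes: set of int hashes from the query (assumed representation).
    subjects: dict mapping a signature name to its set of int hashes.
    Returns (results, remaining), where results is a list of
    (name, number_of_hashes_matched) in the order the matches were found.
    """
    remaining = set(query_hashes)
    results = []
    while remaining:
        best_name, best_overlap = None, set()
        # Find the subject sharing the most hashes with what is left.
        for name, hashes in subjects.items():
            overlap = remaining & hashes
            if len(overlap) > len(best_overlap):
                best_name, best_overlap = name, overlap
        if best_name is None:
            break  # nothing matches the leftover hashes
        results.append((best_name, len(best_overlap)))
        # Subtract the matched hashes, then repeat with the remainder.
        remaining -= best_overlap
    return results, remaining


if __name__ == "__main__":
    query = {1, 2, 3, 4, 5, 6, 7, 8}
    subjects = {
        "genome_A": {1, 2, 3, 4, 5},
        "genome_B": {4, 5, 6, 7},
        "genome_C": {100},
    }
    results, leftover = greedy_gather(query, subjects)
    print(results)   # matches in the order they were found
    print(leftover)  # hashes with no assignment
```

Note that the only label each match carries is the subject's name, which is why the reported name can only be as structured as the signature name itself.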

But it might be possible to do the same 'gather' algorithm but with an lca database... so you'd have 'sourmash lca gather' that would output taxonomic info. Humm.

@ctb
Contributor

ctb commented Feb 3, 2018

See #390; example output so far:

./sourmash lca gather ../lca-db//tully-genome-sigs/TOBG_IN-33.fna.gz.sig ../lca-db/delmont-MAGs-k31.lca.json.gz 
loaded 1 databases.
ksize=31 scaled=10000
0.8 Mbp    0.7%    Archaea; Euryarchaeota; ; ; ; ; 
110.0 kbp    0.1%    Eukaryota; Chlorophyta; Prasinophyceae; Mamiellales; Bathycoccaceae; Ostreococcus; 

25.8% (310.0 kbp) have no assignment.
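
Since the original request was for one column per taxonomic rank, here is a rough sketch of splitting result lines like the two above into per-rank fields. It rests on several assumptions: the three columns are separated by a tab or by two or more spaces, the lineage always has seven semicolon-separated fields, and those fields correspond to the rank names in `RANKS` (the output above does not label them). `parse_lca_gather_line` is a hypothetical helper, not part of sourmash.

```python
import re

# Assumed rank names for the seven lineage fields; the output does not label them.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_lca_gather_line(line):
    """Split one result line into size, fraction, and one field per rank."""
    # Columns are assumed to be separated by a tab or by 2+ spaces.
    parts = re.split(r"\t|\s{2,}", line.strip(), maxsplit=2)
    if len(parts) != 3:
        raise ValueError("not a result line: %r" % line)
    size, fraction, lineage = parts
    names = [field.strip() for field in lineage.split(";")]
    # Pad or trim so every row has exactly one entry per rank.
    names = (names + [""] * len(RANKS))[:len(RANKS)]
    row = {"size": size, "fraction": fraction}
    row.update(zip(RANKS, names))
    return row


if __name__ == "__main__":
    example = "0.8 Mbp    0.7%    Archaea; Euryarchaeota; ; ; ; ; "
    for key, value in parse_lca_gather_line(example).items():
        print(key, repr(value))
```

Unassigned ranks come back as empty strings here; substituting a placeholder such as "undefined", as floated earlier in the thread, would be a one-line change.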

@ctb changed the title from "Feature request: Reformat gather output" to "Feature request: Reformat gather output to include taxonomy" on Feb 19, 2018
@ctb
Contributor

ctb commented Feb 24, 2018

Closed by #390.

@ctb closed this as completed on Feb 24, 2018