signature name multifasta input #132

Closed
phiweger opened this issue Feb 14, 2017 · 2 comments

When using multifasta, the --singleton option of sourmash compute is really useful. However, when one later queries each signature against an SBT, e.g. with sourmash categorize, the output format is:

filename, some_hit_id, similarity
filename, another_hit_id, similarity
...

Is there an option to carry the FASTA header through, so that it appears instead of the same filename again and again? I tried --name-from-first, but I think it was not meant for this purpose, right?

Thx!

phiweger commented Feb 17, 2017

To make this clearer, perhaps:

Given a multifasta

>FOO
ACTGATAC
>...
...

and after sourmash compute --singleton, sourmash categorize reports something like this to stderr:

loaded query: FOO ... (k=16, DNA)
for FOO, found: 0.9 BAR

However, the --csv flag will write the following information to disk:

multifasta.json,BAR,0.9

So what I do is parse the stderr, which is a bit inconvenient. This would be nice:

FOO,BAR,0.9
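
For reference, the stderr-parsing workaround can be sketched roughly as follows. This is only a sketch: it assumes the log lines look exactly like the "for FOO, found: 0.9 BAR" example above (the real output may differ between versions), and the function names parse_categorize_log and rows_to_csv are made up for illustration, not part of sourmash.

```python
import csv
import io
import re

# Assumed line shape, taken from the example above: "for FOO, found: 0.9 BAR".
# The actual stderr format of sourmash categorize may differ.
FOUND_RE = re.compile(
    r"^for (?P<query>.+), found: (?P<similarity>[0-9.]+) (?P<hit>.+)$"
)


def parse_categorize_log(lines):
    """Collect (query, hit, similarity) tuples from captured stderr lines."""
    rows = []
    for line in lines:
        match = FOUND_RE.match(line.strip())
        if match:  # skip "loaded query: ..." and any other lines
            rows.append(
                (
                    match.group("query"),
                    match.group("hit"),
                    float(match.group("similarity")),
                )
            )
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text: query name, hit, similarity."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    log = [
        "loaded query: FOO ... (k=16, DNA)",
        "for FOO, found: 0.9 BAR",
    ]
    print(rows_to_csv(parse_categorize_log(log)), end="")
```

With the two example lines above, this yields the desired row with the FASTA header in the first column instead of the filename.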

ctb commented Feb 24, 2018

@phiweger could you take a look at #421 and see if it fixes this? Apologies for taking so long!
