need exactly one. Specify --ksize or --dna/--protein. #317

Closed
mw55309 opened this issue Sep 6, 2017 · 13 comments

Comments

mw55309 commented Sep 6, 2017

Hi

I am not sure exactly what this means.

I created a sig from reads using --scaled 10000 and -k 31

I then downloaded the k31 genbank database.

Now when I run:

 sourmash sbt_search genbank-k31.sbt.json AF-H4_S10_L001.sig -k 31

I get:

# running sourmash subcommand: sbt_search
When loading query from "AF-H4_S10_L001.sig"
2 signatures matching ksize and molecule type;
need exactly one. Specify --ksize or --dna/--protein.

What's happening? Same thing with sbt_gather

Perhaps as importantly, what command should I be running to compare reads from a metagenome against GenBank?

Cheers
Mick

Contributor

ctb commented Sep 6, 2017 via email

Contributor

ctb commented Sep 6, 2017 via email

Author

mw55309 commented Sep 7, 2017

Hey Titus

Yeah we tried --dna and it didn't work

For the record, which installation instructions should we follow to get the latest stable version?

Cheers
Mick

Contributor

ctb commented Sep 7, 2017 via email

Contributor

ctb commented Sep 28, 2017

Closing: I think the issue here was a version issue and should be resolved by installing the latest version. Will be addressed by doing "unstable" releases (see #324), also.

ctb closed this as completed Sep 28, 2017

@nick-youngblut

I seem to be getting this same error with sourmash v2.0.0a2:

Command:

sourmash gather -k 31 -o tests/output_amy/sourmash/bins_DASTool_sourmash.csv tests/output_amy/sourmash/bins_DASTool.sig  /ebio/abt3_projects/databases/sourmash/genbank-k31.sbt.json > tests/output_amy/sourmash/bins_DASTool_sourmash.txt 	

Output:

...sig loading 5
When loading query from "tests/output_amy/sourmash/bins_DASTool.sig"
6 signatures matching ksize and molecule type;
need exactly one. Specify --ksize or --dna/--protein.

Adding --dna doesn't help. If I run sourmash compute on individual metagenome bins, then everything seems to work, but with the 6 metagenome bins used for input with sourmash compute in the example above, I get this error.

Any ideas on what's going wrong?

Contributor

ctb commented Feb 10, 2018

hi @nick-youngblut how was the bins_DASTool.sig computed? there's no simple way for sourmash to figure out which of the 6 signatures that are in there is the one to run gather on; for the moment the fix would be to put the signatures in individual files. Kind of a gap in the sourmash tool chain - eventually we'd like to be able to provide a signature name or md5 sum on the command line, but it's not there yet.
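
A minimal sketch of the "one signature per file" workaround described above, using only the Python standard library. It assumes the .sig file is uncompressed JSON whose top level is a list with one record per input sequence file; that layout is an assumption, not something confirmed in this thread. The function name and the output file naming are hypothetical.

```python
import json
from pathlib import Path


def split_signature_file(sig_path, out_dir):
    """Write each top-level record of a multi-signature JSON file to its own file.

    Assumes (not confirmed here) that the file is plain JSON containing a list
    of signature records. Returns the list of paths that were written.
    """
    sig_path = Path(sig_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    records = json.loads(sig_path.read_text())
    written = []
    for index, record in enumerate(records):
        # Hypothetical naming scheme: <original stem>.<position>.sig
        out_path = out_dir / f"{sig_path.stem}.{index}.sig"
        # Keep the list wrapper so each output has the same shape as the input.
        out_path.write_text(json.dumps([record]))
        written.append(out_path)
    return written
```

Each resulting file then holds a single record, so a query loaded from it should no longer be ambiguous.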

@nick-youngblut

Thanks for the quick response! I computed bins_DASTool.sig with the following:

sourmash compute --scaled 10000 -k 31 -o tests/output_amy/sourmash/bins_DASTool.sig tests/output_amy/bin_DASTool_bins/*.fa 

The input was 6 FASTA-formatted metagenome bins.
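
For comparison, here is a hedged sketch of computing one signature file per bin instead of one combined file, so that each query file contains exactly one signature. It only builds the command lines, reusing the flags from the compute command above (--scaled, -k, -o); the function name and the output directory layout are hypothetical.

```python
from pathlib import Path


def compute_commands(fasta_paths, out_dir, scaled=10000, ksize=31):
    """Return one `sourmash compute` argument list per FASTA file.

    Each command writes to its own <bin name>.sig inside out_dir, so no
    output file ends up holding more than one bin.
    """
    commands = []
    for fasta in map(Path, fasta_paths):
        sig_path = Path(out_dir) / f"{fasta.stem}.sig"
        commands.append([
            "sourmash", "compute",
            "--scaled", str(scaled),
            "-k", str(ksize),
            "-o", str(sig_path),
            str(fasta),
        ])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`, or used as the command for a one-input, one-output pipeline rule.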

Contributor

ctb commented Feb 10, 2018 via email

@nick-youngblut

Great! Yeah, I'm trying to avoid multiple-input, multiple-output commands, which complicate snakemake pipelines.

I'll give the bin-file merging options a try. Will I still be able to differentiate the bins after merging, or will I just get a full list of taxonomic classifications for the entire merged set of bins?

Contributor

ctb commented Feb 10, 2018 via email

@sapuizait

Hi, I'm having the exact same problem when trying to classify multiple contigs that are all in a single file.
Commands:
$ sourmash sketch dna -p scaled=1000,k=31 ~/tmp/shitflies/W22_Bacteria.filtered.fasta --singleton
$ sourmash search W22_Bacteria.filtered.fasta.sig gtdb-rs207.genomic.k31.lca.json.gz
$ sourmash search W22_Bacteria.filtered.fasta.sig gtdb-rs207.genomic.k31.lca.json.gz --dna

Output:
== This is sourmash version 4.7.0. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

select query k=31 automatically.
When loading query from 'W22_Bacteria.filtered.fasta.sig'
2311 signatures matching ksize and molecule type;
need exactly one. Specify --ksize or --dna, --rna, or --protein.

Is the solution to split into individual files and use gather, as mentioned above?
Thanks

Contributor

ctb commented Mar 6, 2023

yep! you can now use sourmash sig split to split many sketches into a directory full of files, or sourmash sig cat <input> -o somedir.d/ as well.
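
Once the sketches have been split into a directory, each file can be queried on its own. The following is a rough sketch that builds one command per sketch file, following the `sourmash search <query> <database>` form used earlier in this thread. The function name is hypothetical, and the assumption that split files end in .sig or .sig.gz has not been verified against every sourmash version.

```python
from pathlib import Path


def search_commands(sketch_dir, database, subcommand="search"):
    """Return one sourmash argument list per sketch file found in sketch_dir.

    Assumes the split sketches are named *.sig or *.sig.gz (an assumption);
    any other files in the directory are ignored.
    """
    sig_files = sorted(
        path for path in Path(sketch_dir).iterdir()
        if path.name.endswith((".sig", ".sig.gz"))
    )
    return [
        ["sourmash", subcommand, str(sig), str(database)]
        for sig in sig_files
    ]
```

Passing `subcommand="gather"` builds the equivalent gather commands; each list can be run with `subprocess.run(cmd, check=True)`.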
