Sourmash for relatively short sequences? #1970

Closed
Chiamh opened this issue Apr 22, 2022 · 13 comments

Comments

@Chiamh

Chiamh commented Apr 22, 2022

Hello,

Thanks for developing this tool! I know this isn't the intended use case, but is there a way to use sourmash search/gather for relatively short sequences (a few kb) against large databases? The sequences are so short that even scaled=100 seems inappropriate because there will be so few query hashes left.
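
To put rough numbers on the concern above, here is a back-of-the-envelope sketch. It assumes a hypothetical 3 kb query in which every k-mer is distinct, and uses the usual approximation that a scaled sketch keeps about 1/scaled of the k-mers. The function name `expected_hashes` is made up for illustration and is not part of sourmash.

```python
def expected_hashes(seq_len: int, ksize: int = 31, scaled: int = 1) -> float:
    """Approximate number of hashes kept for a sequence of seq_len bases.

    Assumption: all k-mers are distinct and roughly 1/scaled of them are retained.
    """
    n_kmers = max(seq_len - ksize + 1, 0)  # number of k-mers in a linear sequence
    return n_kmers / scaled


if __name__ == "__main__":
    # Hypothetical 3 kb query at several scaled values.
    for scaled in (1000, 100, 10, 1):
        approx = expected_hashes(3000, ksize=31, scaled=scaled)
        print(f"scaled={scaled:>4}: ~{approx:.0f} hashes")
```

With only a handful of hashes at scaled=1000 and a few dozen at scaled=100, a short query leaves very little to match against, which is why a much smaller scaled value comes up in the replies.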

@mr-eyes
Member

mr-eyes commented Apr 22, 2022

Hi!

Absolutely, you can query a large database with short seqs, but you might need to set scaled=1.

@Chiamh
Author

Chiamh commented Apr 22, 2022

Thanks for the fast reply! I did consider smaller scales, but the help documentation for both sourmash search and sourmash gather says this:

--scaled FLOAT scaled value should be between 100 and 1e6

When I do sourmash sketch with -scaled=1 k=31, it completes but gives this warning:
WARNING: scaled value should be >= 100. Continuing anyway.

When I do sourmash gather...
sourmash gather short_sequence.fa.sig my_database.fa.sig --threshold-bp 2000 --dna -k 31 --scaled 1

WARNING: scaled value should be >= 100. Continuing anyway.
== This is sourmash version 4.3.0. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

selecting specified query k=31
loaded query: xxxx (k=31, DNA)
downsampling query from scaled=1 to 1
loaded 1923 signatures.

What does downsampling query from scaled=1 to 1 mean? Do I have to be concerned with these warning messages? Thanks again!

@mr-eyes
Member

mr-eyes commented Apr 22, 2022

Oh ok, at what scale did you build the database?

@Chiamh
Author

Chiamh commented Apr 22, 2022

The same scale settings for everything. It was scaled=1

@mr-eyes
Member

mr-eyes commented Apr 22, 2022

Would you please retry with --threshold-bp 1 ?

@Chiamh
Author

Chiamh commented Apr 22, 2022

Hello, I've tried it now with --threshold-bp 1 and it's the same warning:

sourmash gather representative_BGCs.id_TLL01_bin.12_c00123_opera_c...region001.fa.sig /home/ubuntu/volume4/iomics/antismash_mibig_db/mibig_all_bgcs.fa.sig --threshold-bp 1 --dna -k 31 --scaled 1
WARNING: scaled value should be >= 100. Continuing anyway.

== This is sourmash version 4.3.0. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==
selecting specified query k=31
loaded query: representative_BGCs.id_TLL01_b... (k=31, DNA)
downsampling query from scaled=1 to 1

@mr-eyes
Member

mr-eyes commented Apr 22, 2022

Yes, the warning would still appear, but did the results change to something meaningful?

@Chiamh
Author

Chiamh commented Apr 22, 2022

Hello,

Yes the final result for that specific example makes sense now. I will use these parameters for the rest of my searches then. Thanks!
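
The thread does not spell out why lowering --threshold-bp helped. One plausible reading, sketched here as an assumption rather than as sourmash's exact logic, is that the overlap between query and match is estimated as (shared hashes × scaled) and compared against the threshold. The helper name `min_shared_hashes` is hypothetical.

```python
import math


def min_shared_hashes(threshold_bp: int, scaled: int) -> int:
    """Fewest shared hashes whose estimated overlap reaches threshold_bp.

    Assumption: estimated overlap in bp is (shared hashes * scaled);
    the real check inside sourmash may differ in detail.
    """
    return max(1, math.ceil(threshold_bp / scaled))


if __name__ == "__main__":
    print(min_shared_hashes(2000, scaled=1))  # 2000
    print(min_shared_hashes(1, scaled=1))     # 1
```

Under this model, --threshold-bp 2000 at scaled=1 asks for about 2000 shared k-mers, which is most of a query that is only a few kb long, whereas --threshold-bp 1 lets any overlap be reported.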

@mr-eyes
Member

mr-eyes commented Apr 22, 2022

@Chiamh Glad it worked! Let us know if you need more support :)

@mr-eyes
Member

mr-eyes commented Apr 22, 2022

> What does downsampling query from scaled=1 to 1 mean? Do I have to be concerned with these warning messages? Thanks again!

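Replying to this: if you built a database with scaled=1000 and then queried it with a signature built with scaled=100, the message would say "downsampling query from scaled=100 to 1000". Downsampling only works toward a larger scaled value (fewer hashes), so the query's scaled value must be <= the database's. In your case both are 1, so nothing is actually downsampled and the message can be ignored.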
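
The downsampling idea can be illustrated with a toy model. This is a minimal sketch, not sourmash's implementation: it uses a stand-in 64-bit hash from `hashlib` (sourmash uses a different hash function), and the names `kmer_hash`, `sketch`, and `downsample` are invented for this example. A scaled sketch keeps only hashes below a cutoff of roughly 2^64 / scaled, so moving to a larger scaled value just filters the existing hashes, while moving to a smaller one is impossible because the discarded hashes are gone.

```python
import hashlib
import random

MAX_HASH = 2**64  # size of the 64-bit hash space


def kmer_hash(kmer: str) -> int:
    # Stand-in 64-bit hash for illustration only (not the hash sourmash uses).
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq: str, ksize: int, scaled: int) -> set:
    """Keep every k-mer hash that falls below MAX_HASH / scaled."""
    cutoff = MAX_HASH // scaled
    kept = set()
    for i in range(len(seq) - ksize + 1):
        h = kmer_hash(seq[i:i + ksize])
        if h < cutoff:
            kept.add(h)
    return kept


def downsample(hashes: set, old_scaled: int, new_scaled: int) -> set:
    """Filter an existing sketch down to a larger scaled value."""
    if new_scaled < old_scaled:
        raise ValueError("cannot downsample to a smaller scaled value")
    cutoff = MAX_HASH // new_scaled
    return {h for h in hashes if h < cutoff}


if __name__ == "__main__":
    # Hypothetical random 3 kb sequence, seeded for reproducibility.
    seq = "".join(random.Random(0).choices("ACGT", k=3000))
    full = sketch(seq, ksize=31, scaled=1)

    # "Downsampling from scaled=1 to 1" changes nothing.
    assert downsample(full, 1, 1) == full

    # Downsampling 1 -> 100 matches sketching directly at scaled=100.
    assert downsample(full, 1, 100) == sketch(seq, ksize=31, scaled=100)
```

In this model the "from scaled=1 to 1" case is simply a filter with the same cutoff, which matches the reading that the message is harmless.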

@ctb
Contributor

ctb commented Apr 22, 2022

@Chiamh I removed the unnecessary output here, #1971 - thanks for noting it!

@ctb
Contributor

ctb commented Apr 22, 2022

(it will be released in sourmash v4.4.0)
