Sourmash for relatively short sequences? #1970
Comments
Hi! Absolutely, you can query a large database with short sequences, but you might need to set scaled=1 |
Thanks for the fast reply! I did consider smaller scaled values, but the help documentation for both sourmash search and sourmash gather says this:
--scaled FLOAT scaled value should be between 100 and 1e6
When I do sourmash sketch with -scaled=1 k=31, it completes but gives this warning:
When I do sourmash gather...
WARNING: scaled value should be >= 100. Continuing anyway.
selecting specified query k=31
What does downsampling query from scaled=1 to 1 mean? Do I have to be concerned about these warning messages? Thanks again! |
Oh ok, at what scale did you build the database? |
The same scale settings for everything. It was scaled=1 |
Would you please retry with |
Hello, I've tried it now with --threshold-bp 1 and it's the same warning:
sourmash gather representative_BGCs.id_TLL01_bin.12_c00123_opera_c...region001.fa.sig /home/ubuntu/volume4/iomics/antismash_mibig_db/mibig_all_bgcs.fa.sig --threshold-bp 1 --dna -k 31 --scaled 1
== This is sourmash version 4.3.0. == |
Yes, the warning would still appear, but did the results change to something meaningful? |
Hello, Yes the final result for that specific example makes sense now. I will use these parameters for the rest of my searches then. Thanks! |
@Chiamh Glad it worked! Let us know if you need more support :) |
Replying to this: If you built a database with scaled=1000 and then queried with a signature built with scaled=100, the message would say "downsampling query from scaled=100 to 1000". Sketches can only be downsampled to a larger scaled value, so the query's scaled value must be <= the database's scaled value. |
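To illustrate the downsampling message, here is a toy model in plain Python rather than sourmash's own code. It assumes the usual description of scaled sketches: a k-mer hash is kept when it falls below 2^64 / scaled. The names `toy_hash`, `sketch`, and `downsample` are hypothetical, and SHA-256 stands in for the MurmurHash function that sourmash uses.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space


def toy_hash(kmer):
    """Stand-in 64-bit hash (assumption: sourmash itself uses MurmurHash)."""
    return int.from_bytes(hashlib.sha256(kmer.encode()).digest()[:8], "big")


def sketch(seq, ksize, scaled):
    """Keep the hashes of all k-mers that fall below MAX_HASH / scaled."""
    threshold = MAX_HASH // scaled
    kmers = (seq[i:i + ksize] for i in range(len(seq) - ksize + 1))
    return {h for h in map(toy_hash, kmers) if h < threshold}


def downsample(hashes, old_scaled, new_scaled):
    """Move a sketch to a larger scaled value by dropping hashes."""
    if new_scaled < old_scaled:
        # The discarded hashes are gone, so a sketch cannot be "upsampled".
        raise ValueError("cannot downsample to a smaller scaled value")
    threshold = MAX_HASH // new_scaled
    return {h for h in hashes if h < threshold}


seq = "ACGTTGCAAGGCTTAACCGGTATGCATCGATCGGATCCATGCAAGT"
fine = sketch(seq, ksize=21, scaled=1)             # keeps every distinct k-mer
coarse = downsample(fine, old_scaled=1, new_scaled=4)
print(len(fine), len(coarse))
```

In this model, downsampling from scaled=1 to 1 changes nothing, because the threshold stays the same and no hashes are dropped.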
(it will be released in sourmash v4.4.0) |
Hello,
Thanks for developing this tool! I know this isn't the intended use case, but is there a way to use sourmash search/gather for relatively short sequences (a few kb) against large databases? The sequences are so short that even scaled=100 seems inappropriate because there will be so few query hashes left.
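As a rough back-of-the-envelope check of the concern above, the sketch below estimates how many hashes a scaled sketch would keep for a short sequence. It assumes every k-mer is distinct and that roughly 1 in `scaled` hashes is retained; `expected_hashes` is a hypothetical helper, not part of sourmash.

```python
def expected_hashes(seq_len, ksize=31, scaled=100):
    """Approximate number of hashes kept for a single linear sequence."""
    n_kmers = max(seq_len - ksize + 1, 0)  # k-mers in the sequence
    return n_kmers / scaled                # about 1 in `scaled` is kept


# A 3 kb query at k=31 has 2970 k-mers.
for scaled in (1000, 100, 10, 1):
    print(f"scaled={scaled}: ~{expected_hashes(3000, scaled=scaled):.1f} hashes")
```

Under these assumptions a 3 kb query keeps only about 30 hashes at scaled=100, which is why very small scaled values come up for sequences of this length.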