A potential bug? search not reporting matches w/identical md5s
#2805
Comments
I'll look into it! It should be reporting both matches. Thank you for providing so much info!
Also found in #3284 by @agombolay! This is caused by the following code, which intentionally removes duplicate sketches from consideration:
sourmash/src/sourmash/search.py, lines 685 to 691 in bc22970
As I wrote in #3284,
Note that the new
This behavior exists throughout:
sourmash/src/sourmash/search.py, lines 735 to 739 in bc22970
If we want to fix it, the quickest fix would be to supply an additional flag to
An alternative (or addition) would be to follow through on some parts of the refactoring suggested in #2002.
Thoughts? My sense is that
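To make the discussion above concrete, here is a minimal, self-contained sketch of how md5-based de-duplication can hide a second match, and how an opt-in flag could restore it. This is not sourmash's code: the `Sketch` class, the `search` function, and the `keep_duplicates` flag are all hypothetical names invented for illustration, and the sketch assumes the md5 is computed from sketch content only (not the name), as the report suggests.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Sketch:
    """Hypothetical stand-in for a signature: a name plus a set of hashes."""
    name: str
    hashes: frozenset

    def md5sum(self) -> str:
        # Assumption: the md5 depends only on the content, not on the name.
        data = ",".join(str(h) for h in sorted(self.hashes))
        return hashlib.md5(data.encode()).hexdigest()


def search(query, database, keep_duplicates=False):
    """Return (name, Jaccard similarity) pairs, best match first.

    With keep_duplicates=False, sketches sharing an md5 are collapsed
    to the first one seen. With keep_duplicates=True (a hypothetical
    flag), the name is part of the de-duplication key, so identical
    sketches with different names are all reported.
    """
    seen = set()
    results = []
    for sk in database:
        key = (sk.md5sum(), sk.name) if keep_duplicates else sk.md5sum()
        if key in seen:
            continue  # duplicate: silently dropped from consideration
        seen.add(key)
        union = query.hashes | sk.hashes
        similarity = len(query.hashes & sk.hashes) / len(union) if union else 0.0
        if similarity > 0:
            results.append((sk.name, similarity))
    # sorted() is stable, so ties keep database order.
    return sorted(results, key=lambda r: -r[1])


# Same content under two names, mirroring the report in this issue.
db = [
    Sketch("48246288", frozenset({1, 2, 3})),
    Sketch("482462", frozenset({1, 2, 3})),
]
query = db[1]

default_hits = search(query, db)
all_hits = search(query, db, keep_duplicates=True)
print(default_hits)
print(all_hits)
```

In this toy version, the default call reports only the first of the two identical sketches, while the flagged call reports both names. Keying on name as well as md5 is just one possible design; the actual fix in sourmash may look quite different.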
Hi @ctb,
I built a test database in which I duplicated a genome but gave the two copies different names, 48246288 and 482462:
When I ran a search to find the most similar genomes for 482462, I expected it to return both 48246288 and 482462, but it returned only 48246288:
However, when I ran it for 48246288, it also returned only 48246288:
Is it a bug? Or is this normal because the two genomes are identical and have the same MD5 hash, so only one is reported?