The SBT tree search is based on maximizing count_common, not on maximizing the MinHash similarity. This is now different from the behavior of sourmash search and MinHash.compare().
in some cases, we simply want the match with the most shared hashes (which is what --best-only does).
in other cases, we want the best containment, which amounts to the same thing as above.
in the case where we want the best Jaccard similarity, the missing piece of info is the cardinality of each signature (when searching trees built with --scaled). I think we can get away with tracking the maximum cardinality of the nodes below each SBT node, which we can work out from the number of hashes if we are using --scaled.
for this last case, it should be possible to construct a tree where the current search returns the wrong answer. that would be a good test case... :)
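To make the three cases above concrete, here is a minimal sketch using plain Python sets as stand-ins for hash sets. It does not use the sourmash API or an SBT; the helper functions and the names `query`, `big`, and `small` are hypothetical and only illustrate how ranking by shared hashes can disagree with ranking by Jaccard similarity.

```python
# Illustrative only: plain sets stand in for MinHash hash sets.
# None of these helpers are sourmash functions.

def count_common(a, b):
    """Number of hashes shared by the two sets."""
    return len(a & b)

def containment(query, other):
    """Fraction of the query's hashes that are found in `other`."""
    return len(query & other) / len(query) if query else 0.0

def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

query = set(range(0, 10))                       # 10 hashes
big = set(range(0, 8)) | set(range(100, 192))   # shares 8 hashes, 100 total
small = set(range(0, 6))                        # shares 6 hashes, 6 total

candidates = {"big": big, "small": small}

best_by_common = max(candidates, key=lambda n: count_common(query, candidates[n]))
best_by_containment = max(candidates, key=lambda n: containment(query, candidates[n]))
best_by_jaccard = max(candidates, key=lambda n: jaccard(query, candidates[n]))

print(best_by_common, best_by_containment, best_by_jaccard)
```

Here the shared-hash count and the containment both pick `big`, while Jaccard picks `small`, because the union with `big` is much larger. A pair of signatures shaped like this could be a starting point for the test case mentioned above.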
question: what happens when we are searching trees that use num instead of scaled?
and, not to expand this issue too much, but we should also think about how to search SBTs when --track-abundance is on...