SBT search functions currently calculate containment #200

ctb · 2017-05-01T13:29:45Z

The SBT search is based on maximizing count_common, not on maximizing the MinHash similarity. This now differs from the behavior of sourmash search and MinHash.compare().
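To make the difference concrete, here is a minimal sketch using plain Python sets in place of MinHash sketches. The helpers `count_common` and `jaccard` below are illustrative stand-ins, not the sourmash API. It shows that the candidate with the most shared hashes is not necessarily the one with the highest Jaccard similarity.

```python
def count_common(a, b):
    """Number of hashes shared between two hash sets."""
    return len(a & b)


def jaccard(a, b):
    """Jaccard similarity: shared hashes divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical hash sets, chosen only for illustration.
query = set(range(10))                          # 10 hashes
small = set(range(8))                           # shares 8 hashes, nothing extra
large = set(range(10)) | set(range(100, 190))   # shares all 10, plus 90 unrelated

candidates = [small, large]

# Ranking by shared-hash count picks the large set...
by_common = max(candidates, key=lambda s: count_common(query, s))
# ...while ranking by Jaccard similarity picks the small one.
by_jaccard = max(candidates, key=lambda s: jaccard(query, s))
```

Here `large` wins on shared hashes (10 vs. 8), but `small` wins on Jaccard similarity (8/10 vs. 10/100).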


ctb · 2017-05-19T13:20:46Z

in thinking about this --

in some cases, we simply want the match with the most shared hashes (which is what --best-only does).

in other cases, we want the best containment, which amounts to the same thing as above.

in the case where we want the best Jaccard similarity, the missing piece of info that we need is the cardinality (when searching trees with --scaled). I think we can get away with tracking the maximum cardinality of the nodes below each SBT node, which is something we can figure out from the number of hashes if we are using --scaled.
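One way this could work, sketched with plain integers rather than real SBT nodes: assume each internal node records a hypothetical `max_leaf_size`, the largest number of hashes in any signature below it. A leaf cannot share more hashes with the query than the node reports, nor more than it contains, and the union is at least as large as the query. That gives a conservative upper bound on Jaccard similarity for every leaf under the node. The function names and the pruning rule are assumptions for illustration, not the existing search code.

```python
def jaccard_upper_bound(query_size, common_at_node, max_leaf_size):
    """Upper bound on the Jaccard similarity of any leaf below a node.

    query_size      -- number of hashes in the query
    common_at_node  -- query hashes found in the node (an overestimate
                       for any single leaf below it)
    max_leaf_size   -- hypothetical per-node value: the largest number
                       of hashes in any leaf below the node
    """
    if query_size == 0:
        return 0.0
    # Shared hashes are capped by the node count, the leaf size, and the query size.
    best_shared = min(common_at_node, max_leaf_size, query_size)
    # The union has at least query_size elements, so this can only overestimate.
    return best_shared / query_size


def can_prune(query_size, common_at_node, max_leaf_size, threshold):
    """True if no leaf below the node can reach the Jaccard threshold."""
    return jaccard_upper_bound(query_size, common_at_node, max_leaf_size) < threshold
```

For example, with a 10-hash query fully contained in a node whose largest leaf has only 4 hashes, no leaf can exceed a Jaccard similarity of 0.4, so the subtree could be skipped for a threshold of 0.5. The bound is loose by design: it never rules out a leaf that would actually pass.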

for this last case, it should be possible to construct a tree where right now we get the wrong answer. that would be a good test case... :)

question: what happens when we are searching trees that use num instead of scaled?

and, not to expand this issue too much, but we should also think about how to search SBTs when --track-abundance is on...

luizirber added the sbt label May 1, 2017

ctb mentioned this issue May 18, 2017

what's needed for a 2.0 release? #174

Closed

ctb mentioned this issue May 20, 2017

[WIP] Fix problem where tree search is truncated incorrectly. #244

Merged

5 tasks

luizirber closed this as completed in #244 Oct 25, 2017
