
SBT search functions currently calculate containment #200

Closed
ctb opened this issue May 1, 2017 · 1 comment · Fixed by #244

@ctb
Contributor

ctb commented May 1, 2017

The SBT search is based on maximizing count_common (the number of shared hashes), not on maximizing the MinHash similarity. This now differs from the behavior of sourmash search and MinHash.compare().
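A toy illustration of why the two rankings can disagree. Plain Python sets stand in for MinHash sketches, and the helper names (shared, containment, jaccard) and the data are hypothetical, not sourmash's API. The candidate with the most shared hashes is not the one with the highest Jaccard similarity:

```python
# Hypothetical data: Python sets of ints stand in for MinHash hash values.

def shared(a: set, b: set) -> int:
    """Number of hashes the two sketches have in common (what count_common measures)."""
    return len(a & b)

def containment(query: set, match: set) -> float:
    """Fraction of the query's hashes that also appear in the match."""
    return len(query & match) / len(query)

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    return len(a & b) / len(a | b)

query = set(range(10))
candidates = {
    # contains the whole query plus 90 unrelated hashes
    "big": set(range(10)) | set(range(100, 190)),
    # contains 8 of the 10 query hashes and nothing else
    "small": set(range(8)),
}

best_by_shared = max(candidates, key=lambda name: shared(query, candidates[name]))
best_by_jaccard = max(candidates, key=lambda name: jaccard(query, candidates[name]))

for name, hashes in candidates.items():
    # prints: name, shared count, containment, Jaccard
    print(name, shared(query, hashes), containment(query, hashes), round(jaccard(query, hashes), 2))

print("best by shared hashes:", best_by_shared)  # big
print("best by Jaccard:", best_by_jaccard)       # small
```

Here "big" shares 10 hashes (containment 1.0, Jaccard 0.1) while "small" shares 8 (containment 0.8, Jaccard 0.8), so ranking by shared count or containment picks "big" and ranking by Jaccard picks "small".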

@ctb
Contributor Author

ctb commented May 19, 2017

In thinking about this:

In some cases, we simply want the match with the most shared hashes (which is what --best-only does).

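In other cases, we want the best containment, which amounts to the same thing as above: the query's size is fixed, so maximizing the fraction of query hashes found in a match is the same as maximizing the number of shared hashes.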

In the case where we want the best Jaccard similarity, the missing piece of information is the cardinality (when searching trees built with --scaled). I think we can get away with tracking the maximum cardinality of the nodes below each SBT node, which we can work out from the number of hashes when using --scaled.
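A minimal sketch of the bookkeeping described above, assuming a hypothetical Node class (not sourmash's SBT classes) whose leaves hold a scaled sketch's hash set. With --scaled, roughly one in every `scaled` distinct hash values is kept, so a leaf's cardinality can be estimated as its number of hashes times `scaled`; each internal node then records the largest estimate among the leaves below it. This only shows how the value could be tracked, not how a search would use it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical SBT node: leaves carry hashes, internal nodes carry children."""
    hashes: set[int] = field(default_factory=set)
    children: list[Node] = field(default_factory=list)
    max_cardinality: int = 0


def estimate_cardinality(num_hashes: int, scaled: int) -> int:
    """With a scaled sketch about 1 in `scaled` distinct hashes is kept, so scale back up."""
    return num_hashes * scaled


def annotate(node: Node, scaled: int) -> int:
    """Fill in max_cardinality bottom-up: the largest estimated leaf size below each node."""
    if not node.children:
        node.max_cardinality = estimate_cardinality(len(node.hashes), scaled)
    else:
        node.max_cardinality = max(annotate(child, scaled) for child in node.children)
    return node.max_cardinality


leaf_a = Node(hashes={1, 2, 3})
leaf_b = Node(hashes={4, 5, 6, 7, 8})
root = Node(children=[leaf_a, leaf_b])
annotate(root, scaled=1000)
print(root.max_cardinality)  # 5000
```

Because the value is a simple max over children, it can be updated along a single root-to-leaf path when a new leaf is inserted, rather than recomputed for the whole tree.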

For this last case, it should be possible to construct a tree for which we currently get the wrong answer. That would be a good test case... :)
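One way such a test case might look, as a sketch under stated assumptions: internal nodes here hold the union of their children's hashes (a stand-in for the Bloom filters), and descend_by_shared is a hypothetical greedy descent that always follows the child with the most shared hashes. It is a simplification, and sourmash's actual traversal may differ. The leaf it reaches is not the leaf with the best Jaccard similarity:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    hashes: set[int] = field(default_factory=set)
    children: list[Node] = field(default_factory=list)


def internal(name: str, *children: Node) -> Node:
    """Internal node storing the union of its children's hashes (stand-in for a Bloom filter)."""
    union = set().union(*(child.hashes for child in children))
    return Node(name, union, list(children))


def descend_by_shared(node: Node, query: set[int]) -> Node:
    """Hypothetical greedy descent: follow the child sharing the most hashes with the query."""
    while node.children:
        node = max(node.children, key=lambda child: len(child.hashes & query))
    return node


def leaves(node: Node):
    """Yield every leaf below (or at) this node."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def jaccard(a: set[int], b: set[int]) -> float:
    return len(a & b) / len(a | b)


query = set(range(10))
big = Node("big", set(range(10)) | set(range(100, 190)))  # Jaccard 0.1 with query
small = Node("small", set(range(8)))                      # Jaccard 0.8 with query
other1 = Node("other1", {500, 501})
other2 = Node("other2", {600, 601})
root = internal("root", internal("left", big, other1), internal("right", small, other2))

found = descend_by_shared(root, query).name
best = max(leaves(root), key=lambda leaf: jaccard(query, leaf.hashes)).name
print(found, best)  # big small
```

A test built this way would assert that a Jaccard-based search returns "small"; with shared-hash scoring, as described in the issue, it returns "big".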

Question: what happens when we search trees that use num instead of scaled?

And, not to expand this issue too much, but we should also think about how to search SBTs when --track-abundance is on...
