display ANI in search results? #2001

Open
ctb opened this issue Apr 27, 2022 · 4 comments

@ctb
Contributor

ctb commented Apr 27, 2022

Once #1967 is merged, ANI will be available in CSV files! 🎉

It is also available in sourmash compare matrix output, if --ani is used. 🎉

I don't think it is displayed anywhere else.

Do we want to add ANI to the search output, @bluegenes? I'm in favor - the search results are pretty sparse so I think we even have room for them.

Not sure about gather, though. I think the k-mer overlap approaches make more sense there, maybe? But it would be nice to have as an option, maybe? 🤔

Maybe do it for search first, since I'm pretty sure that's a good idea, and then a separate PR (no hurry) for gather?

@bluegenes
Contributor

ANI to search output would be great, but a few thoughts:

  • we can't estimate ANI for num signatures, so num vs scaled outputs would be different (maybe they already are?)
  • search uses Jaccard by default, which isn't as good as containment for ANI. Does adding ANI to the search output encourage use of Jaccard → ANI?
  • How do we want to handle excessive Jaccard error? Currently we emit a warning; do we want to zero out the estimate instead?
  • Do we want to allow ANI estimation for abund searches (ANI ignores abundance)? If so, from Jaccard?
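
To make the Jaccard vs containment point concrete, here is a minimal sketch. It assumes the simple point estimate ANI ≈ C^(1/k) for a containment C at k-mer size k, and uses plain Python sets as stand-ins for scaled sketches. The function names are hypothetical and this is not sourmash's own estimator, which may differ.

```python
def containment(query: set, subject: set) -> float:
    """Fraction of the query's hashes that are also in the subject."""
    if not query:
        raise ValueError("query must not be empty")
    return len(query & subject) / len(query)


def jaccard(a: set, b: set) -> float:
    """Shared hashes divided by all hashes seen in either set."""
    union = a | b
    if not union:
        raise ValueError("at least one set must be non-empty")
    return len(a & b) / len(union)


def containment_to_ani(c: float, ksize: int) -> float:
    """Assumed point estimate: ANI ~= containment ** (1 / ksize)."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("containment must be in [0, 1]")
    return c ** (1.0 / ksize)


# Toy example: a small query fully contained in a 10x larger subject.
query = set(range(100))
subject = set(range(1000))

c = containment(query, subject)   # every query hash is found
j = jaccard(query, subject)       # diluted by the subject's extra hashes
ani_from_containment = containment_to_ani(c, ksize=31)
```

In this toy case the containment is 1 while the Jaccard similarity is only 0.1, purely because the two sets differ in size. That size dependence is one reason a Jaccard-derived ANI can be misleading in search output.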

@ctb
Contributor Author

ctb commented Apr 27, 2022

> we can't estimate ANI for num signatures, so num vs scaled outputs would be different (maybe they already are?)

ahh, I'd kinda forgotten that.

@bluegenes
Contributor

> ahh, I'd kinda forgotten that.

we could actually estimate ANI from num sketches using the Mash Distance, but I'd rather not, because:

  1. don't want to confuse folks or have them use the two ANIs interchangeably
  2. want to encourage transition to FracMinHash
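
For reference, the Mash distance route mentioned above can be sketched as follows. This assumes the standard Mash formula D = -(1/k) ln(2J / (1 + J)) with ANI taken as 1 - D. The function names are hypothetical; nothing like this is being proposed for sourmash here.

```python
import math


def mash_distance(jaccard: float, ksize: int) -> float:
    """Mash distance from a Jaccard estimate, clamped to [0, 1]."""
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must be in [0, 1]")
    if jaccard == 0.0:
        # No shared hashes: treat as maximally distant.
        return 1.0
    d = -math.log(2.0 * jaccard / (1.0 + jaccard)) / ksize
    return min(1.0, max(0.0, d))


def mash_ani(jaccard: float, ksize: int) -> float:
    """ANI-like value from Mash distance (assumed here as 1 - D)."""
    return 1.0 - mash_distance(jaccard, ksize)
```

Because this value comes from a different estimator than the FracMinHash containment ANI, the two numbers are not directly comparable, which is the confusion point 1 is trying to avoid.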

@ctb
Contributor Author

ctb commented Apr 28, 2022

agreed.
