Sourmash ANI estimate in some cases does not match manual computation, although using the same sketch signature #1

mahmudhera · 2022-06-01T21:02:58Z

Running the script python main.py inputs/ecoli.fasta --seed 0 --scalef 0.01 produces the results in the file ani_comparison_results. The results show that the manually calculated point estimate and the sourmash estimate disagree when the true ANI is <= 66%.


mahmudhera · 2022-06-01T21:09:05Z

@bluegenes is this because of some checks on the sketch size or on the likelihood of corner cases? Please note that I am not using the latest branch of sourmash. Instead, I compute the ANI using sourmash compare --containment --estimate-ani (please take a look at this), and then take the average of the two ANI values.
@dkoslicki these may be of interest to you.
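For readers following along, the manual point estimate described above can be sketched as follows. This is a minimal illustration, not the code in main.py or in sourmash: it assumes the standard containment-to-ANI point estimate ANI = C^(1/k), treats each sketch as a plain Python set of hash values, and the function names and the k-mer size argument are hypothetical.

```python
def containment(query: set, reference: set) -> float:
    """Fraction of the query's hashes that also appear in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


def containment_to_ani(c: float, ksize: int) -> float:
    """Point estimate ANI = C^(1/k) (assumed formula; no thresholding applied)."""
    if c <= 0.0:
        return 0.0
    return c ** (1.0 / ksize)


def average_containment_ani(a: set, b: set, ksize: int) -> float:
    """Average of the ANI estimates from the two containment directions."""
    ani_ab = containment_to_ani(containment(a, b), ksize)
    ani_ba = containment_to_ani(containment(b, a), ksize)
    return (ani_ab + ani_ba) / 2.0


if __name__ == "__main__":
    # Toy hash sets standing in for two sketches; ksize is illustrative only.
    sketch_a = {1, 2, 3, 4}
    sketch_b = {3, 4, 5, 6, 7, 8, 9, 10}
    print(average_containment_ani(sketch_a, sketch_b, ksize=21))
```

Because containment is asymmetric, the two directions generally give different ANI values, which is why the two estimates are averaged here.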

bluegenes · 2022-06-02T00:11:27Z

Hi @mahmudhera,

The latest sourmash addresses some issues with thresholding from your original equations that were affecting results (see sourmash-bio/sourmash#2060). While I would recommend switching to latest to take advantage of this fix, I don't think that's the crux of the issue here.

I think the issue you're seeing is related to sourmash-bio/sourmash#2003, where we zero out the ANI when the sketch size estimation may be inaccurate. I've been noticing the same thing that you're seeing here -- this is happening quite often (see sourmash-bio/sourmash#2058 to see the original verbose output from these checks).

Are we being too strict with size accuracy estimation checks?
https://github.com/sourmash-bio/sourmash/blob/latest/src/sourmash/minhash.py#L942-L956
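To make the "zero out the ANI" behaviour concrete, here is a rough sketch of how such a guard could work. It is not the sourmash implementation linked above: the bound used is the generic multiplicative Chernoff bound, 1 - 2*exp(-delta^2 * n / 3), applied to the observed number of hashes in a sketch, and the function names are hypothetical. The 0.05 and 0.95 defaults are the parameter values mentioned in this thread; mapping them to "relative error" and "confidence" is an assumption.

```python
import math


def size_estimate_is_reliable(num_hashes: int,
                              relative_error: float = 0.05,
                              confidence: float = 0.95) -> bool:
    """Chernoff-style check (assumed form, not sourmash's exact formula).

    Returns True when the lower bound on the probability that the size
    estimate is within `relative_error` of the truth reaches `confidence`.
    """
    if num_hashes <= 0:
        return False
    bound = 1.0 - 2.0 * math.exp(-(relative_error ** 2) * num_hashes / 3.0)
    return bound >= confidence


def guarded_ani(ani: float, num_hashes_a: int, num_hashes_b: int) -> float:
    """Return the ANI unchanged, or 0.0 if either sketch fails the size check."""
    if size_estimate_is_reliable(num_hashes_a) and size_estimate_is_reliable(num_hashes_b):
        return ani
    return 0.0


if __name__ == "__main__":
    print(guarded_ani(0.90, 5000, 5000))  # both sketches pass the check
    print(guarded_ani(0.90, 5000, 100))   # the small sketch fails, so ANI is zeroed
```

Under these assumed parameters a sketch needs roughly 4,400 hashes to pass, so a guard of this kind can reject small sketches even when the point estimate itself would be usable.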

mahmudhera · 2022-06-07T04:52:06Z

Hi @bluegenes, I have rerun the script with the latest branch, which I installed a few days ago for Phylo-ani. Strangely, there is now perfect agreement between our estimate and the sourmash estimate of ANI (see here). I am currently using version 4.4.1.dev3+g99c3997, which uses 0.95 and 0.05 as parameters in the cardinality estimation function.

Therefore, just for the discrepancies in this repository (which can be seen here), I believe the issue is the hardcoded thresholds, not the size estimation being too stringent.

mahmudhera closed this as completed Jun 7, 2022
