FracMinHash containment to ANI conversion #1859
Comments
and here's some code:
It relies on having a column Outputs:
(here's the file for k=21 suitably modified for the code above)
k=31, scaled=1000. These are results from GTDB representatives vs. all of GTDB. The counts are a little misleading, as there are certainly some duplicates (sigA --> sigB and sigB --> sigA are currently counted independently; will fix later). This does not affect binning, since we're just binning by 0.05 containment increments. Note that k=21 ANI values are closer to mapping-based ANI; k=31 is a bit more sensitive.
can / should this issue be closed? @bluegenes
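The duplicate-comparison issue mentioned above (sigA --> sigB and sigB --> sigA counted separately) could be handled with something like the sketch below. It assumes each comparison is a hypothetical `(query, match, avg_containment)` tuple and that the value is symmetric (as average containment is), so keeping the first-seen direction loses nothing; `dedupe_pairs` is an illustrative name, not part of sourmash.

```python
def dedupe_pairs(comparisons):
    """Keep one record per unordered pair of signature names.

    `comparisons` is an iterable of (query, match, value) tuples
    (a hypothetical layout). The first record seen for each
    unordered pair is kept; later reversed duplicates are dropped.
    """
    seen = set()
    unique = []
    for query, match, value in comparisons:
        # Sorting the names makes (A, B) and (B, A) map to the same key.
        key = tuple(sorted((query, match)))
        if key in seen:
            continue
        seen.add(key)
        unique.append((query, match, value))
    return unique
```

With duplicates removed this way, the per-bin counts would reflect distinct genome pairs rather than directional comparisons.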
ref ANI estimation PR #1788
I've been using our forthcoming ANI utilities to estimate pairwise ANI between GTDB genomes. From these data, we can examine the average containment --> ANI relationship for a given k-mer length. Note that the number of unique k-mers in each comparison also impacts the ANI estimate, so we do not expect a single ANI value per containment value.
I've currently run family-level comparisons using k=21, scaled=1000. I'm using the average of the directional containment values ("average containment") to estimate ANI. Average containment for these comparisons ranges from 0-1, and ANI estimates range from 80%-100%.
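To give a feel for the average containment --> ANI step, here is a minimal sketch. It assumes the simple point estimate ANI ≈ C^(1/k), where C is the containment and k is the k-mer length. The ANI utilities in PR #1788 may apply additional corrections (for example, accounting for the number of unique k-mers), so this is not a copy of that implementation. Both function names are hypothetical.

```python
def average_containment(c_ab: float, c_ba: float) -> float:
    """Average of the two directional containment values."""
    return (c_ab + c_ba) / 2.0


def containment_to_ani(containment: float, ksize: int) -> float:
    """Point estimate of ANI from containment, assuming ANI ~= C**(1/k).

    This is a simplified estimate and ignores any correction for
    the number of unique k-mers in the comparison.
    """
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be between 0 and 1")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    return containment ** (1.0 / ksize)


# Example: directional containments of 0.4 and 0.6 at k=21.
avg = average_containment(0.4, 0.6)
ani_k21 = containment_to_ani(avg, 21)
```

Under this simplified formula, the same containment maps to a higher ANI at k=31 than at k=21, because the exponent 1/k is smaller.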
@ctb suggested binning containment values so we can develop a feel for containment --> ANI. Here I've binned containment by 0.05 intervals (containment ranges from 0-1).
Note: `count` is the total number of pairwise genome comparisons in each bin. A .csv version of this table is attached.
mean-containment-k21-to-ANI.csv
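For reference, the binning described above (0.05-wide containment bins, with a per-bin `count` and mean ANI) can be sketched roughly as follows. This is a standard-library illustration, not the code used to produce the attached table; the input layout (an iterable of `(containment, ani)` pairs) and the output field names are assumptions.

```python
from collections import defaultdict


def bin_containment(pairs, width=0.05):
    """Summarize ANI by containment bin.

    `pairs` is an iterable of (containment, ani) values with
    containment in [0, 1]. Returns one dict per non-empty bin,
    ordered by bin start.
    """
    n_bins = round(1.0 / width)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for containment, ani in pairs:
        # Round before truncating to avoid floating-point edge effects,
        # and put containment == 1.0 into the last bin.
        idx = min(int(round(containment / width, 9)), n_bins - 1)
        sums[idx] += ani
        counts[idx] += 1
    rows = []
    for idx in sorted(counts):
        rows.append({
            "bin_start": round(idx * width, 10),
            "bin_end": round((idx + 1) * width, 10),
            "count": counts[idx],
            "mean_ani": sums[idx] / counts[idx],
        })
    return rows
```

Each returned row corresponds to one line of a containment --> ANI table, with `count` giving the number of comparisons that fell into that bin.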