[MRG] use and report ANI from tax genome summarization #2005

Merged on Jul 24, 2022 (14 commits into latest)

@bluegenes bluegenes commented Apr 27, 2022

(starting from #1788)

Add --ani-threshold to sourmash tax genome to allow classification based on containment ANI (cANI) >= threshold, e.g. 0.95.

Thoughts/warnings:

  • Our default thresholding uses containment because ANI was not yet implemented when it was added. I've left containment as the default here, but we may want to change this in the future.
  • cANI has different patterns based on the k-mer size and alphabet (moltype) used. Thresholds should be selected accordingly.
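
To illustrate why the threshold depends on k-mer size, here is a minimal sketch that assumes the simple point estimate ANI ≈ containment^(1/k). The function names are hypothetical and this is not necessarily the exact estimator sourmash uses internally; it only shows how the same containment maps to different ANI values at different k.

```python
def containment_to_ani(containment: float, ksize: int) -> float:
    """Point estimate of ANI from k-mer containment.

    Assumption: ANI ~= containment ** (1 / ksize). This is an
    illustrative approximation, not sourmash's actual code path.
    """
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be in [0, 1]")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    return containment ** (1.0 / ksize)


def passes_ani_threshold(containment: float, ksize: int, ani_threshold: float) -> bool:
    """Hypothetical helper: classify only if the estimated ANI meets the threshold."""
    return containment_to_ani(containment, ksize) >= ani_threshold


if __name__ == "__main__":
    # The same containment yields a different ANI estimate at each k-mer size,
    # so a threshold tuned for one k does not transfer directly to another.
    for ksize in (21, 31, 51):
        print(ksize, round(containment_to_ani(0.5, ksize), 4))
```

Under this approximation, an ANI threshold of 0.95 corresponds to a containment of 0.95^k, which shrinks quickly as k grows. The same caution applies across alphabets (moltypes).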

codecov bot commented Apr 27, 2022

Codecov Report

Merging #2005 (984c5e6) into latest (1bc273d) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           latest    #2005      +/-   ##
==========================================
+ Coverage   84.31%   84.34%   +0.02%     
==========================================
  Files         130      130              
  Lines       15293    15320      +27     
  Branches     2167     2176       +9     
==========================================
+ Hits        12895    12922      +27     
  Misses       2095     2095              
  Partials      303      303              
Flag     Coverage Δ
python   91.72% <100.00%> (+0.02%) ⬆️
rust     65.29% <ø> (ø)

Impacted Files                  Coverage Δ
src/sourmash/cli/utils.py       100.00% <100.00%> (ø)
src/sourmash/tax/__main__.py    87.98% <100.00%> (+0.41%) ⬆️
src/sourmash/tax/tax_utils.py   97.95% <100.00%> (+0.06%) ⬆️

@bluegenes

@ctb I think this is ready for review. Please suggest any additional tests or anything it may need!

@bluegenes bluegenes changed the title [WIP] use and report ANI from tax genome summarization? [MRG] use and report ANI from tax genome summarization? Jul 23, 2022
@bluegenes bluegenes changed the title [MRG] use and report ANI from tax genome summarization? [MRG] use and report ANI from tax genome summarization Jul 23, 2022
@bluegenes

realized I need to add some docs... will try to do asap :)

@ctb

ctb commented Jul 24, 2022

I did a thing, successfully!

The following:

sourmash gather podar-ref/63.fa.sig gtdb-rs207.genomic-reps.dna.k31.zip
sourmash tax genome --gather-csv 63.x.gtdb.csv --taxonomy-csv gtdb-rs207.taxonomy.csv --ani 0.9

yielded:

"NC_011663.1 Shewanella baltica OS223, complete genome",match,species,0.462,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Shewanellaceae;g__Shewanella;s__Shewanella baltica,5615e4d7,63.fa,0.462,2418000.0,0.9753728689340799
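
For readers who want to check a row like this programmatically, here is a small sketch using Python's csv module. It assumes the last field of the row is the ANI estimate and the third field is the rank; the field positions and the variable names are assumptions read off the row above, not taken from the sourmash documentation.

```python
import csv
import io

# The result row from the comment above, copied verbatim.
ROW = (
    '"NC_011663.1 Shewanella baltica OS223, complete genome",match,species,0.462,'
    "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;"
    "f__Shewanellaceae;g__Shewanella;s__Shewanella baltica,"
    "5615e4d7,63.fa,0.462,2418000.0,0.9753728689340799"
)

ANI_THRESHOLD = 0.9  # the value passed as --ani in the command above

# csv.reader handles the quoted first field, which contains a comma.
fields = next(csv.reader(io.StringIO(ROW)))

query_name = fields[0]
status = fields[1]
rank = fields[2]
lineage = fields[4]
ani = float(fields[-1])  # assumption: the final column is the ANI estimate

meets_threshold = ani >= ANI_THRESHOLD
print(f"{query_name}: {status} at {rank}, ANI={ani:.4f}, meets threshold: {meets_threshold}")
```

Here the species-level match has an estimated ANI of about 0.975, above the 0.9 threshold passed on the command line.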


@ctb ctb left a comment

LGTM!

The docs build on my laptop just fine, so I am inclined to merge and then figure out the docs stuff separately. LMK if that's ok.

@ctb ctb merged commit f3a4b88 into latest Jul 24, 2022
@ctb ctb deleted the add-tax-ani branch July 24, 2022 15:02