NOTE: Downloading the binaries will not help you set up Cenote-Taker 2. If you haven't already installed Cenote-Taker 2, please follow the installation/update instructions in the README, including the database updates.
Update notes:
- Major changes have been made to make installation faster and easier, with a smaller data footprint (previously ~130 GB, now ~8 GB to ~75 GB depending on your database choices). Details:
- The following tools (either tricky to install or out of date) were removed from the dependencies: krona, emboss suite, circlator, mummer.
- The following tools were added to the dependencies: seqkit.
- The following tools were changed from stand-alone git clones to packages in the conda environment: lastal/lastdb, hhblits/hhsearch, phanotate.
- The protein BLAST database of RefSeq and other sequences was updated to include ~3,000 new RefSeq virus entries.
- The hhsuite databases (PDB, PFAM, CDD) are now optional.
- The tool now checks that your `run_title` is appropriately formatted.
- For contigs with DTRs (direct terminal repeats), the `--wrap` option lets users choose either to clip the repeat region and rotate the contig to an appropriate position, or to skip rotating and clipping, in which case the DTRs are reported in the genome map. #29
- Certain `rm` commands were fixed. #21
- The taxonomy calling framework has been updated. NCBI Taxdump files are now used for TaxIDs instead of the krona database. "tax_guide.blastx.out" files now show the taxid of the best hit and include tab-separated hierarchical taxonomy info for that reference. Example:
example_ct1_1 gi|849254117|ref|YP_009150201.1| terminase [Propionibacterium phage PHL085N00] 45.575 9.81e-119 452
taxid: 1500812
10239 Viruses superkingdom
2731341 Duplodnaviria clade
2731360 Heunggongvirae kingdom
2731618 Uroviricota phylum
2731619 Caudoviricetes class
28883 Caudovirales order
10699 Siphoviridae family
1982251 Pahexavirus genus
1982275 Pahexavirus PHL037M02 species
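If you want to pull this lineage out of a "tax_guide.blastx.out" file in your own scripts, a minimal Python sketch follows. It assumes, based only on the example above, that a line starting with `taxid:` holds the best hit's taxid and that each lineage line has three tab-separated fields (taxid, name, rank), root first. `parse_lineage` and `name_at_rank` are hypothetical helpers for illustration, not part of Cenote-Taker 2. The lineage is kept as an ordered list rather than a dict keyed by rank, because NCBI lineages can contain repeated ranks such as "clade".

```python
from typing import List, Optional, Tuple

# One lineage level: (taxid, name, rank)
Level = Tuple[int, str, str]


def parse_lineage(lines: List[str]) -> Tuple[Optional[int], List[Level]]:
    """Parse the 'taxid:' line and the lineage lines for one hit.

    Assumed layout (inferred from the release-note example, not CT2 docs):
        taxid: <best-hit taxid>
        <taxid>\t<name>\t<rank>    one line per level, root first
    Lines matching neither pattern are skipped.
    """
    best_taxid: Optional[int] = None
    lineage: List[Level] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("taxid:"):
            best_taxid = int(line.split(":", 1)[1].strip())
            continue
        fields = line.split("\t")
        # A lineage line: numeric taxid, then name, then rank.
        if len(fields) == 3 and fields[0].strip().isdigit():
            lineage.append((int(fields[0]), fields[1], fields[2].strip()))
    return best_taxid, lineage


def name_at_rank(lineage: List[Level], rank: str) -> Optional[str]:
    """Return the first name recorded at the given rank, or None."""
    for _taxid, name, level_rank in lineage:
        if level_rank == rank:
            return name
    return None


if __name__ == "__main__":
    example = [
        "taxid: 1500812",
        "10239\tViruses\tsuperkingdom",
        "10699\tSiphoviridae\tfamily",
        "1982251\tPahexavirus\tgenus",
        "1982275\tPahexavirus PHL037M02\tspecies",
    ]
    taxid, lineage = parse_lineage(example)
    print(taxid)                           # 1500812
    print(name_at_rank(lineage, "genus"))  # Pahexavirus
```

If the real file separates fields differently, only the `split("\t")` check needs to change.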
- Protein-sequence-based taxonomy is now more flexible, with thresholds for genome taxon assignment:

| Hallmark AAI to reference | Taxonomic granularity from CT2 |
|---|---|
| >90% | Genus, e.g. "Ilzatvirus" |
| >40% | Family, e.g. "Siphoviridae" |
| >25% | Order, e.g. "Caudovirales" |
| ≤25% | Generic name, e.g. "phage" |
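The thresholds in the table amount to a simple cascade checked from the strictest bound downward. The Python sketch below only restates the table; `taxon_granularity` and its return labels are hypothetical names, and it assumes AAI is supplied as a percentage. It is not Cenote-Taker 2's own code, which may apply further logic (for example, in how several hallmark genes on one contig are combined).

```python
from typing import List, Tuple

# (exclusive lower bound on hallmark AAI in %, granularity), strictest first.
THRESHOLDS: List[Tuple[float, str]] = [
    (90.0, "genus"),
    (40.0, "family"),
    (25.0, "order"),
]


def taxon_granularity(aai: float) -> str:
    """Map a hallmark AAI percentage to the granularity level in the table."""
    if not 0.0 <= aai <= 100.0:
        raise ValueError(f"AAI must be a percentage in [0, 100], got {aai}")
    for bound, level in THRESHOLDS:
        if aai > bound:  # strict '>' as in the table
            return level
    return "generic name"  # AAI <= 25%


if __name__ == "__main__":
    for aai in (95.0, 90.0, 45.575, 25.0):
        print(aai, taxon_granularity(aai))
```

Because the comparisons are strict, a hit at exactly 90% AAI is assigned at the family level and one at exactly 25% gets a generic name, matching the table's boundaries.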
- The `--hallmark_taxonomy` option lets users get hierarchical taxonomy information for all identified hallmark genes. This could be useful for more sophisticated downstream taxonomy assignments.
- `-db virion` is now the default setting. I think most people are inputting contigs assembled from WGS data, and this is the correct option for that data type.
Good luck with all of your Cenotes 💖
Mike