Releases: shenwei356/taxonkit
TaxonKit v0.12.0-alpha
Changes
- taxonkit create-taxdump:
  - accepts arbitrary ranks. #60
  - better handling of taxa with the same names.
  - many flags changed.
TaxonKit v0.11.1
Changes
- TaxonKit v0.11.1
  - taxonkit create-taxdump: fix bug of missing Class rank, contributed by @apcamargo. The flag --gtdb was not affected. #57
TaxonKit v0.11.0
- TaxonKit v0.11.0
  - new command taxonkit create-taxdump: Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB and ICTV. #56
v0.11.0-alpha
Changes
- new command taxonkit create-taxdump: Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB. #56
Usage:
Create NCBI-style taxdump files for custom taxonomy, e.g., GTDB

Input format:
  0. For GTDB taxonomy file, just use --gtdb
  1. The input file should be tab-delimited
  2. At least one column is needed, please specify the field index:
      1) Kingdom/Superkingdom/Domain,    -K/--field-kingdom
      2) Phylum,                         -P/--field-phylum
      3) Class,                          -C/--field-class
      4) Order,                          -O/--field-order
      5) Family,                         -F/--field-family
      6) Genus,                          -G/--field-genus
      7) Species (needed),               -S/--field-species
      8) Subspecies,                     -T/--field-subspecies
         For GTDB, we use the assembly accession (without version number).
  3. The column containing the genome/assembly accession is recommended to
     generate TaxId mapping file (taxid.map, id -> taxid).
       -A/--field-accession,    field containing genome/assembly accession
       --field-accession-re,    regular expression to extract the accession

Attentions:
  1. Names should be distinct in taxa of different rank.
     But for lineages missing some taxon nodes, using names of parent nodes is allowed:
       GB_GCA_018897955.1 d__Archaea;p__EX4484-52;c__EX4484-52;o__EX4484-52;f__LFW-46;g__LFW-46;s__LFW-46 sp018897155
     It can also detect duplicate names with different ranks, e.g.,
     the Class and Genus have the same name B47-G6, and the Order and Family between them have different names.
     In this case, we reassign a new TaxId by increasing the TaxId until it is distinct.
       GB_GCA_003663585.1 d__Archaea;p__Thermoplasmatota;c__B47-G6;o__B47-G6B;f__47-G6;g__B47-G6;s__B47-G6 sp003663585

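
The duplicate-name handling described above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a TaxId starts out as a hash of the taxon name alone (here `zlib.crc32` as a stand-in; the release notes do not say how TaxonKit actually computes TaxIds) and that a clash with a different taxon is resolved by incrementing until the value is unused. `name_hash` and `assign_taxid` are illustrative names, not part of TaxonKit.

```python
import zlib


def name_hash(name: str) -> int:
    """Stand-in hash (assumption): CRC32 of the taxon name."""
    return zlib.crc32(name.encode("utf-8"))


def assign_taxid(rank: str, name: str, used: dict) -> int:
    """Return a TaxId for (rank, name), skipping IDs held by other taxa.

    `used` maps TaxId -> (rank, name) for every taxon assigned so far.
    """
    taxid = name_hash(name)
    while taxid in used and used[taxid] != (rank, name):
        taxid += 1  # increase the TaxId until it is distinct
    used[taxid] = (rank, name)
    return taxid


used = {}
class_id = assign_taxid("class", "B47-G6", used)
genus_id = assign_taxid("genus", "B47-G6", used)  # same name, different rank
again = assign_taxid("genus", "B47-G6", used)     # repeated lookup is stable
```

In this sketch the Class keeps the hashed value and the Genus lands on the next free integer, so both taxa named B47-G6 end up with distinct TaxIds, and asking again for the Genus returns the same TaxId.
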
Usage:
  taxonkit create-taxdump [flags]

Flags:
  -A, --field-accession int         field index of assembly accession (genome ID), for outputting taxid.map
      --field-accession-re string   regular expression to extract assembly accession (default "^\\w\\w_(.+)$")
  -C, --field-class int             field index of class
  -F, --field-family int            field index of family
  -G, --field-genus int             field index of genus
  -K, --field-kingdom int           field index of kingdom
  -O, --field-order int             field index of order
  -P, --field-phylum int            field index of phylum
  -S, --field-species int           field index of species (needed)
  -T, --field-subspecies int        field index of subspecies
      --force                       overwrite existing output directory
      --gtdb                        input files are GTDB taxonomy file
      --gtdb-re-subs string         regular expression to extract assembly accession as the subspecies (default "^\\w\\w_GC[AF]_(.+)\\.\\d+$")
  -h, --help                        help for create-taxdump
      --line-chunk-size int         number of lines to process for each thread, and 4 threads is fast enough. (default 5000)
      --null strings                null value of taxa (default [,NULL,NA])
  -x, --old-taxdump-dir string      taxdump directory of older version
      --out-dir string              output directory
      --rank-names strings          names of the 8 ranks, order matters (default [superkingdom,phylum,class,order,family,genus,species,no rank])
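
To see what the default values of --field-accession-re and --gtdb-re-subs capture, here is a small Python sketch applied to a GTDB identifier taken from the help text above. The patterns are copied from the listed defaults with the doubled backslashes unescaped. Python's `re` module is used only for illustration (for ASCII identifiers like this one it should behave the same as the tool's own regex engine), and `extract` is a hypothetical helper name.

```python
import re

# Default of --field-accession-re: drop the two-character prefix (e.g. "GB_").
ACCESSION_RE = re.compile(r"^\w\w_(.+)$")
# Default of --gtdb-re-subs: keep the accession digits, without prefix or version.
GTDB_SUBS_RE = re.compile(r"^\w\w_GC[AF]_(.+)\.\d+$")


def extract(pattern, identifier):
    """Return the first capture group, or None if the pattern does not match."""
    m = pattern.match(identifier)
    return m.group(1) if m else None


gtdb_id = "GB_GCA_018897955.1"
accession = extract(ACCESSION_RE, gtdb_id)   # "GCA_018897955.1"
subspecies = extract(GTDB_SUBS_RE, gtdb_id)  # "018897955"
```

The second result matches the note in the input-format section that, for GTDB, the subspecies is the assembly accession without its version number.
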
TaxonKit v0.10.1
Changes
- TaxonKit v0.10.1
  - taxonkit cami2-filter: fix option --show-rank which did not work in v0.10.0.
TaxonKit v0.10.0
Changes
- new command taxonkit cami2-filter: Remove taxa of given TaxIds and their descendants in CAMI metagenomic profile
- taxonkit reformat: fix panic for deleted taxid using -F/--fill-miss-rank. #55
TaxonKit v0.9.0
Changes
- TaxonKit v0.9.0
  - new command taxonkit profile2cami: converting metagenomic profile table to CAMI format
TaxonKit v0.8.0
Changes
TaxonKit v0.7.2
Changelog
- TaxonKit v0.7.2
  - taxonkit lineage:
    - new flag -R/--show-lineage-ranks for appending ranks of all levels.
    - reduce memory occupation and slight speedup.
  - taxonkit filter:
    - flag -E/--equal-to supports multiple values.
    - new flag -n/--save-predictable-norank: do not discard some special ranks without order when using -L, where rank of the closest higher node is still lower than rank cutoff.
  - taxonkit reformat:
    - new placeholder {t} for subspecies/strain, {T} for strain. Thanks @wqssf102 for feedback.
    - new flag -S/--pseudo-strain for using the node with lowest rank as strain name, only if its rank is lower than "species" and not "subspecies" nor "strain".
TaxonKit v0.7.1
Changelog
- TaxonKit v0.7.1
  - taxonkit filter:
    - disable unnecessary stdin check when using flag --list-order or --list-ranks. #36
    - better handling of the black list: "no rank" and "clade" are removed from the default value, so you need to use -N/--discard-noranks to explicitly filter out "no rank" and "clade". #37
    - update help message. Thanks @standage for improving this command! #38