Inconsistent "taxonkit reformat" output for 446045 #35

Closed
4 tasks done
standage opened this issue Oct 30, 2020 · 3 comments

@standage

Hi @shenwei356, I just upgraded to 0.6.1 and found some unexpected behavior when querying the lineage for taxid 446045 (Drosophila serrata species complex). The full lineage from taxonkit lineage is consistent and correct across runs, but the abbreviated lineage from taxonkit reformat changes from run to run. The final taxid in the abbreviated lineage switches between 7215 (the correct genus), 32281 (a subgenus), and 2081351 (an unrelated genus that happens to share the name Drosophila).

$ for i in {1..6}; do echo 446045 | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d /; done
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;2081351;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;32281;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;2081351;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
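Comparing these long lines by eye is error-prone. A quicker check is to keep only the reformatted taxid column (field 8 in the output above) and count how many distinct values the runs produce. The sketch below feeds the six values reported above into a small helper; `tally_runs` is a hypothetical name, not a taxonkit command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often each distinct line appears on stdin,
# most frequent first. More than one output line means the runs disagreed.
tally_runs() {
  sort | uniq -c | sort -rn
}

# Reformatted lineage taxids (field 8) from the six runs reported above.
printf '%s\n' \
  '2759;6656;50557;7147;7214;2081351;' \
  '2759;6656;50557;7147;7214;7215;' \
  '2759;6656;50557;7147;7214;7215;' \
  '2759;6656;50557;7147;7214;32281;' \
  '2759;6656;50557;7147;7214;2081351;' \
  '2759;6656;50557;7147;7214;7215;' \
  | tally_runs
```

With taxonkit installed, the same check can be run live by appending `| cut -f 8` to the pipeline inside the loop and piping the whole loop into `tally_runs`; a deterministic build should print a single line.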

Prerequisites

  • make sure you're using the latest version (check with taxonkit version)
  • read the usage

Describe your issue

  • describe the problem
  • provide a reproducible example

shenwei356 added the bug label on Oct 30, 2020
@standage
Author

Ruh roh, I found another example. When formatting the lineage for 1973489, the penultimate taxid switches between 1386 (the correct genus) and 55087 (an insect genus of the same name).

$ for i in {1..6}; do echo 1973489 | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d /; done
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489

@shenwei356
Owner

Thanks, I will check it tomorrow.

@shenwei356
Owner

Fixed. I map each (name, parent name) pair to a taxid to distinguish names shared by different taxids. I used this mapping to find the correct rank but forgot to apply it when outputting the taxids :(
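The idea of keying on the (name, parent name) pair instead of the name alone can be sketched in a few lines. This is a minimal illustration, not taxonkit's own code: the lookup table is a tiny hypothetical stand-in for the NCBI taxonomy, and `lookup_table`, `taxids_by_name` and `taxid_by_name_and_parent` are made-up names. The parent names for 7215 and 1386 come from the lineages in this thread; the row for 32281 assumes the subgenus Drosophila sits directly under the genus Drosophila.

```shell
#!/usr/bin/env bash
# Hypothetical lookup table, one row per taxon: taxid <TAB> name <TAB> parent name.
# printf reuses its format for every group of three arguments.
lookup_table() {
  printf '%s\t%s\t%s\n' \
    7215  Drosophila Drosophilini \
    32281 Drosophila Drosophila \
    1386  Bacillus   Bacillaceae
}

# Name-only lookup: prints every matching taxid, so homonyms are ambiguous.
taxids_by_name() {
  lookup_table | awk -F'\t' -v name="$1" '$2 == name { print $1 }'
}

# (name, parent name) lookup: the parent name disambiguates homonyms.
taxid_by_name_and_parent() {
  lookup_table | awk -F'\t' -v name="$1" -v parent="$2" \
    '$2 == name && $3 == parent { print $1 }'
}

taxids_by_name Drosophila                          # two candidates: 7215 and 32281
taxid_by_name_and_parent Drosophila Drosophilini   # only 7215, the genus
```

In this sketch the parent name is the name that immediately precedes a taxon in the full lineage string given to reformat. The pair separates homonyms such as the genus and subgenus Drosophila, although two different taxa could in principle still share both a name and a parent name.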

for i in {1..6}; do \
    echo 446045 \
        | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / \
        | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d / \
        | cut -f 1,7,8; 
done
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;

for i in {1..6}; do \
    echo 1973489 \
        | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / \
        | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d / \
        | cut -f 1,7,8; 
done
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
