Taxon taxid reassigned with reformat #42

Closed
4 tasks done
standage opened this issue Apr 1, 2021 · 11 comments

Comments

@standage

standage commented Apr 1, 2021

Hello, I noticed some unexpected behavior today. When I query and reformat the lineage for taxid 2507530, taxonkit reformat re-assigns 2516889 as the taxid in the output (the last taxid in the line).

$ echo 2507530 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids
2507530 2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019    131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530  Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889
$ echo 2516889 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids
2516889 2516889 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019    131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516889  Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889

It looks like these may be duplicated, unmerged taxids.

$ grep -e 2507530 -e 2516889 ~/.taxonkit/names.dmp 
2507530 |       Russula sp. 8 KA-2019   |       Russula sp. 8 KA-2019 <NCBI:txid2507530>        |       scientific name |
2516889 |       Russula sp. 8 KA-2019   |       Russula sp. 8 KA-2019 <NCBI:txid2516889>        |       scientific name |
$ grep 2516889 ~/.taxonkit/merged.dmp
$

Obviously, we should hope NCBI fixes this in the taxdump soon. But I'm assuming this is not the intended taxonkit behavior?


Prerequisites

  • make sure you are using the latest version (check with taxonkit version)
  • read the usage

Describe your issue

  • describe the problem
  • provide a reproducible example
@shenwei356
Owner

It looks like these may be duplicated, unmerged taxids.

Yes, they are. They should be merged.

taxonkit reformat parses the complete lineage strings rather than reading TaxIds and querying lineages in real time, so that it also works when TaxIds are not available.
It retrieves the TaxId of every taxon node from the combination of child and parent names, which is meant to eliminate name ambiguity.

However, 2507530 and 2516889 have exactly the same lineage :( so reformat fails to distinguish them.
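
The collision can be reproduced with a toy lookup table. The Python sketch below only illustrates the idea described in this comment and is not taxonkit's actual code: `build_name_index` and `resolve` are hypothetical names, and the lineage is trimmed to the last three nodes reported in this issue.

```python
# Toy illustration (not taxonkit's implementation): resolve a TaxId from
# the (child name, parent name) pair of a name lineage.

# (taxid, name, parent name) for the nodes involved in this issue.
NODES = [
    (2602424, "unclassified Russula", "Russula"),
    (2507530, "Russula sp. 8 KA-2019", "unclassified Russula"),
    (2516889, "Russula sp. 8 KA-2019", "unclassified Russula"),
]


def build_name_index(nodes):
    """Map (child name, parent name) -> taxid; later entries overwrite earlier ones."""
    index = {}
    for taxid, name, parent_name in nodes:
        index[(name, parent_name)] = taxid
    return index


def resolve(index, lineage):
    """Return the TaxId of the last taxon in a ';'-separated name lineage."""
    names = lineage.split(";")
    return index[(names[-1], names[-2])]


index = build_name_index(NODES)
lineage = "Russula;unclassified Russula;Russula sp. 8 KA-2019"

# Both TaxIds share this name lineage, so the index holds a single entry
# for it and only one of the two TaxIds can ever be returned.
print(resolve(index, lineage))
```

In this sketch the entry that is inserted last wins, so the input TaxId 2507530 cannot be recovered from the names alone. Which of the two a real implementation returns depends on its own parsing order.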

One solution is to add an option for specifying the TaxId field, for cases where TaxIds are available.
In addition, TaxIds that share the same complete lineage should be detected while parsing the taxdump files.
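
Detecting such cases could look roughly like the following Python sketch. It assumes the usual taxdump layout (fields separated by `\t|\t`, rows ending in `\t|`, `names.dmp` rows of taxid, name, unique name, name class, and `nodes.dmp` rows starting with taxid and parent taxid). All function names are hypothetical, and the inline data is a heavily trimmed sample in which Russula hangs directly under the root, not the real dump.

```python
from collections import defaultdict


def dmp_row(*fields):
    """Format one row the way taxdump files are assumed to be laid out."""
    return "\t|\t".join(fields) + "\t|"


# Trimmed sample data (illustrative only; real lineages have more levels).
NAMES_DMP = "\n".join([
    dmp_row("1", "root", "", "scientific name"),
    dmp_row("5402", "Russula", "", "scientific name"),
    dmp_row("2602424", "unclassified Russula", "", "scientific name"),
    dmp_row("2507530", "Russula sp. 8 KA-2019", "", "scientific name"),
    dmp_row("2516889", "Russula sp. 8 KA-2019", "", "scientific name"),
])
NODES_DMP = "\n".join([
    dmp_row("1", "1", "no rank"),
    dmp_row("5402", "1", "genus"),
    dmp_row("2602424", "5402", "no rank"),
    dmp_row("2507530", "2602424", "species"),
    dmp_row("2516889", "2602424", "species"),
])


def parse_rows(text):
    """Split taxdump text into lists of fields."""
    rows = []
    for line in text.splitlines():
        if line.endswith("\t|"):
            line = line[:-2]  # drop the row terminator
        rows.append(line.split("\t|\t"))
    return rows


def load_names(text):
    """Return {taxid: scientific name}."""
    names = {}
    for taxid, name, _unique, name_class in (r[:4] for r in parse_rows(text)):
        if name_class == "scientific name":
            names[int(taxid)] = name
    return names


def load_parents(text):
    """Return {taxid: parent taxid}."""
    return {int(r[0]): int(r[1]) for r in parse_rows(text)}


def name_lineage(taxid, names, parents):
    """Build the ';'-separated name lineage from the root down to taxid."""
    lineage = []
    while True:
        lineage.append(names[taxid])
        parent = parents[taxid]
        if parent == taxid:  # the root is its own parent
            break
        taxid = parent
    return ";".join(reversed(lineage))


def find_duplicate_lineages(names, parents):
    """Group TaxIds by complete name lineage and keep groups with more than one TaxId."""
    groups = defaultdict(list)
    for taxid in parents:
        groups[name_lineage(taxid, names, parents)].append(taxid)
    return {lin: sorted(ids) for lin, ids in groups.items() if len(ids) > 1}


duplicates = find_duplicate_lineages(load_names(NAMES_DMP), load_parents(NODES_DMP))
for lineage, taxids in duplicates.items():
    print(lineage, taxids)
```

Grouping by the full lineage string, rather than by the child and parent names alone, reports only TaxIds that are truly indistinguishable by name.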

@shenwei356
Owner

shenwei356 commented Apr 2, 2021

There are 52 more cases.

child,parent                                       taxid1,taxid2
------------------------------------------------   ----------------

Russula sp. 12 KA-2019, unclassified Russula       2507523, 2516885
Russula sp. 14 KA-2019, unclassified Russula       2507524, 2516886
Russula sp. 15 KA-2019, unclassified Russula       2516887, 2507525
Russula sp. 1 KA-2019, unclassified Russula        2516884, 2507521
Russula sp. 5 KA-2019, unclassified Russula        2516888, 2507527
Russula sp. 8 KA-2019, unclassified Russula        2516889, 2507530 
more cases
child,parent                                       taxid1,taxid2
------------------------------------------------   -----------------
Chiropsoides, Chiropsalmidae                       1105130, 2777044
clinical samples, environmental samples            88229, 191496
clinical samples, environmental samples            88229, 226901
environmental samples, Elusimicrobia               699875, 99260
environmental samples, Ichthyophonida              941404, 568718
environmental samples, Roseivirga                  543087, 927586
Listeria sp. FSL_L7-0091, unclassified Listeria    2718636, 2713500
Listeria sp. FSL_L7-0993, unclassified Listeria    2718628, 2713505
Listeria sp. FSL_L7-1447, unclassified Listeria    2718633, 2713603
Listeria sp. FSL_L7-1519, unclassified Listeria    2713502, 2718644
Listeria sp. FSL_L7-1582, unclassified Listeria    2718622, 2713504
Mansonella sp. CAM-9837, unclassified Mansonella   2697341, 2694888
Mansonella sp. CAM-9838, unclassified Mansonella   2697340, 2694887
Nemania aenea var. aureolutea, Nemania aenea       2779627, 109380
Penicillium citreoviride, Penicillium              1343377, 64494
Santalales incertae sedis, Santalales              2777525, 1649179
unclassified Acanthocephala, Acanthocephala        2685929, 1009550
unclassified Anisoptera, Anisoptera                1080974, 2685930
unclassified Antipatharia, Antipatharia            2750883, 44307
unclassified Bergia, Bergia                        2648616, 2727417
unclassified Cephalothrix, Cephalothrix            2664281, 2741702
unclassified Chlorella, Chlorella                  1962113, 2661577
unclassified Digenea, Digenea                      2685935, 99681
unclassified Diplolepis, Diplolepis                2677181, 2607940
unclassified Diplotaxis, Diplotaxis                2658736, 2677274
unclassified Diplura, Diplura                      2677275, 212010
unclassified Dracaena, Dracaena                    2292738, 2677199
unclassified Drosophila, Drosophila                58312, 1931990
unclassified Fridericia, Fridericia                2728542, 2604067
unclassified Giardia, Giardia                      1463203, 2770049
unclassified Gonatopus, Gonatopus                  2677302, 2659230
unclassified Hypoderma, Hypoderma                  2664351, 2677412
unclassified Hyssopus, Hyssopus                    2508054, 2714215
unclassified Inga, Inga                            2320256, 2659449
unclassified Kurzia, Kurzia                        2659477, 2677456
unclassified Liparis, Liparis                      2609094, 2200772
unclassified Myrmecia, Myrmecia                    2172497, 2677688
unclassified Nitrospira, Nitrospira                1704022, 2652172
unclassified Periploca, Periploca                  2677757, 2660233
unclassified Ponera, Ponera                        2608256, 2677547
unclassified Senegalia, Senegalia                  2696007, 2677834
unclassified Stellaria, Stellaria                  2596711, 2677902
unclassified Tetraspora, Tetraspora                2604509, 2677711
unclassified Trentepohlia, Trentepohlia            2137841, 2661401
unclassified Vertebrata, Vertebrata                2662825, 2202232
unclassified Yersinia, Yersinia                    2653513, 2677931
more details
1105130   genus      Chiropsoides                    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Cubozoa;Chirodropida;Chiropsalmidae;Chiropsoides                                                                                                                                                                                                                                                                                                                      131567;2759;33154;33208;6072;6073;6137;655440;685045;1105130
2777044   genus      Chiropsoides                    cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Cubozoa;Chirodropida;Chiropsalmidae;Chiropsoides                                                                                                                                                                                                                                                                                                                      131567;2759;33154;33208;6072;6073;6137;655440;685045;2777044
2713500   species    Listeria sp. FSL_L7-0091        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0091                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713500
2718636   species    Listeria sp. FSL_L7-0091        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0091                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718636
2713505   species    Listeria sp. FSL_L7-0993        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0993                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713505
2718628   species    Listeria sp. FSL_L7-0993        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0993                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718628
2713603   species    Listeria sp. FSL_L7-1447        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1447                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713603
2718633   species    Listeria sp. FSL_L7-1447        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1447                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718633
2713502   species    Listeria sp. FSL_L7-1519        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1519                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713502
2718644   species    Listeria sp. FSL_L7-1519        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1519                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718644
2713504   species    Listeria sp. FSL_L7-1582        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1582                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713504
2718622   species    Listeria sp. FSL_L7-1582        cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1582                                                                                                                                                                                                                                                                                         131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718622
2694888   species    Mansonella sp. CAM-9837         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Nematoda;Chromadorea;Rhabditida;Spirurina;Spiruromorpha;Filarioidea;Onchocercidae;Mansonella;unclassified Mansonella;Mansonella sp. CAM-9837                                                                                                                                                                                                   131567;2759;33154;33208;6072;33213;33317;1206794;6231;119089;6236;6274;2072716;6295;6296;42230;2647107;2694888
2694887   species    Mansonella sp. CAM-9838         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Nematoda;Chromadorea;Rhabditida;Spirurina;Spiruromorpha;Filarioidea;Onchocercidae;Mansonella;unclassified Mansonella;Mansonella sp. CAM-9838                                                                                                                                                                                                   131567;2759;33154;33208;6072;33213;33317;1206794;6231;119089;6236;6274;2072716;6295;6296;42230;2647107;2694887
2697341   species    Mansonella sp. Cam-9837         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Nematoda;Chromadorea;Rhabditida;Spirurina;Spiruromorpha;Filarioidea;Onchocercidae;Mansonella;unclassified Mansonella;Mansonella sp. Cam-9837                                                                                                                                                                                                   131567;2759;33154;33208;6072;33213;33317;1206794;6231;119089;6236;6274;2072716;6295;6296;42230;2647107;2697341
2697340   species    Mansonella sp. Cam-9838         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Nematoda;Chromadorea;Rhabditida;Spirurina;Spiruromorpha;Filarioidea;Onchocercidae;Mansonella;unclassified Mansonella;Mansonella sp. Cam-9838                                                                                                                                                                                                   131567;2759;33154;33208;6072;33213;33317;1206794;6231;119089;6236;6274;2072716;6295;6296;42230;2647107;2697340
109380    varietas   Nemania aenea var. aureolutea   cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;sordariomyceta;Sordariomycetes;Xylariomycetidae;Xylariales;Xylariaceae;Nemania;Nemania aenea;Nemania aenea var. aureolutea                                                                                                                                                                                                   131567;2759;33154;4751;451864;4890;716545;147538;716546;715989;147550;222545;37989;37990;109374;109375;109380
2779627   varietas   Nemania aenea var. aureolutea   cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;sordariomyceta;Sordariomycetes;Xylariomycetidae;Xylariales;Xylariaceae;Nemania;Nemania aenea;Nemania aenea var. aureolutea                                                                                                                                                                                                   131567;2759;33154;4751;451864;4890;716545;147538;716546;715989;147550;222545;37989;37990;109374;109375;2779627
64494     species    Penicillium citreoviride        cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;Eurotiomycetes;Eurotiomycetidae;Eurotiales;Aspergillaceae;Penicillium;Penicillium citreoviride                                                                                                                                                                                                                               131567;2759;33154;4751;451864;4890;716545;147538;716546;147545;451871;5042;1131492;5073;64494
1343377   species    Penicillium citreoviride        cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;Eurotiomycetes;Eurotiomycetidae;Eurotiales;Aspergillaceae;Penicillium;Penicillium citreoviride                                                                                                                                                                                                                               131567;2759;33154;4751;451864;4890;716545;147538;716546;147545;451871;5042;1131492;5073;1343377
2507521   species    Russula sp. 1 KA-2019           cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 1 KA-2019                                                                                                                                                                                                                               131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507521
2516884   species    Russula sp. 1 KA-2019           cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 1 KA-2019                                                                                                                                                                                                                               131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516884
2507527   species    Russula sp. 5 KA-2019           cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 5 KA-2019                                                                                                                                                                                                                               131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507527
2516888   species    Russula sp. 5 KA-2019           cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 5 KA-2019                                                                                                                                                                                                                               131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516888
2507530   species    Russula sp. 8 KA-2019           cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019                                                                                                                                                                                                                               131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530
2516889   species    Russula sp. 8 KA-2019           cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019                                                                                                                                                                                                                               131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516889
2507523   species    Russula sp. 12 KA-2019          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 12 KA-2019                                                                                                                                                                                                                              131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507523
2516885   species    Russula sp. 12 KA-2019          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 12 KA-2019                                                                                                                                                                                                                              131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516885
2507524   species    Russula sp. 14 KA-2019          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 14 KA-2019                                                                                                                                                                                                                              131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507524
2516886   species    Russula sp. 14 KA-2019          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 14 KA-2019                                                                                                                                                                                                                              131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516886
2507525   species    Russula sp. 15 KA-2019          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 15 KA-2019                                                                                                                                                                                                                              131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507525
2516887   species    Russula sp. 15 KA-2019          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 15 KA-2019                                                                                                                                                                                                                              131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516887
1649179   no rank    Santalales incertae sedis       cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;Santalales;Santalales incertae sedis                                                                                                                                                                                                      131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;41947;1649179
2777525   no rank    Santalales incertae sedis       cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;Santalales;Santalales incertae sedis                                                                                                                                                                                                      131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;41947;2777525
88229     no rank    clinical samples                cellular organisms;Bacteria;PVC group;Chlamydiae;Chlamydiia;Chlamydiales;environmental samples;clinical samples                                                                                                                                                                                                                                                                                                                            131567;2;1783257;204428;204429;51291;95916;88229
88229     no rank    clinical samples                cellular organisms;Bacteria;PVC group;Chlamydiae;Chlamydiia;Chlamydiales;environmental samples;clinical samples                                                                                                                                                                                                                                                                                                                            131567;2;1783257;204428;204429;51291;95916;88229
191496    no rank    clinical samples                cellular organisms;Bacteria;PVC group;Chlamydiae;Chlamydiia;Parachlamydiales;Parachlamydiaceae;environmental samples;clinical samples                                                                                                                                                                                                                                                                                                      131567;2;1783257;204428;204429;1963360;92713;141644;191496
226901    no rank    clinical samples                cellular organisms;Bacteria;PVC group;Chlamydiae;Chlamydiia;Parachlamydiales;Parachlamydiaceae;Neochlamydia;environmental samples;clinical samples                                                                                                                                                                                                                                                                                         131567;2;1783257;204428;204429;1963360;92713;112987;212217;226901
99260     no rank    environmental samples           cellular organisms;Bacteria;Elusimicrobia;environmental samples                                                                                                                                                                                                                                                                                                                                                                            131567;2;74152;99260
543087    no rank    environmental samples           cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Cytophagia;Cytophagales;Roseivirgaceae;Roseivirga;environmental samples                                                                                                                                                                                                                                                                                   131567;2;1783270;68336;976;768503;768507;2762306;290180;543087
568718    no rank    environmental samples           cellular organisms;Eukaryota;Opisthokonta;Ichthyosporea;Ichthyophonida;environmental samples                                                                                                                                                                                                                                                                                                                                               131567;2759;33154;127916;198625;568718
699875    no rank    environmental samples           cellular organisms;Bacteria;Elusimicrobia;Elusimicrobia;environmental samples                                                                                                                                                                                                                                                                                                                                                              131567;2;74152;641853;699875
927586    no rank    environmental samples           cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Cytophagia;Cytophagales;Roseivirgaceae;Roseivirga;environmental samples                                                                                                                                                                                                                                                                                   131567;2;1783270;68336;976;768503;768507;2762306;290180;927586
941404    no rank    environmental samples           cellular organisms;Eukaryota;Opisthokonta;Ichthyosporea;Ichthyophonida;environmental samples                                                                                                                                                                                                                                                                                                                                               131567;2759;33154;127916;198625;941404
1009550   no rank    unclassified Acanthocephala     cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Acanthocephala;unclassified Acanthocephala                                                                                                                                                                                                                                                                                       131567;2759;33154;33208;6072;33213;33317;2697495;1206795;10232;1009550
2685929   no rank    unclassified Acanthocephala     cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Paraneoptera;Hemiptera;Prosorrhyncha;Heteroptera;Euheteroptera;Neoheteroptera;Panheteroptera;Pentatomomorpha;Coreoidea;Coreidae;Coreinae;Acanthocephala;unclassified Acanthocephala                                                           131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33342;7524;33343;33345;33347;33349;33351;33357;38105;186376;2068237;2316800;2685929
1080974   no rank    unclassified Anisoptera         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Palaeoptera;Odonata;Epiprocta;Anisoptera;unclassified Anisoptera                                                                                                                                                                                       131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33339;6961;2510002;6962;1080974
2685930   no rank    unclassified Anisoptera         cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;malvids;Malvales;Dipterocarpaceae;Dipterocarpoideae;Anisoptera;unclassified Anisoptera                                                                                                                                             131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91836;41938;40588;65009;64577;2685930
44307     no rank    unclassified Antipatharia       cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Hexacorallia;Antipatharia;unclassified Antipatharia                                                                                                                                                                                                                                                                                                          131567;2759;33154;33208;6072;6073;6101;6102;44168;44307
2750883   no rank    unclassified Antipatharia       cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Hexacorallia;Antipatharia;unclassified Antipatharia                                                                                                                                                                                                                                                                                                          131567;2759;33154;33208;6072;6073;6101;6102;44168;2750883
2648616   no rank    unclassified Bergia             cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Malpighiales;Elatinaceae;Bergia;unclassified Bergia                                                                                                                                                                         131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91835;3646;125023;125024;2648616
2727417   no rank    unclassified Bergia             cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Hexacorallia;Zoantharia;Parazoanthidae;Bergia;unclassified Bergia                                                                                                                                                                                                                                                                                            131567;2759;33154;33208;6072;6073;6101;6102;44927;44928;2723760;2727417
2664281   no rank    unclassified Cephalothrix       cellular organisms;Bacteria;Terrabacteria group;Cyanobacteria/Melainabacteria group;Cyanobacteria;Oscillatoriophycideae;Oscillatoriales;Coleofasciculaceae;Cephalothrix;unclassified Cephalothrix                                                                                                                                                                                                                                          131567;2;1783272;1798711;1117;1301283;1150;1892251;1844514;2664281
2741702   no rank    unclassified Cephalothrix       cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Nemertea;Palaeonemertea;Cephalothricidae;Cephalothrix;unclassified Cephalothrix                                                                                                                                                                                                                                                  131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6217;1684132;166040;166041;2741702
1962113   no rank    unclassified Chlorella          cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;core chlorophytes;Trebouxiophyceae;Chlorellales;Chlorellaceae;Chlorella clade;Chlorella;unclassified Chlorella                                                                                                                                                                                                                                                                      131567;2759;33090;3041;2692248;75966;35460;35461;2511126;3071;1962113
2661577   no rank    unclassified Chlorella          cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;core chlorophytes;Trebouxiophyceae;Trebouxiophyceae incertae sedis;Chlorella;unclassified Chlorella                                                                                                                                                                                                                                                                                 131567;2759;33090;3041;2692248;75966;75981;114055;2661577
99681     no rank    unclassified Digenea            cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Platyhelminthes;Trematoda;Digenea;unclassified Digenea                                                                                                                                                                                                                                                                           131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6157;6178;6179;99681
2685935   no rank    unclassified Digenea            cellular organisms;Eukaryota;Rhodophyta;Florideophyceae;Rhodymeniophycidae;Ceramiales;Rhodomelaceae;Polysiphonioideae;Digenea;unclassified Digenea                                                                                                                                                                                                                                                                                         131567;2759;2763;2806;2045261;2802;2803;2008651;256429;2685935
2607940   no rank    unclassified Diplolepis         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Parasitoida;Cynipoidea;Cynipidae;Cynipinae;Diplolepidini;Diplolepis;unclassified Diplolepis                                                                                                                131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;1955251;40307;73401;1159319;167046;73404;2607940
2677181   no rank    unclassified Diplolepis         cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Gentianales;Apocynaceae;Asclepiadoideae;Asclepiadeae;MOOG clade;Diplolepinae;Diplolepis;unclassified Diplolepis                                                                                                          131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71274;91888;4055;4056;167484;167488;2546561;1498481;274548;2677181
2658736   no rank    unclassified Diplotaxis         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Coleoptera;Polyphaga;Scarabaeiformia;Scarabaeoidea;Scarabaeidae;Melolonthinae;Diplotaxis;unclassified Diplotaxis                                                                                                                131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7041;41084;41086;75546;7055;7059;1710485;2658736
2677274   no rank    unclassified Diplotaxis         cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;malvids;Brassicales;Brassicaceae;Brassiceae;Diplotaxis;unclassified Diplotaxis                                                                                                                                                     131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91836;3699;3700;981071;3731;2677274
212010    no rank    unclassified Diplura            cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Diplura;unclassified Diplura                                                                                                                                                                                                                                                        131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;29997;212010
2677275   no rank    unclassified Diplura            cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Chelicerata;Arachnida;Araneae;Mygalomorphae;Dipluridae;Diplura;unclassified Diplura                                                                                                                                                                                                                                   131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;6843;6854;6893;6894;88327;371957;2677275
2292738   no rank    unclassified Dracaena           cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Petrosaviidae;Asparagales;Asparagaceae;Nolinoideae;Dracaena;unclassified Dracaena                                                                                                                                                                                     131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;4447;1437197;73496;40552;703537;39502;2292738
2677199   no rank    unclassified Dracaena           cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Sauropsida;Sauria;Lepidosauria;Squamata;Bifurcata;Unidentata;Episquamata;Laterata;Teiioidea;Teiidae;Dracaena;unclassified Dracaena                                                                                             131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;8457;32561;8504;8509;1329961;1329950;1329912;1329976;35036;8530;420544;2677199
58312     no rank    unclassified Drosophila         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Drosophila;unclassified Drosophila                                                         131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;7215;58312
1931990   no rank    unclassified Drosophila         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Drosophila;Drosophila;unclassified Drosophila                                              131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;7215;32281;1931990
2604067   no rank    unclassified Fridericia         cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Lamiales;Bignoniaceae;Bignonieae;Fridericia;unclassified Fridericia                                                                                                                                                      131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71274;91888;4143;24079;423302;354074;2604067
2728542   no rank    unclassified Fridericia         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Annelida;Clitellata;Oligochaeta;Enchytraeida;Enchytraeidae;Fridericia;unclassified Fridericia                                                                                                                                                                                                                                    131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6340;42113;6381;1964463;6388;77730;2728542
1463203   no rank    unclassified Giardia            cellular organisms;Eukaryota;Metamonada;Fornicata;Diplomonadida;Hexamitidae;Giardiinae;Giardia;unclassified Giardia                                                                                                                                                                                                                                                                                                                        131567;2759;2611341;207245;5738;5739;68459;5740;1463203
2770049   no rank    unclassified Giardia            cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Mollusca;Gastropoda;Heterobranchia;Euthyneura;Panpulmonata;Eupulmonata;Stylommatophora;Helicina;Camaenoidea;Camaenidae;Giardia;unclassified Giardia                                                                                                                                                                              131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6447;6448;216305;216307;977775;120490;6527;216366;87864;83226;2770048;2770049
2659230   no rank    unclassified Gonatopus          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Aculeata;Chrysidoidea;Dryinidae;Gonatopodinae;Gonatopus;unclassified Gonatopus                                                                                                                             131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;7434;40304;144390;2326770;216179;2659230
2677302   no rank    unclassified Gonatopus          cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Alismatales;Araceae;Philodendroideae;Zamioculcadeae;Gonatopus;unclassified Gonatopus                                                                                                                                                                                  131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;4447;16360;4454;421921;293485;175762;2677302
2664351   no rank    unclassified Hypoderma          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;sordariomyceta;Leotiomycetes;Rhytismatales;Rhytismataceae;Hypoderma;unclassified Hypoderma                                                                                                                                                                                                                                   131567;2759;33154;4751;451864;4890;716545;147538;716546;715989;147548;47166;47167;696359;2664351
2677412   no rank    unclassified Hypoderma          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Calyptratae;Oestroidea;Oestridae;Hypodermatinae;Hypoderma;unclassified Hypoderma                                                                             131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43742;43755;7387;43915;7388;2677412
2508054   no rank    unclassified Hyssopus           cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Lamiales;Lamiaceae;Nepetoideae;Mentheae;Hyssopus;unclassified Hyssopus                                                                                                                                                   131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71274;91888;4143;4136;216706;216718;39168;2508054
2714215   no rank    unclassified Hyssopus           cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Parasitoida;Chalcidoidea;Eulophidae;Eulophinae;Hyssopus;unclassified Hyssopus                                                                                                                              131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;1955251;7422;107755;150275;108394;2714215
2320256   no rank    unclassified Inga               cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Fabales;Fabaceae;Caesalpinioideae;mimosoid clade;Ingeae;Inga;unclassified Inga                                                                                                                                              131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91835;72025;3803;3804;3807;163486;162809;2320256
2659449   no rank    unclassified Inga               cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Amphiesmenoptera;Lepidoptera;Glossata;Neolepidoptera;Heteroneura;Ditrysia;Gelechioidea;Oecophoridae;Oecophorinae;Inga;unclassified Inga                                                                                         131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;85604;7088;41191;41196;41197;37567;37581;57992;116123;690231;2659449
2659477   no rank    unclassified Kurzia             cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Crustacea;Branchiopoda;Phyllopoda;Diplostraca;Cladocera;Anomopoda;Chydoridae;Kurzia;unclassified Kurzia                                                                                                                                                                                      131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6657;6658;116557;84337;6665;116561;77713;527153;2659477
2677456   no rank    unclassified Kurzia             cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Marchantiophyta;Jungermanniopsida;Jungermanniidae;Jungermanniales;Lophocoleineae;Lepidoziaceae;Lembidioideae;Kurzia;unclassified Kurzia                                                                                                                                                                                                                 131567;2759;33090;35493;131221;3193;3195;186771;186782;3199;3204;13806;1484581;428516;2677456
2200772   no rank    unclassified Liparis            cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Petrosaviidae;Asparagales;Orchidaceae;Epidendroideae;Malaxideae;Malaxidinae;Liparis;unclassified Liparis                                                                                                                                                              131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;4447;1437197;73496;4747;158332;158393;1759432;78793;2200772
2609094   no rank    unclassified Liparis            cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Actinopterygii;Actinopteri;Neopterygii;Teleostei;Osteoglossocephalai;Clupeocephala;Euteleosteomorpha;Neoteleostei;Eurypterygia;Ctenosquamata;Acanthomorphata;Euacanthomorphacea;Percomorphaceae;Eupercaria;Perciformes;Cottioidei;Cottales;Liparidae;Liparis;unclassified Liparis   131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;7898;186623;41665;32443;1489341;186625;1489388;123365;123366;123367;123368;123369;1489872;1489922;8111;8100;1490021;183715;183716;2609094
2172497   no rank    unclassified Myrmecia           cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;core chlorophytes;Trebouxiophyceae;Trebouxiales;Trebouxiaceae;Myrmecia;unclassified Myrmecia                                                                                                                                                                                                                                                                                        131567;2759;33090;3041;2692248;75966;2507901;2507902;114064;2172497
2677688   no rank    unclassified Myrmecia           cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Aculeata;Formicoidea;Formicidae;Myrmeciinae;Myrmeciini;Myrmecia;unclassified Myrmecia                                                                                                                      131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;7434;2153479;36668;36669;232194;13617;2677688
1704022   no rank    unclassified Nitrospira         cellular organisms;Bacteria;Nitrospirae;Nitrospira;unclassified Nitrospira                                                                                                                                                                                                                                                                                                                                                                 131567;2;40117;203693;1704022
2652172   no rank    unclassified Nitrospira         cellular organisms;Bacteria;Nitrospirae;Nitrospira;Nitrospirales;Nitrospiraceae;Nitrospira;unclassified Nitrospira                                                                                                                                                                                                                                                                                                                         131567;2;40117;203693;189778;189779;1234;2652172
2660233   no rank    unclassified Periploca          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Amphiesmenoptera;Lepidoptera;Glossata;Neolepidoptera;Heteroneura;Ditrysia;Gelechioidea;Cosmopterigidae;Chrysopeleiinae;Periploca;unclassified Periploca                                                                         131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;85604;7088;41191;41196;41197;37567;37581;173647;248747;347720;2660233
2677757   no rank    unclassified Periploca          cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Gentianales;Apocynaceae;Periplocoideae;Periploca;unclassified Periploca                                                                                                                                                  131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71274;91888;4055;4056;167485;63484;2677757
2608256   no rank    unclassified Ponera             cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Aculeata;Formicoidea;Formicidae;Ponerinae;Ponerini;Ponera;unclassified Ponera                                                                                                                              131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;7434;2153479;36668;43085;141711;216406;2608256
2677547   no rank    unclassified Ponera             cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Petrosaviidae;Asparagales;Orchidaceae;Epidendroideae;Epidendreae;Ponerinae;Ponera;unclassified Ponera                                                                                                                                                                 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;4447;1437197;73496;4747;158332;158389;1005053;123181;2677547
2677834   no rank    unclassified Senegalia          cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Eubacteriales;Clostridiaceae;Senegalia;unclassified Senegalia                                                                                                                                                                                                                                                                                                        131567;2;1783272;1239;186801;186802;31979;1924097;2677834
2696007   no rank    unclassified Senegalia          cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Fabales;Fabaceae;Caesalpinioideae;mimosoid clade;Acacieae;Senegalia;unclassified Senegalia                                                                                                                                  131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91835;72025;3803;3804;3807;163485;468156;2696007
2596711   no rank    unclassified Stellaria          cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;Caryophyllales;Caryophyllaceae;Alsineae;Stellaria;unclassified Stellaria                                                                                                                                                                  131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;3524;3568;1141488;13273;2596711
2677902   no rank    unclassified Stellaria          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Mollusca;Gastropoda;Caenogastropoda;Littorinimorpha;Xenophoroidea;Xenophoridae;Stellaria;unclassified Stellaria                                                                                                                                                                                                                  131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6447;6448;69555;216294;159995;906789;1297112;2677902
2604509   no rank    unclassified Tetraspora         cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;core chlorophytes;Chlorophyceae;Tetrasporales;Tetrasporaceae;Tetraspora;unclassified Tetraspora                                                                                                                                                                                                                                                                                     131567;2759;33090;3041;2692248;3166;31305;35481;56012;2604509
2677711   no rank    unclassified Tetraspora         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Myxozoa;Myxosporea;Myxosporea incertae sedis;Tetraspora;unclassified Tetraspora                                                                                                                                                                                                                                                                                       131567;2759;33154;33208;6072;6073;35581;35582;1051104;148349;2677711
2137841   no rank    unclassified Trentepohlia       cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;Ulvophyceae;TCBD clade;Trentepohliales;Trentepohliaceae;Trentepohlia;unclassified Trentepohlia                                                                                                                                                                                                                                                                                      131567;2759;33090;3041;33103;2546214;35443;35445;173374;2137841
2661401   no rank    unclassified Trentepohlia       cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Nematocera;Tipulomorpha;Tipuloidea;Limoniidae;Limoniinae;Trentepohlia;unclassified Trentepohlia                                                                                                                         131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7148;43789;41829;43823;52737;2018059;2661401
2202232   no rank    unclassified Vertebrata         cellular organisms;Eukaryota;Rhodophyta;Florideophyceae;Rhodymeniophycidae;Ceramiales;Rhodomelaceae;Polysiphonioideae;Vertebrata;unclassified Vertebrata                                                                                                                                                                                                                                                                                   131567;2759;2763;2806;2045261;2802;2803;2008651;1261581;2202232
2662825   no rank    unclassified Vertebrata         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;unclassified Vertebrata                                                                                                                                                                                                                                                                                                   131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;2662825
2653513   no rank    unclassified Yersinia           cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Yersiniaceae;Yersinia;unclassified Yersinia                                                                                                                                                                                                                                                                                                                131567;2;1224;1236;91347;1903411;629;2653513
2677931   no rank    unclassified Yersinia           cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Polyneoptera;Dictyoptera;Mantodea;Mantidae;Amelinae;Yersinia;unclassified Yersinia                                                                                                                                                            131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33341;6970;7504;7505;267071;444888;2677931

@shenwei356
Owner

shenwei356 commented Apr 2, 2021

One solution is giving an option to specify the TaxId field for cases where TaxIds are available.
Meanwhile, cases of TaxIds with the same complete lineages should be detected while parsing taxdump files.

Done.

Now, for these cases, a warning message is shown and no data is returned.
But you can use -a/--output-ambiguous-result to return one possible result, as the old version did.

echo 2507530 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids
19:27:53.478 [WARN] we can't distinguish the TaxIds (2507530, 2516889) for lineage: cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019. But you can use -a/--output-ambiguous-result to return one possible result
2507530 2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019     131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530   Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species

echo 2507530 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids -a
19:30:23.031 [WARN] we can't distinguish the TaxIds (2507530, 2516889) for lineage: cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019. But you can use -a/--output-ambiguous-result to return one possible result
2507530 2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019     131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530   Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species  Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019      2759;5204;155619;452342;5401;5402;2507530
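
The detection described above (TaxIds that share the same complete lineage) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TaxonKit's actual implementation: the function name find_ambiguous_lineages and the (taxid, lineage) record layout are hypothetical, and the sample lineages are copied from this thread.

```python
from collections import defaultdict

def find_ambiguous_lineages(records):
    """Group TaxIds by complete lineage string.

    `records` is an iterable of (taxid, lineage) pairs (a hypothetical layout).
    Returns {lineage: sorted TaxIds} for lineages shared by more than one TaxId.
    """
    by_lineage = defaultdict(list)
    for taxid, lineage in records:
        by_lineage[lineage].append(taxid)
    return {lin: sorted(ids) for lin, ids in by_lineage.items() if len(ids) > 1}

# Lineages copied from this thread.
RUSSULA = (
    "cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;"
    "Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;"
    "Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019"
)
YERSINIA_BACTERIA = (
    "cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;"
    "Enterobacterales;Yersiniaceae;Yersinia;unclassified Yersinia"
)
YERSINIA_MANTIS = (
    "cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;"
    "Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;"
    "Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Polyneoptera;Dictyoptera;"
    "Mantodea;Mantidae;Amelinae;Yersinia;unclassified Yersinia"
)

records = [
    (2507530, RUSSULA),
    (2516889, RUSSULA),
    (2653513, YERSINIA_BACTERIA),
    (2677931, YERSINIA_MANTIS),
]

ambiguous = find_ambiguous_lineages(records)
for lineage, taxids in ambiguous.items():
    print(f"can't distinguish TaxIds {taxids} for lineage: {lineage}")
```

Only the two Russula TaxIds are reported here. The two "unclassified Yersinia" entries share a name but have different complete lineages, so grouping by the full lineage string keeps them apart.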

If TaxIds are available, use -I/--taxid-field to specify the field containing the TaxIds. 🍾

$ echo -ne "2507530\n2516889\n" | TAXONKIT_DB=. taxonkit reformat -I 1 -t
2507530 Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2507530
2516889 Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889

@standage
Author

standage commented Apr 2, 2021

Tremendous. Thank you!

@standage
Author

standage commented Apr 2, 2021

By the way, I submitted Russula sp. 12 KA-2019, unclassified Russula 2507523, 2516885 to the NCBI help desk yesterday, before your response. Maybe we should just point them to this thread for all the others. 😀

@shenwei356
Owner

Hi @standage, any response from NCBI?

Do you have any other issues or suggestions from using it? I'd like to release a new version with this improved reformat.

@standage
Author

standage commented Apr 8, 2021

I haven't had any other issues, thanks!

NCBI responded with the following.

Thank you very much for the notice. We have merged several such erroneous duplicates.

I didn't point them to this thread; I only mentioned Russula sp. 12 KA-2019, unclassified Russula 2507523, 2516885 in my ticket, and I haven't checked whether the latest update fixes the cases you found. So I'm not sure what the status is.

@shenwei356
Owner

I checked the latest taxdump files: some were merged, while others were not.

09:29:49.752 [WARN] taxid 2516885 was merged into 2507523
09:29:49.752 [WARN] taxid 2516886 was merged into 2507524
09:29:49.752 [WARN] taxid 2516887 was merged into 2507525
09:29:49.752 [WARN] taxid 2516884 was merged into 2507521
09:29:49.752 [WARN] taxid 2516888 was merged into 2507527
09:29:49.752 [WARN] taxid 2516889 was merged into 2507530
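
For anyone who wants to resolve merged TaxIds outside of TaxonKit, here is a minimal Python sketch. It assumes the merged.dmp layout from the NCBI taxdump (old TaxId and new TaxId separated by tab-pipe-tab); the sample text is built from the six warnings above rather than read from a real file, and the helper names parse_merged and resolve are hypothetical.

```python
def parse_merged(text):
    """Parse merged.dmp-style text into a {old_taxid: new_taxid} dict."""
    merged = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        merged[int(fields[0])] = int(fields[1])
    return merged

def resolve(taxid, merged):
    """Follow merge records until a TaxId with no further merge is reached."""
    seen = set()
    while taxid in merged and taxid not in seen:  # `seen` guards against cycles
        seen.add(taxid)
        taxid = merged[taxid]
    return taxid

# Sample rows reconstructed from the warnings above (not a real merged.dmp).
SAMPLE = (
    "2516885\t|\t2507523\t|\n"
    "2516886\t|\t2507524\t|\n"
    "2516887\t|\t2507525\t|\n"
    "2516884\t|\t2507521\t|\n"
    "2516888\t|\t2507527\t|\n"
    "2516889\t|\t2507530\t|\n"
)

merged = parse_merged(SAMPLE)
print(resolve(2516889, merged))  # the TaxId from the original report
print(resolve(2507530, merged))  # already current, returned unchanged
```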

$ echo -ne "1105130\n2718636"  | TAXONKIT_DB=. taxonkit lineage |  TAXONKIT_DB=. taxonkit reformat -t
09:31:27.603 [WARN] we can't distinguish the TaxIds (1105130, 2777044) for lineage: cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Cubozoa;Chirodropida;Chiropsalmidae;Chiropsoides. But you can use -a/--output-ambiguous-result to return one possible result
09:31:27.603 [WARN] we can't distinguish the TaxIds (2713500, 2718636) for lineage: cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0091. But you can use -a/--output-ambiguous-result to return one possible result
1105130 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Cubozoa;Chirodropida;Chiropsalmidae;Chiropsoides
2718636 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0091

@Username-felix-is-not-available

@shenwei356 First of all, thank you very much for creating this great tool! It has been very helpful in my research.

If I understood correctly, the warning should only appear if two lineages are completely identical. However, I also get this warning for two species with the same name but different lineages. I am using taxonkit 0.80 and the taxdump downloaded today.

echo -ne "46515\n" | taxonkit lineage | taxonkit reformat

produces

[WARN] we can't distinguish the TaxIds (46515, 1276929)

But the lineages of the two taxa are not identical:
46515 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Echinodermata;Eleutherozoa;Asterozoa;Asteroidea;Valvatacea;Valvatida;Asterinidae;Asterina;Asterina gibbosa

1276929 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;dothideomyceta;Dothideomycetes;Dothideomycetes incertae sedis;Asterinales;Asterinaceae;Asterina;Asterina gibbosa

Is this expected behavior?
Have a nice day,
Felix

@shenwei356
Owner

By default, taxonkit reformat finds the TaxId from the taxon name and the name of its parent taxon. Here, that is "Asterina;Asterina gibbosa", which matches both TaxIds.

If TaxIds are available, use -I/--taxid-field to specify the field containing the TaxIds. 🍾

$ echo -ne "2507530\n2516889\n" | TAXONKIT_DB=. taxonkit reformat -I 1 -t
2507530 Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2507530
2516889 Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889
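
To illustrate why a lookup by taxon name plus parent name cannot separate the two Asterina gibbosa entries, here is a rough Python sketch. It is not TaxonKit's code (TaxonKit is written in Go); the index layout and the names tail_key, build_index, and lookup are assumptions made for this example, and the lineages are the ones quoted in this thread.

```python
from collections import defaultdict

def tail_key(lineage):
    """Return (parent name, taxon name) from the end of a ';'-separated lineage."""
    names = lineage.split(";")
    parent = names[-2] if len(names) > 1 else ""
    return (parent, names[-1])

def build_index(lineages):
    """Map (parent name, taxon name) to the set of TaxIds carrying that pair."""
    index = defaultdict(set)
    for taxid, lineage in lineages.items():
        index[tail_key(lineage)].add(taxid)
    return index

def lookup(lineage, index):
    """Return all TaxIds matching the lineage's last two names, sorted."""
    return sorted(index.get(tail_key(lineage), ()))

LINEAGES = {
    46515: "cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;"
           "Bilateria;Deuterostomia;Echinodermata;Eleutherozoa;Asterozoa;"
           "Asteroidea;Valvatacea;Valvatida;Asterinidae;Asterina;Asterina gibbosa",
    1276929: "cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;"
             "Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;"
             "dothideomyceta;Dothideomycetes;Dothideomycetes incertae sedis;"
             "Asterinales;Asterinaceae;Asterina;Asterina gibbosa",
    2653513: "cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;"
             "Enterobacterales;Yersiniaceae;Yersinia;unclassified Yersinia",
}

index = build_index(LINEAGES)
hits = lookup(LINEAGES[46515], index)
if len(hits) > 1:
    print(f"can't distinguish the TaxIds {hits}")
```

Both Asterina lineages end in "Asterina;Asterina gibbosa", so the lookup returns two candidates even though the full lineages differ. Passing the TaxId directly with -I/--taxid-field avoids the name lookup altogether.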

@Username-felix-is-not-available

Thank you for your swift reply! That makes sense. Actually, I wasn't aware of that option, but it makes life easier for me.
