Commit

v0.1.1
shenwei356 committed Nov 10, 2016
1 parent c01e3dc commit 29f40a0
Showing 19 changed files with 477 additions and 70 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -31,3 +31,4 @@ benchmark/.Rhistory
taxonkit/binaries*
*names.dmp
*nodes.dmp
*.json
7 changes: 4 additions & 3 deletions README.md
@@ -1,7 +1,8 @@
# TaxonKit - Crossplatform and Efficient NCBI Taxonomy Toolkit
# TaxonKit - Cross-platform and Efficient NCBI Taxonomy Toolkit

**Documents:** [http://bioinf.shenwei.me/taxonkit](http://bioinf.shenwei.me/taxonkit)
([**Usage**](http://bioinf.shenwei.me/taxonkit/usage/))
([**Usage**](http://bioinf.shenwei.me/taxonkit/usage/),
[**Tutorial**](http://bioinf.shenwei.me/taxonkit/tutorial/))

**Source code:** [https://github.com/shenwei356/taxonkit](https://github.com/shenwei356/taxonkit)
[![GitHub stars](https://img.shields.io/github/stars/shenwei356/taxonkit.svg?style=social&label=Star&?maxAge=2592000)](https://github.com/shenwei356/taxonkit)
@@ -20,7 +21,7 @@

Go to [Download Page](http://bioinf.shenwei.me/taxonkit/download) for more download options and changelogs.

`taxonkit` is implemented in [Golang](https://golang.org/) programming language,
`TaxonKit` is implemented in the [Go](https://golang.org/) programming language, and
executable binary files **for most popular operating systems** are freely available
on the [release](https://github.com/shenwei356/taxonkit/releases) page.

16 changes: 10 additions & 6 deletions doc/docs/download.md
@@ -1,15 +1,17 @@
# Download

`taxonkit` is implemented in [Golang](https://golang.org/) programming language,
`TaxonKit` is implemented in the [Go](https://golang.org/) programming language, and
executable binary files **for most popular operating systems** are freely available
on the [release](https://github.com/shenwei356/taxonkit/releases) page.

## Current Version

[taxonkit v0.1](https://github.com/shenwei356/taxonkit/releases/tag/v0.1)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.1/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.1)
[TaxonKit v0.1.1](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.1)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.1.1/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.1)

- first release
- add flag `--json` to `taxonkit list` to output the tree in readable JSON
format, so the taxonomy tree can be collapsed and expanded in
modern text editors.

Links:

@@ -33,7 +35,7 @@ Links:

[Download Page](https://github.com/shenwei356/taxonkit/releases)

`taxonkit` is implemented in [Golang](https://golang.org/) programming language,
`TaxonKit` is implemented in the [Go](https://golang.org/) programming language, and
executable binary files **for most popular operating systems** are freely available
on the [release](https://github.com/shenwei356/taxonkit/releases) page.

@@ -61,7 +63,9 @@ For Go developer, just one command:

## Previous Versions


- [TaxonKit v0.1](https://github.com/shenwei356/taxonkit/releases/tag/v0.1)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.1/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.1)
- first release


<div id="disqus_thread"></div>
Binary file removed doc/docs/files/grep_result.png
Binary file not shown.
6 changes: 0 additions & 6 deletions doc/docs/files/otu_table.csv

This file was deleted.

6 changes: 0 additions & 6 deletions doc/docs/files/otu_table.gAB.csv

This file was deleted.

7 changes: 0 additions & 7 deletions doc/docs/files/otu_table.gAB.t.csv

This file was deleted.

7 changes: 0 additions & 7 deletions doc/docs/files/otu_table.gAB.t.r.csv

This file was deleted.

7 changes: 0 additions & 7 deletions doc/docs/files/otu_table2.csv

This file was deleted.

7 changes: 0 additions & 7 deletions doc/docs/files/otu_table3.csv

This file was deleted.

Binary file added doc/docs/files/taxon.json.png
52 changes: 52 additions & 0 deletions doc/docs/tutorial.md
@@ -0,0 +1,52 @@
# Tutorial

## Extract all sequences of certain taxa from the nr database

### Dataset

- [prot.accession2taxid.gz](ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz)

### Steps

Take bacteria as an example.

1. Getting all taxids of bacteria (taxid 2):

$ taxonkit list --nodes nodes.dmp --ids 2 --indent "" > bacteria.taxid.txt

It takes only 2.5s! Number of taxids:

$ wc -l bacteria.taxid.txt
454591 bacteria.taxid.txt

2. Extracting accessions with [csvtk](http://bioinf.shenwei.me/csvtk/download/):

$ csvtk -t grep -f taxid -P bacteria.taxid.txt prot.accession2taxid.gz | csvtk -t cut -f accession.version > bacteria.taxid.acc.txt

3. Extracting nr sequences:

$ blastdbcmd -db nr -entry all -outfmt "%a\t%T" | \
csvtk -t grep -f 1 -P bacteria.taxid.acc.txt | \
csvtk -t cut -f 1 | \
blastdbcmd -db nr -entry_batch - -out bacteria.fa

<div id="disqus_thread"></div>
<script>

/**
* RECOMMENDED CONFIGURATION VARIABLES: EDIT AND UNCOMMENT THE SECTION BELOW TO INSERT DYNAMIC VALUES FROM YOUR PLATFORM OR CMS.
* LEARN WHY DEFINING THESE VARIABLES IS IMPORTANT: https://disqus.com/admin/universalcode/#configuration-variables*/
/*
var disqus_config = function () {
this.page.url = PAGE_URL; // Replace PAGE_URL with your page's canonical URL variable
this.page.identifier = PAGE_IDENTIFIER; // Replace PAGE_IDENTIFIER with your page's unique identifier variable
};
*/
(function() { // DON'T EDIT BELOW THIS LINE
var d = document, s = d.createElement('script');
s.src = '//taxonkit.disqus.com/embed.js';
s.setAttribute('data-timestamp', +new Date());
(d.head || d.body).appendChild(s);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
100 changes: 90 additions & 10 deletions doc/docs/usage.md
@@ -14,7 +14,7 @@ Usage
```
TaxonKit - NCBI Taxonomy Toolkit
Version: 0.1
Version: 0.1.1
Author: Wei Shen <shenwei356@gmail.com>
@@ -45,40 +45,73 @@ Use "TaxonKit [command] --help" for more information about a command.
Usage

```
list taxon tree of given taxon IDs.
list taxon tree of given taxon IDs
Usage:
taxonkit list [flags]
Flags:
--ids string taxon ID(s), multiple IDs should be seperated by comma (default "1")
--indent string indent (default " ")
--json output in JSON format. you can save the result in file with suffix ".json" and open with modern text editor
--names string names.dmp file, when it given taxid will be followed by its scientific name
--nodes string nodes.dmp file (default "nodes.dmp")
--show-rank show rank of the node
```

Examples

1. Default usage

$ taxonkit list --nodes nodes.dmp --ids 9605
$ taxonkit list --nodes nodes.dmp --ids 9605,239934
9605
9606
63221
741158
1425170

1. Removing indent. The list could be used to extract sequences from BLAST database with `blastdbcmd`

$ taxonkit list --nodes nodes.dmp --ids 9605 --indent ""
239934
239935
349741
512293
512294
1131822
1262691
1263034
1131336
1574264
1574265
1638783
1679444
1755639
1896967

1. Removing indent. The list could be used to extract sequences from a BLAST database with `blastdbcmd` (see [tutorial](http://bioinf.shenwei.me/taxonkit/tutorial/))

$ taxonkit list --nodes nodes.dmp --ids 9605,239934 --indent ""
9605
9606
63221
741158
1425170

239934
239935
349741
512293
512294
1131822
1262691
1263034
1131336
1574264
1574265
1638783
1679444
1755639
1896967


**Performance:** Time and memory usage for the whole taxon tree:

$ # emptying the buffers cache
@@ -90,13 +123,29 @@ Examples

1. Adding names

$ taxonkit list --nodes nodes.dmp --names names.dmp --ids 9605
$ taxonkit list --nodes nodes.dmp --names names.dmp --ids 9605,239934
9605 [genus] Homo
9606 [species] Homo sapiens
63221 [subspecies] Homo sapiens neanderthalensis
741158 [subspecies] Homo sapiens ssp. Denisova
1425170 [species] Homo heidelbergensis

239934 [genus] Akkermansia
239935 [species] Akkermansia muciniphila
349741 [no rank] Akkermansia muciniphila ATCC BAA-835
512293 [no rank] environmental samples
512294 [species] uncultured Akkermansia sp.
1131822 [species] uncultured Akkermansia sp. SMG25
1262691 [species] Akkermansia sp. CAG:344
1263034 [species] Akkermansia muciniphila CAG:154
1131336 [species] Akkermansia sp. KLE1605
1574264 [species] Akkermansia sp. KLE1797
1574265 [species] Akkermansia sp. KLE1798
1638783 [species] Akkermansia sp. UNK.MGS-1
1679444 [species] Akkermansia glycaniphila
1755639 [species] Akkermansia sp. MC_55
1896967 [species] Akkermansia sp. 54_46

**Performance:** Time and memory usage for the whole taxon tree:

$ # emptying the buffers cache
@@ -106,8 +155,39 @@ Examples
elapsed time: 9.825s
peak rss: 648.65 MB



1. Output in JSON format, so you can easily collapse and expand the taxonomy tree in a modern text editor.

$ taxonkit list --nodes nodes.dmp --names names.dmp --ids 9605,239934 --json
{
"9605 [genus] Homo": {
"9606 [species] Homo sapiens": {
"63221 [subspecies] Homo sapiens neanderthalensis": {},
"741158 [subspecies] Homo sapiens ssp. Denisova": {}
},
"1425170 [species] Homo heidelbergensis": {}
},
"239934 [genus] Akkermansia": {
"239935 [species] Akkermansia muciniphila": {
"349741 [no rank] Akkermansia muciniphila ATCC BAA-835": {}
},
"512293 [no rank] environmental samples": {
"512294 [species] uncultured Akkermansia sp.": {},
"1131822 [species] uncultured Akkermansia sp. SMG25": {},
"1262691 [species] Akkermansia sp. CAG:344": {},
"1263034 [species] Akkermansia muciniphila CAG:154": {}
},
"1131336 [species] Akkermansia sp. KLE1605": {},
"1574264 [species] Akkermansia sp. KLE1797": {},
"1574265 [species] Akkermansia sp. KLE1798": {},
"1638783 [species] Akkermansia sp. UNK.MGS-1": {},
"1679444 [species] Akkermansia glycaniphila": {},
"1755639 [species] Akkermansia sp. MC_55": {},
"1896967 [species] Akkermansia sp. 54_46": {}
}
}

Snapshot of the whole taxonomy tree (taxid 1) in Kate:
![taxon.json.png](files/taxon.json.png)

<div id="disqus_thread"></div>
<script>
4 changes: 3 additions & 1 deletion doc/mkdocs.yml
@@ -2,7 +2,9 @@ site_name: TaxonKit - NCBI Taxonomy Toolkit
pages:
- Home: index.md
- Download: download.md
- Usage: usage.md
- Documents:
- Usage: usage.md
- Tutorial: tutorial.md
- Links:
- Wei Shen's Bioinformatic tools: bioinf.md
theme: yeti
2 changes: 1 addition & 1 deletion doc/site
Submodule site updated from 30163e to eed57e
2 changes: 1 addition & 1 deletion taxonkit/cmd/helper.go
@@ -33,7 +33,7 @@ import (
)

// VERSION of csvtk
const VERSION = "0.1"
const VERSION = "0.1.1"

func checkError(err error) {
if err != nil {