v0.1.4
shenwei356 committed Nov 17, 2016
1 parent 713233d commit 0e5e4ef
Showing 5 changed files with 164 additions and 34 deletions.
33 changes: 18 additions & 15 deletions doc/docs/download.md
Expand Up @@ -6,29 +6,29 @@

## Current Version

[TaxonKit v0.1.3](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.3)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.1.3/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.3)
[TaxonKit v0.1.4](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.4)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.1.4/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.4)

- add command of `taxonkit reformat` which reformats full lineage to custom format
- add flag `--fill` for `taxonkit reformat`, which estimates and fills missing rank with original lineage information


Links:

- **Linux**
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_linux_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_linux_386.tar.gz)
[taxonkit_linux_386.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_linux_386.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_linux_amd64.tar.gz)
[taxonkit_linux_amd64.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_linux_amd64.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_linux_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_linux_386.tar.gz)
[taxonkit_linux_386.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_linux_386.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_linux_amd64.tar.gz)
[taxonkit_linux_amd64.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_linux_amd64.tar.gz)
- **Mac OS X**
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_darwin_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_darwin_386.tar.gz)
[taxonkit_darwin_386.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_darwin_386.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_darwin_amd64.tar.gz)
[taxonkit_darwin_amd64.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_darwin_amd64.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_darwin_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_darwin_386.tar.gz)
[taxonkit_darwin_386.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_darwin_386.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_darwin_amd64.tar.gz)
[taxonkit_darwin_amd64.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_darwin_amd64.tar.gz)
- **Windows**
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_windows_386.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_windows_386.exe.tar.gz)
[taxonkit_windows_386.exe.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_windows_386.exe.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_windows_amd64.exe.tar.gz)
[taxonkit_windows_amd64.exe.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.3/taxonkit_windows_amd64.exe.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_windows_386.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_windows_386.exe.tar.gz)
[taxonkit_windows_386.exe.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_windows_386.exe.tar.gz)
- [![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/taxonkit/latest/taxonkit_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_windows_amd64.exe.tar.gz)
[taxonkit_windows_amd64.exe.tar.gz](https://github.com/shenwei356/taxonkit/releases/download/v0.1.4/taxonkit_windows_amd64.exe.tar.gz)

## Installation

Expand Down Expand Up @@ -62,6 +62,9 @@ For Go developer, just one command:

## Previous Versions

- [TaxonKit v0.1.3](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.3)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.1.3/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.3)
- add command of `taxonkit reformat` which reformats full lineage to custom format
- [TaxonKit v0.1.2](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.2)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/taxonkit/v0.1.2/total.svg)](https://github.com/shenwei356/taxonkit/releases/tag/v0.1.2)
- add command of `taxonkit lineage`, users can query lineage of given taxon IDs from file
Expand Down
23 changes: 16 additions & 7 deletions doc/docs/usage.md
Expand Up @@ -246,8 +246,9 @@ Usage:
taxonkit reformat [flags]
Flags:
-b, --blank string blank string for missing level (default "__")
--blank string blank string for missing rank, if given "", "unclassified xxx" will be used
-d, --delimiter string field delimiter in input lineage (default ";")
--fill estimate and fill missing rank with original lineage information (recommended)
-f, --format string output format, placeholder of is need (default "{k};{p};{c};{o};{f};{g};{s}")
--names string names.dmp file (default "names.dmp")
--nodes string nodes.dmp file (default "nodes.dmp")
Expand All @@ -267,19 +268,27 @@ Example lineage list:
1. Default output format ("{k};{p};{c};{o};{f};{g};{s}")

$ taxonkit reformat lineage.txt | cut -f 2
Bacteria;unclassified phylum;unclassified class;unclassified order;unclassified family;unclassified genus;uncultured murine large bowel bacterium BAC 54B
Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Viruses;unclassified phylum;unclassified class;Caudovirales;Siphoviridae;unclassified genus;Croceibacter phage P2559Y
Viruses;unclassified phylum;unclassified class;unclassified order;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle

1. Use custom strings for unclassified ranks

$ ./taxonkit reformat lineage.txt --blank "__" | cut -f 2
Bacteria;__;__;__;__;__;uncultured murine large bowel bacterium BAC 54B
Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Viruses;__;__;Caudovirales;Siphoviridae;__;Croceibacter phage P2559Y
Viruses;__;__;__;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle

2. Extracting species

$ taxonkit reformat lineage.txt -f "{s}" | cut -f 2
uncultured murine large bowel bacterium BAC 54B
Akkermansia muciniphila
Croceibacter phage P2559Y
Mouse Intracisternal A-particle
1. Estimate and fill missing rank with original lineage information (**recommended**)

$ ./taxonkit reformat lineage.txt --fill | cut -f 2
Bacteria;environmental samples <Bacteria>;unclassified Bacteria class;unclassified Bacteria order;unclassified Bacteria family;unclassified Bacteria genus;uncultured murine large bowel bacterium BAC 54B
Bacteria;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
Viruses;dsDNA viruses, no RNA stage;unclassified Viruses class;Caudovirales;Siphoviridae;unclassified Siphoviridae;Croceibacter phage P2559Y
Viruses;Retro-transcribing viruses;unclassified Viruses class;unclassified Viruses order;Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle
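
The `unclassified Bacteria class` style labels in the `--fill` output above appear to be composed from the last resolved name and the missing rank when `--blank` is empty. A minimal Go sketch of that rule, assuming a hypothetical helper `fallbackLabel` and a trimmed-down symbol table (neither name is taken from the TaxonKit source):

```go
package main

import "fmt"

// symbolRank is an illustrative subset of the symbol-to-rank table.
var symbolRank = map[string]string{
	"p": "phylum",
	"c": "class",
	"o": "order",
	"f": "family",
	"g": "genus",
	"s": "species",
}

// fallbackLabel is a hypothetical helper: it returns the custom blank
// string when one is set, otherwise "unclassified <last name> <rank>".
func fallbackLabel(blank, lastName, symbol string) string {
	if blank != "" {
		return blank
	}
	return fmt.Sprintf("unclassified %s %s", lastName, symbolRank[symbol])
}

func main() {
	fmt.Println(fallbackLabel("", "Bacteria", "c"))
	fmt.Println(fallbackLabel("__", "Bacteria", "c"))
}
```

With an empty blank string the label carries the closest resolved ancestor, which keeps two unclassified taxa under different parents distinguishable.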

<div id="disqus_thread"></div>
<script>
Expand Down
2 changes: 1 addition & 1 deletion doc/site
Submodule site updated from 1169dc to 9c2968
14 changes: 12 additions & 2 deletions taxonkit/cmd/helper.go
Expand Up @@ -33,7 +33,7 @@ import (
)

// VERSION of taxonkit
const VERSION = "0.1.3"
const VERSION = "0.1.4"

func checkError(err error) {
if err != nil {
Expand Down Expand Up @@ -299,8 +299,18 @@ var symbol2rank = map[string]string{
"s": "species",
"S": "subspecies",
}
var symbol2weight = map[string]float32{
"k": 1,
"p": 2,
"c": 3,
"o": 4,
"f": 5,
"g": 6,
"s": 7,
"S": 8,
}

var reRankPlaceHolder = regexp.MustCompile(`\{[^\}]\}`)
var reRankPlaceHolder = regexp.MustCompile(`\{(\w)\}`)

var reRankPlaceHolders = map[string]*regexp.Regexp{
"k": regexp.MustCompile(`\{k\}`),
Expand Down
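
The switch from `\{[^\}]\}` to `\{(\w)\}` adds a capture group, so the rank symbol can be read from submatch 1 instead of being sliced out of the full match. A small sketch of that behaviour using only the standard `regexp` package; the helper name `placeholderSymbols` is made up for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// rePlaceholder mirrors the updated pattern from this commit: a single
// word character inside braces, captured as submatch 1.
var rePlaceholder = regexp.MustCompile(`\{(\w)\}`)

// placeholderSymbols (hypothetical name) returns the captured symbols
// in the order they appear in the format string.
func placeholderSymbols(format string) []string {
	var symbols []string
	for _, m := range rePlaceholder.FindAllStringSubmatch(format, -1) {
		symbols = append(symbols, m[1]) // m[0] is "{k}", m[1] is "k"
	}
	return symbols
}

func main() {
	fmt.Println(placeholderSymbols("{k};{p};{s}"))
}
```

Note that `\w` also matches digits and underscore, so a lookup against the known symbols is still needed to reject placeholders such as `{1}`.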
126 changes: 117 additions & 9 deletions taxonkit/cmd/reformat.go
Expand Up @@ -57,10 +57,31 @@ Output format can be formated by flag --format, available placeholders:
format := getFlagString(cmd, "format")
delimiter := getFlagString(cmd, "delimiter")
blank := getFlagString(cmd, "blank")
fill := getFlagBool(cmd, "fill")

// check format
if !reRankPlaceHolder.MatchString(format) {
checkError(fmt.Errorf("placeholder of simplified rank not found in output format: %s", format))
}
matches := reRankPlaceHolder.FindAllStringSubmatch(format, -1)
outSranks := make(map[string]struct{})
outSranksList := []string{}
var currentWeight float32
var currentSymbol string
for _, match := range matches {
if weight, ok := symbol2weight[match[1]]; !ok {
checkError(fmt.Errorf("invalid placeholder: %s", match[0]))
} else {
if weight < currentWeight {
checkError(fmt.Errorf(`invalid placeholder order: {%s} {%s}. "%s" should come after "%s"`,
currentSymbol, match[1], symbol2rank[currentSymbol], symbol2rank[match[1]]))
}
outSranks[match[1]] = struct{}{}
outSranksList = append(outSranksList, match[1])
currentWeight = weight
currentSymbol = match[1]
}
}

files := getFileList(args)
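
The format check in this hunk can be read as: collect every placeholder, reject unknown symbols, and reject a placeholder whose rank weight is lower than the one before it. A self-contained sketch under those assumptions; `checkFormat` and `symbolWeight` are illustrative names, and the sketch returns an error where the commit calls `checkError`:

```go
package main

import (
	"fmt"
	"regexp"
)

var rePlaceholder = regexp.MustCompile(`\{(\w)\}`)

// symbolWeight orders rank symbols from kingdom (1) to subspecies (8),
// matching the weights added to helper.go in this commit.
var symbolWeight = map[string]float32{
	"k": 1, "p": 2, "c": 3, "o": 4, "f": 5, "g": 6, "s": 7, "S": 8,
}

// checkFormat is a hypothetical helper that validates a format string
// and returns the placeholder symbols in order of appearance.
func checkFormat(format string) ([]string, error) {
	matches := rePlaceholder.FindAllStringSubmatch(format, -1)
	if len(matches) == 0 {
		return nil, fmt.Errorf("no rank placeholder found in format: %s", format)
	}
	var order []string
	var lastWeight float32
	var lastSymbol string
	for _, m := range matches {
		w, ok := symbolWeight[m[1]]
		if !ok {
			return nil, fmt.Errorf("invalid placeholder: %s", m[0])
		}
		if w < lastWeight { // a higher rank placed after a lower one
			return nil, fmt.Errorf("invalid placeholder order: {%s} {%s}", lastSymbol, m[1])
		}
		order = append(order, m[1])
		lastWeight, lastSymbol = w, m[1]
	}
	return order, nil
}

func main() {
	fmt.Println(checkFormat("{k};{p};{s}"))
	fmt.Println(checkFormat("{s};{k}"))
}
```

As in the hunk, the comparison is strict, so a repeated placeholder such as `{g};{g}` passes the order check.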

Expand Down Expand Up @@ -108,29 +129,115 @@ Output format can be formated by flag --format, available placeholders:
return nil, false, nil
}

var rank, srank, name string
// names and weights
names2 := strings.Split(line, delimiter)
weights := make([]float32, len(names2))
var rank, srank string
var ok bool
srank2name := make(map[string]string)
for _, name := range strings.Split(line, delimiter) {
var currentWeight float32
for i, name := range names2 {
if name == "" {
continue
}
if rank, ok = name2rank[name]; ok && rank != norank {
rank, ok = name2rank[name]
if !ok { // unofficial name
currentWeight += 0.1
weights[i] = currentWeight
continue
}
if rank != norank {
if srank, ok = rank2symbol[rank]; ok {
srank2name[srank] = name
weights[i] = symbol2weight[srank]
currentWeight = weights[i]
} else {
log.Warningf("please contact author to add this rank to code: %s", rank)
}
} else {
currentWeight += 0.1
weights[i] = currentWeight
}
}

flineage := format
for srank, re := range reRankPlaceHolders {
if name, ok = srank2name[srank]; ok {
flineage = re.ReplaceAllString(flineage, name)
// prepare replacements.
// find the orphan names and missing ranks
replacements := make(map[string]string, len(matches))
// for _, match := range matches {
// if blank == "" {
// replacements[match[1]] = "unclassified " + symbol2rank[match[1]]
// } else {
// replacements[match[1]] = blank
// }
// }

orphans := make(map[string]float32)
orphansList := []string{}
existedSranks := make(map[string]struct{})
for i, name := range names2 {
if name == "" {
continue
}
if name2rank[name] == norank {
orphans[name] = weights[i]
orphansList = append(orphansList, name)
} else {
flineage = re.ReplaceAllString(flineage, blank)
if _, ok = outSranks[rank2symbol[name2rank[name]]]; ok { // to be output
replacements[rank2symbol[name2rank[name]]] = name
existedSranks[rank2symbol[name2rank[name]]] = struct{}{}
} else if name2rank[name] == "" {
orphans[name] = weights[i]
orphansList = append(orphansList, name)
}
}
}

if fill {
jj := -1
var hit bool
var lastRank string
for i, srank := range outSranksList {
if _, ok = existedSranks[srank]; ok {
lastRank = replacements[srank]
continue
}
hit = false
for j, name := range orphansList {
if j <= jj {
continue
}
if i == 0 {
if orphans[name] < symbol2weight[outSranksList[i]] {
hit = true
}
} else if i == len(outSranksList)-1 {

} else if orphans[name] > symbol2weight[outSranksList[i-1]] &&
orphans[name] < symbol2weight[outSranksList[i+1]] {
hit = true
}

if hit {
replacements[srank] = name
jj = j
break
}
}
if !hit {
if blank == "" {
replacements[srank] = fmt.Sprintf("unclassified %s %s", lastRank, symbol2rank[srank])
} else {
replacements[srank] = blank
}
}
}
}

flineage := format
for srank, re := range reRankPlaceHolders {
flineage = re.ReplaceAllString(flineage, replacements[srank])
}

return lineage2flineage{line, flineage}, true, nil
}
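
The weighting step in this hunk gives each ranked name the weight of its rank and nudges every unranked or unofficial name 0.1 past the previous weight, so such names sort between their ranked neighbours. A simplified sketch of that idea; `assignWeights` and its `rankedWeight` argument are hypothetical, and the sketch skips the warning path for ranks that have no symbol:

```go
package main

import "fmt"

// assignWeights is a hypothetical helper. rankedWeight maps a name to
// the weight of its rank; names missing from the map are treated as
// unranked and placed 0.1 after the previous weight.
func assignWeights(names []string, rankedWeight map[string]float32) []float32 {
	weights := make([]float32, len(names))
	var current float32
	for i, name := range names {
		if name == "" {
			continue // empty fields keep weight 0
		}
		if w, ok := rankedWeight[name]; ok {
			current = w
		} else {
			current += 0.1
		}
		weights[i] = current
	}
	return weights
}

func main() {
	names := []string{"Viruses", "dsDNA viruses, no RNA stage", "Caudovirales", "Siphoviridae"}
	ranked := map[string]float32{"Viruses": 1, "Caudovirales": 4, "Siphoviridae": 5}
	fmt.Println(assignWeights(names, ranked))
}
```

Here the unranked name lands just above 1, which is what lets the `--fill` logic consider it as a candidate for a missing rank between kingdom and order.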

Expand Down Expand Up @@ -160,5 +267,6 @@ func init() {
flineageCmd.Flags().StringP("names", "", "names.dmp", "names.dmp file")
flineageCmd.Flags().StringP("format", "f", "{k};{p};{c};{o};{f};{g};{s}", "output format, placeholder of is need")
flineageCmd.Flags().StringP("delimiter", "d", ";", "field delimiter in input lineage")
flineageCmd.Flags().StringP("blank", "b", "__", "blank string for missing level")
flineageCmd.Flags().StringP("blank", "", "", `blank string for missing rank, if given "", "unclassified xxx" will be used`)
flineageCmd.Flags().BoolP("fill", "", false, "estimate and fill missing rank with original lineage information (recommended)")
}
