#33. v0.8.1
shenwei356 committed Jun 29, 2018
1 parent 0a30c8f commit 5d4eb05
Showing 6 changed files with 361 additions and 132 deletions.
8 changes: 6 additions & 2 deletions README.md
Expand Up @@ -238,8 +238,12 @@ famous C lib [klib](https://github.com/attractivechaos/klib/) ([kseq.h](https://

![](https://github.com/shenwei356/bio/raw/master/benchmark/benchmark.tsv.png)

Seqkit calls `pigz` (much faster than `gzip`) or `gzip` to decompress .gz file if they are available.
So please **install [pigz](http://zlib.net/pigz/) to gain better parsing performance for gzipped data**.
<s>Seqkit calls `pigz` (much faster than `gzip`) or `gzip` to decompress .gz file if they are available.
So please **install [pigz](http://zlib.net/pigz/) to gain better parsing performance for gzipped data**.</s>
Seqkit does not call `pigz` or `gzip` any more since v0.8.1,
because doing so does not always increase the speed.
But you can still use `pigz` or `gzip` through a pipe: `pigz -d -c seqs.fq.gz | seqkit xxx`.

Seqkit uses the [pgzip](https://github.com/klauspost/pgzip) package to write gzip files,
which is very fast (**10X of `gzip`, 4X of `pigz`**), though the resulting gzip file is slightly larger.

1 change: 0 additions & 1 deletion dev-version.md
@@ -1 +0,0 @@
- `seqkit subseq`: fix bug of missing quality when using `--gtf` or `--bed`
33 changes: 18 additions & 15 deletions doc/docs/download.md
Expand Up @@ -10,15 +10,12 @@ So please **install [pigz](http://zlib.net/pigz/) to gain better parsing perform

## Latest Version

[SeqKit v0.8.0](https://github.com/shenwei356/seqkit/releases/tag/v0.8.0)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/seqkit/v0.8.0/total.svg)](https://github.com/shenwei356/seqkit/releases/tag/v0.8.0)

- `seqkit`, **stricter FASTA/Q format requirement**, i.e., records must start with `>` or `@`.
- `seqkit`, *fix output format for FASTQ files containing zero-length records*, yes this [happens](https://github.com/lh3/seqtk/issues/109).
- `seqkit`, add amino acid code `O` (pyrrolysine) and `U` (selenocysteine).
- `seqkit replace`, *add flag `--nr-width` to fill leading 0s for `{nr}`*, useful for preparing sequence submission (">strain_00001 XX", ">strain_00002 XX").
- `seqkit subseq`, require BED file to be tab-delimited.
[SeqKit v0.8.1](https://github.com/shenwei356/seqkit/releases/tag/v0.8.1)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/seqkit/v0.8.1/total.svg)](https://github.com/shenwei356/seqkit/releases/tag/v0.8.1)

- `seqkit`: do not call `pigz` or `gzip` for decompressing gzipped file any more. But you can still utilize `pigz` or `gzip` by `pigz -d -c seqs.fq.gz | seqkit xxx`.
- `seqkit subseq`: fix bug of missing quality when using `--gtf` or `--bed`
- `seqkit stats`: parallelize counting files; this is much faster for lots of small files, especially for files on SSD
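
The idea behind parallelized counting can be sketched in Go with a small worker pool. This is only an illustration under stated assumptions, not the code from this commit: `result`, `countFile`, and `countAll` are hypothetical names, the `jobs` argument plays the role of the `-j` flag, and counting `>` lines in plain-text files stands in for real statistics.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"sync"
)

// result holds the outcome for one input file (hypothetical type).
type result struct {
	path string
	seqs int
	err  error
}

// countFile (hypothetical helper) counts '>' header lines in a plain-text
// FASTA file. It is a stand-in for the real per-file statistics.
func countFile(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	n := 0
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), ">") {
			n++
		}
	}
	return n, sc.Err()
}

// countAll runs count over paths using `jobs` workers and returns the
// results in the same order as the input paths.
func countAll(paths []string, jobs int, count func(string) (int, error)) []result {
	if jobs < 1 {
		jobs = 1
	}
	results := make([]result, len(paths))
	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < jobs; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is received by exactly one worker, so every
			// slot of results is written by a single goroutine.
			for i := range idx {
				n, err := count(paths[i])
				results[i] = result{path: paths[i], seqs: n, err: err}
			}
		}()
	}
	for i := range paths {
		idx <- i
	}
	close(idx)
	wg.Wait()
	return results
}

func main() {
	// The worker count is fixed at 4 here; a real tool would take it from a flag.
	for _, r := range countAll(os.Args[1:], 4, countFile) {
		if r.err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", r.path, r.err)
			continue
		}
		fmt.Printf("%s\t%d\n", r.path, r.seqs)
	}
}
```

With many small files, most of the time goes to opening and reading each file rather than parsing it, which is why several workers tend to help, particularly on storage that handles concurrent reads well.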

### Please cite

Expand All @@ -31,17 +28,16 @@ So please **install [pigz](http://zlib.net/pigz/) to gain better parsing perform

- run `seqkit version` to check update !!!
- run `seqkit genautocomplete` to update Bash completion !!!
- install [pigz](http://zlib.net/pigz/) to gain better parsing performance for gzipped data.


OS |Arch |File, 中国镜像 |Download Count
:------|:---------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Linux |32-bit |[seqkit_linux_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_linux_386.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_linux_386.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_linux_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_linux_386.tar.gz)
Linux |**64-bit**|[**seqkit_linux_amd64.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_linux_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_linux_amd64.tar.gz)
OS X |32-bit |[seqkit_darwin_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_darwin_386.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_darwin_386.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_darwin_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_darwin_386.tar.gz)
OS X |**64-bit**|[**seqkit_darwin_amd64.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_darwin_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_darwin_amd64.tar.gz)
Windows|32-bit |[seqkit_windows_386.exe.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_windows_386.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_windows_386.exe.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_windows_386.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_windows_386.exe.tar.gz)
Windows|**64-bit**|[**seqkit_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_windows_amd64.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.0/seqkit_windows_amd64.exe.tar.gz)
Linux |32-bit |[seqkit_linux_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_linux_386.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_linux_386.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_linux_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_linux_386.tar.gz)
Linux |**64-bit**|[**seqkit_linux_amd64.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_linux_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_linux_amd64.tar.gz)
OS X |32-bit |[seqkit_darwin_386.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_darwin_386.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_darwin_386.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_darwin_386.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_darwin_386.tar.gz)
OS X |**64-bit**|[**seqkit_darwin_amd64.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_darwin_amd64.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_darwin_amd64.tar.gz)
Windows|32-bit |[seqkit_windows_386.exe.tar.gz](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_windows_386.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_windows_386.exe.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_windows_386.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_windows_386.exe.tar.gz)
Windows|**64-bit**|[**seqkit_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_windows_amd64.exe.tar.gz), <br/> [中国镜像](http://app.shenwei.me/data/seqkit/seqkit_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/seqkit/latest/seqkit_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/seqkit/releases/download/v0.8.1/seqkit_windows_amd64.exe.tar.gz)


## Installation
Expand Down Expand Up @@ -96,6 +92,13 @@ Howto:

## Release History

- [SeqKit v0.8.0](https://github.com/shenwei356/seqkit/releases/tag/v0.8.0)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/seqkit/v0.8.0/total.svg)](https://github.com/shenwei356/seqkit/releases/tag/v0.8.0)
- `seqkit`, **stricter FASTA/Q format requirement**, i.e., records must start with `>` or `@`.
- `seqkit`, *fix output format for FASTQ files containing zero-length records*, yes this [happens](https://github.com/lh3/seqtk/issues/109).
- `seqkit`, add amino acid code `O` (pyrrolysine) and `U` (selenocysteine).
- `seqkit replace`, *add flag `--nr-width` to fill leading 0s for `{nr}`*, useful for preparing sequence submission (">strain_00001 XX", ">strain_00002 XX").
- `seqkit subseq`, require BED file to be tab-delimited.
- [SeqKit v0.7.2](https://github.com/shenwei356/seqkit/releases/tag/v0.7.2)
[![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/seqkit/v0.7.2/total.svg)](https://github.com/shenwei356/seqkit/releases/tag/v0.7.2)
- `seqkit tab2fx`: fix a concurrency bug that occurs with low probability
41 changes: 35 additions & 6 deletions doc/docs/usage.md
Expand Up @@ -67,8 +67,12 @@ famous C lib [klib](https://github.com/attractivechaos/klib/) ([kseq.h](https://

![](https://github.com/shenwei356/bio/raw/master/benchmark/benchmark.tsv.png)

Seqkit calls `pigz` (much faster than `gzip`) or `gzip` to decompress .gz file if they are available.
So please **install [pigz](http://zlib.net/pigz/) to gain better parsing performance for gzipped data**.
<s>Seqkit calls `pigz` (much faster than `gzip`) or `gzip` to decompress .gz file if they are available.
So please **install [pigz](http://zlib.net/pigz/) to gain better parsing performance for gzipped data**.</s>
Seqkit does not call `pigz` or `gzip` any more since v0.8.1,
because doing so does not always increase the speed.
But you can still use `pigz` or `gzip` through a pipe: `pigz -d -c seqs.fq.gz | seqkit xxx`.

Seqkit uses the [pgzip](https://github.com/klauspost/pgzip) package to write gzip files,
which is very fast (**10X of `gzip`, 4X of `pigz`**), though the resulting gzip file is slightly larger.

Expand Down Expand Up @@ -157,16 +161,14 @@ reproduced in different environments with same random seed.
```
SeqKit -- a cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Version: 0.8.0
Version: 0.8.1
Author: Wei Shen <shenwei356@gmail.com>
Documents : http://bioinf.shenwei.me/seqkit
Source code: https://github.com/shenwei356/seqkit
Please cite: https://doi.org/10.1371/journal.pone.0163962
Suggestion : Install pigz to gain better parsing performance for gzipped data
Usage:
seqkit [command]
Expand Down Expand Up @@ -587,6 +589,10 @@ Usage
```
simple statistics of FASTA/Q files
Tips:
1. For lots of small files (especially on SSD), use a large value of '-j' to
parallelize counting.
Usage:
seqkit stats [flags]
Expand All @@ -597,8 +603,8 @@ Flags:
-a, --all all statistics, including quartiles of seq length, sum_gap, N50
-G, --gap-letters string gap letters (default "- .")
-h, --help help for stats
-e, --skip-err skip error, only show warning message
-T, --tabular output in machine-friendly tabular format
```

Examples
Expand Down Expand Up @@ -662,6 +668,29 @@ Eexamples
reads_1.fq.gz FASTQ DNA 2,500 567,516 226 227 229 227 227 227 0 227
reads_2.fq.gz FASTQ DNA 2,500 560,002 223 224 225 224 224 224 0 224

1. Parallelize counting files; this is much faster for lots of small files, especially for files on SSD

seqkit stats -j 10 refseq/virual/*.fna.gz

1. Skip error

$ seqkit stats tests/*
[ERRO] tests/hairpin.fa.fai: fastx: invalid FASTA/Q format

$ seqkit stats tests/* -e
[WARN] tests/hairpin.fa.fai: fastx: invalid FASTA/Q format
[WARN] tests/hairpin.fa.seqkit.fai: fastx: invalid FASTA/Q format
[WARN] tests/miRNA.diff.gz: fastx: invalid FASTA/Q format
[WARN] tests/test.sh: fastx: invalid FASTA/Q format
file format type num_seqs sum_len min_len avg_len max_len
tests/contigs.fa FASTA DNA 9 54 2 6 10
tests/hairpin.fa FASTA RNA 28,645 2,949,871 39 103 2,354
tests/Illimina1.5.fq FASTQ DNA 1 100 100 100 100
tests/Illimina1.8.fq.gz FASTQ DNA 10,000 1,500,000 150 150 150
tests/hairpin.fa.gz FASTA RNA 28,645 2,949,871 39 103 2,354
tests/reads_1.fq.gz FASTQ DNA 2,500 567,516 226 227 229
tests/mature.fa.gz FASTA RNA 35,828 781,222 15 21.8 34
tests/reads_2.fq.gz FASTQ DNA 2,500 560,002 223 224 225

## faidx

2 changes: 0 additions & 2 deletions seqkit/cmd/root.go
Expand Up @@ -43,8 +43,6 @@ Documents : http://bioinf.shenwei.me/seqkit
Source code: https://github.com/shenwei356/seqkit
Please cite: https://doi.org/10.1371/journal.pone.0163962
Suggestion : Install pigz to gain better parsing performance for gzipped data
`, VERSION),
}
