fastkmers

A simple program for getting k-mer counts from a fastq/fasta file, written in Rust.

Description

This command-line program takes a fastq/fasta file as input and outputs the counts of k-mers of a specified length. It is implemented using hash tables and a simple algorithm, but is still reasonably fast, mostly thanks to parallel computation with the Rayon library. It can also be used to get per-cycle base content for Illumina reads by setting the k-mer size to the cycle count.
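
The hash-table approach described above can be sketched in a few lines of Rust. This is a minimal illustration, not the fastkmers source: the function name `kmer_counts` is hypothetical, the sequences are assumed to be plain ASCII strings already read from the file, and the sketch is single-threaded and uses only the standard library (no Rayon).

```rust
use std::collections::HashMap;

/// Count every k-mer (overlapping window of length `k`) across a set of sequences.
/// Hypothetical helper for illustration only; not taken from fastkmers.
fn kmer_counts(seqs: &[&str], k: usize) -> HashMap<String, u64> {
    let mut counts: HashMap<String, u64> = HashMap::new();
    if k == 0 {
        // `windows(0)` would panic, so an empty table is returned instead.
        return counts;
    }
    for seq in seqs {
        // Sequences shorter than `k` simply yield no windows.
        for window in seq.as_bytes().windows(k) {
            // Assumption: input is ASCII, so each window is valid UTF-8.
            let kmer = String::from_utf8_lossy(window).into_owned();
            *counts.entry(kmer).or_insert(0) += 1;
        }
    }
    counts
}
```

Because each sequence is processed independently, one plausible way to parallelise such a loop is to build one table per chunk of reads and merge the tables at the end; whether fastkmers does exactly that is not stated here.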

Install

I provide precompiled binaries for Linux only (see the repository's releases), but compiling from source is simple:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
git clone https://github.com/angelovangel/fastkmers.git

cd fastkmers
cargo build --release

The executable file fastkmers is now under ./target/release/

Usage

# make sure the executable is in your PATH
# check available options

fastkmers -h

# to get 4-mer counts and a summary
fastkmers -k 4 -s file.fastq.gz

# output json, input fasta
fastkmers -k 4 -j file.fasta

# stdin can also be used as input, use -
cat file.fasta | fastkmers -k 4 -j -

# query for a specific k-mer
fastkmers -k 5 -q "AATTG" file.fastq.gz

# query with regex is also supported
# this example matches all 5-mers whose last 4 bases are: not T, then A, then T or G, then A
fastkmers -k 5 -q "[^T]A[TG]A$" file.fastq.gz

# get base contents per cycle (the number of cycles has to be known beforehand)
fastkmers -k 126 -c tests/test.fasta

The k-mer counts are printed to stdout as a tab-separated table or as json.
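
For the per-cycle base content mentioned in the usage above, the idea can be sketched as a position-by-base tally. This is only an assumption about how such a table could be computed, not the fastkmers implementation: the function name `base_content_per_cycle`, the column order (A, C, G, T, other) and the handling of short reads are all hypothetical.

```rust
/// For each cycle (read position), count A, C, G, T and anything else (e.g. N).
/// Hypothetical helper for illustration only; the column order is an assumption.
fn base_content_per_cycle(reads: &[&str], cycles: usize) -> Vec<[u64; 5]> {
    // One row per cycle, one column per base category.
    let mut table = vec![[0u64; 5]; cycles];
    for read in reads {
        // Reads longer than `cycles` are truncated; shorter reads
        // contribute only to the cycles they cover.
        for (cycle, base) in read.bytes().take(cycles).enumerate() {
            let col = match base.to_ascii_uppercase() {
                b'A' => 0,
                b'C' => 1,
                b'G' => 2,
                b'T' => 3,
                _ => 4, // N or any other symbol
            };
            table[cycle][col] += 1;
        }
    }
    table
}
```

Each row of the returned table corresponds to one cycle, so dividing a row by its sum gives the base fractions at that position.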

Speed

I haven't compared fastkmers to other programs (e.g. jellyfish). Below are some measurements of execution times for different k-mer sizes on the E. coli MG1655 genome, performed on a 2018 MacBook Pro (Intel i5, 8 GB RAM).

hyperfine -r 4 --warmup 1 --export-csv hyperfine-kmer-size.csv -P kmer 4 29 'fastkmers -k {kmer} -a mg1655.fasta'
