Bioinformatics 101 tool for counting unique k-length substrings in DNA

suchapalaver/krust

krust: counts k-mers, written in rust

krust is a k-mer counter, a bioinformatics 101 tool for counting the frequency of substrings of length k within strings of DNA data. krust is written in Rust and runs from the command line. It takes a FASTA file of DNA sequences and outputs all canonical k-mers (because DNA is double-stranded, each k-mer has a reverse complement, and the two are counted together as one canonical k-mer) along with their frequencies across all records in the given data. krust is tested for accuracy against jellyfish.
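To make the idea of canonical k-mer counting concrete, here is a minimal, standard-library-only sketch. It is not krust's actual implementation: the function names are hypothetical, and it assumes that the canonical form is the lexicographically smaller of a k-mer and its reverse complement, and that windows containing anything other than uppercase A, C, G, or T are skipped.

```rust
use std::collections::HashMap;

/// Reverse complement of a DNA string.
/// Returns None if the input contains a base other than A, C, G, or T.
fn reverse_complement(kmer: &str) -> Option<String> {
    kmer.chars()
        .rev()
        .map(|base| match base {
            'A' => Some('T'),
            'C' => Some('G'),
            'G' => Some('C'),
            'T' => Some('A'),
            _ => None,
        })
        .collect()
}

/// Canonical form of a k-mer. ASSUMPTION: the lexicographically smaller
/// of the k-mer and its reverse complement.
fn canonical(kmer: &str) -> Option<String> {
    let rc = reverse_complement(kmer)?;
    Some(if rc.as_str() < kmer { rc } else { kmer.to_string() })
}

/// Counts canonical k-mers in a single sequence.
/// Windows with unsupported characters are skipped (an assumption of this sketch).
fn count_canonical_kmers(seq: &str, k: usize) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    if k == 0 || seq.len() < k {
        return counts;
    }
    for start in 0..=seq.len() - k {
        // `get` returns None instead of panicking on a non-ASCII boundary.
        if let Some(window) = seq.get(start..start + k) {
            if let Some(kmer) = canonical(window) {
                *counts.entry(kmer).or_insert(0) += 1;
            }
        }
    }
    counts
}

fn main() {
    let counts = count_canonical_kmers("ATGCAT", 3);
    let mut sorted: Vec<_> = counts.iter().collect();
    sorted.sort();
    for (kmer, count) in sorted {
        println!(">{count}\n{kmer}");
    }
}
```

In this sketch, the sequence ATGCAT with k = 3 yields two canonical 3-mers, ATG and GCA, each counted twice: CAT is the reverse complement of ATG, and TGC is the reverse complement of GCA.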

krust: counts k-mers, written in rust

Usage: krust <k> <path>

Arguments:
  <k>     provides k length, e.g. 5
  <path>  path to a FASTA file, e.g. /home/lisa/bio/cerevisiae.pan.fa

Options:
  -h, --help     Print help information
  -V, --version  Print version information

krust supports either rust-bio or needletail for reading FASTA records. Use cargo's --features flag to select one.

Run krust with rust-bio's FASTA reader to count 5-mers like this:

cargo run --release --features rust-bio -- 5 your/local/path/to/fasta_data.fa

or, counting 21-mers with needletail as the FASTA reader, like this:

cargo run --release --features needletail -- 21 your/local/path/to/fasta_data.fa

krust prints to stdout, writing each k-mer's count (prefixed with >) and the k-mer itself on alternating lines:

>114928
ATGCC
>289495
AATCA
...
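For downstream processing, the alternating-line output can be read back into (k-mer, count) pairs. The following is a small sketch, not part of krust: `parse_counts` is a hypothetical helper, and it assumes every record is exactly a `>count` line followed by one k-mer line, with blank lines ignored.

```rust
/// Parses krust-style output: a ">count" line followed by a k-mer line.
/// Hypothetical helper for illustration; blank lines are ignored.
fn parse_counts(output: &str) -> Result<Vec<(String, u64)>, String> {
    let mut records = Vec::new();
    let mut lines = output.lines().filter(|line| !line.trim().is_empty());

    while let Some(header) = lines.next() {
        // The header line carries the count after a leading '>'.
        let count = header
            .trim()
            .strip_prefix('>')
            .ok_or_else(|| format!("expected '>' header, got {header:?}"))?
            .parse::<u64>()
            .map_err(|e| format!("bad count in {header:?}: {e}"))?;

        // The next non-blank line is the k-mer itself.
        let kmer = lines
            .next()
            .ok_or_else(|| format!("missing k-mer after {header:?}"))?;

        records.push((kmer.trim().to_string(), count));
    }
    Ok(records)
}

fn main() {
    let sample = ">114928\nATGCC\n>289495\nAATCA\n";
    match parse_counts(sample) {
        Ok(records) => {
            for (kmer, count) in records {
                println!("{kmer}\t{count}");
            }
        }
        Err(e) => eprintln!("parse error: {e}"),
    }
}
```

Returning a `Result` rather than panicking keeps malformed input (for example, a k-mer line with no preceding header) recoverable by the caller.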