Convert a FASTA alignment to SNP distance matrix
% cat test/good.aln
>seq1
AGTCAGTC
>seq2
AGGCAGTC
>seq3
AGTGAGTA
>seq4
TGTTAGAC
% snp-dists test/good.aln > distances.tab
Read 4 sequences of length 8
% cat distances.tab
snp-dists 0.2 seq1 seq2 seq3 seq4
seq1 0 1 2 3
seq2 1 0 3 4
seq3 2 3 0 4
seq4 3 4 4 0
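The matrix above can be checked by hand. The following is a minimal Python sketch of the counting rule, not the tool's C implementation. The names `snp_distance` and `distance_matrix` are hypothetical helpers, and the sketch assumes the default behaviour: letters are uppercased and a position counts only when both letters are A, G, T or C and they differ.

```python
VALID = set("AGTC")  # assumption: only these letters are compared by default

def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both letters are A/G/T/C and differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    a, b = a.upper(), b.upper()
    return sum(1 for x, y in zip(a, b) if x != y and x in VALID and y in VALID)

def distance_matrix(seqs: dict) -> list:
    """Return the full symmetric matrix in the order the sequences were given."""
    names = list(seqs)
    return [[snp_distance(seqs[r], seqs[c]) for c in names] for r in names]

# The four sequences from test/good.aln shown above.
alignment = {
    "seq1": "AGTCAGTC",
    "seq2": "AGGCAGTC",
    "seq3": "AGTGAGTA",
    "seq4": "TGTTAGAC",
}

matrix = distance_matrix(alignment)
for name, row in zip(alignment, matrix):
    print(name, *row, sep="\t")
```

For these four sequences the sketch yields the same numbers as the table above.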
snp-dists is written in C to the C99 standard and only depends on zlib.
git clone https://github.com/tseemann/snp-dists.git
cd snp-dists/src
make
# optionally install to /usr/local/bin
make PREFIX=/usr/local install
brew install brewsci/bio/snp-dists
conda install -c bioconda -c conda-forge snp-dists
SYNOPSIS
Pairwise SNP distance matrix from a FASTA alignment
USAGE
snp-dists [options] alignment.fasta[.gz] > matrix.tsv
OPTIONS
-h Show this help
-v Print version and exit
-q Quiet mode; do not print progress information
-a Count all differences not just [AGTC]
-k Keep case, don't uppercase all letters
-c Output CSV instead of TSV
-b Blank top left corner cell instead of 'snp-dists 0.3'
URL
https://github.com/tseemann/snp-dists (Torsten Seemann)
The -v option prints the name and version separated by a space, in standard Unix fashion.
snp-dists 0.5
The -q option suppresses informational messages; only errors are printed.
With -c, the matrix is written as CSV instead of TSV:
snp-dists 0.5,seq1,seq2,seq3,seq4
seq1,0,1,2,3
seq2,1,0,3,4
seq3,2,3,0,4
seq4,3,4,4,0
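A matrix in this layout is easy to load for downstream analysis. The following is a small sketch using Python's standard `csv` module. `load_matrix` is a hypothetical helper name, and the text is embedded as a string here only to keep the example self-contained.

```python
import csv
import io

# The CSV matrix shown above, embedded so the example runs on its own.
csv_text = """snp-dists 0.5,seq1,seq2,seq3,seq4
seq1,0,1,2,3
seq2,1,0,3,4
seq3,2,3,0,4
seq4,3,4,4,0
"""

def load_matrix(handle, delimiter=","):
    """Parse a distance matrix into a nested dict: dist[row_name][col_name]."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    names = header[1:]  # the first cell is the corner label, not a sequence name
    dist = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        dist[row[0]] = {name: int(value) for name, value in zip(names, row[1:])}
    return dist

dist = load_matrix(io.StringIO(csv_text))
print(dist["seq2"]["seq4"])  # prints 4
```

For a file on disk, pass `open(path, newline="")` instead of the `StringIO` object, and use `delimiter="\t"` for the default tab-separated output.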
With -b, the top left corner cell is left blank:
seq1 seq2 seq3 seq4
seq1 0 1 2 3
seq2 1 0 3 4
seq3 2 3 0 4
seq4 3 4 4 0
By default, all letters are (1) uppercased and (2) ignored if not A, G, T or C.
Ambiguous letters and gaps are normally not counted as a "difference", but the -a option counts them as well.
>seq1
NGTCAGTC
>seq2
AG-CAGTC
>seq3
AGTGNGTA
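The effect of the two counting rules on the alignment above can be sketched in Python. This is an illustration under stated assumptions, not the tool's code: the hypothetical `count_all` flag stands in for -a, and it is assumed that with -a any two differing characters, including N and gaps, count as one difference.

```python
VALID = set("AGTC")

def snp_distance(a: str, b: str, count_all: bool = False) -> int:
    """Count differing positions; by default only where both letters are A/G/T/C."""
    a, b = a.upper(), b.upper()
    d = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # Assumption: count_all mimics -a and counts every mismatching pair.
        if count_all or (x in VALID and y in VALID):
            d += 1
    return d

seqs = {"seq1": "NGTCAGTC", "seq2": "AG-CAGTC", "seq3": "AGTGNGTA"}
names = list(seqs)

default_matrix = [[snp_distance(seqs[r], seqs[c]) for c in names] for r in names]
all_matrix = [[snp_distance(seqs[r], seqs[c], count_all=True) for c in names]
              for r in names]

print(default_matrix)  # [[0, 0, 2], [0, 0, 2], [2, 2, 0]]
print(all_matrix)      # [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
```

Under the default rule, seq1 and seq2 are at distance 0 because their only mismatches involve N or a gap. Counting all characters raises that distance to 2.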
The -k option preserves case, which is useful if you want lower-case characters to be masked in the comparison.
>seq1
AgTCAgTC
>seq2
AggCAgTC
>seq3
AgTgAgTA
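Why keeping case masks lower-case letters can be sketched as follows. This is an illustration, not the tool's implementation: the hypothetical `keep_case` flag stands in for -k, and it is assumed that lower-case letters left as-is fall outside the A, G, T, C set and are therefore skipped.

```python
VALID = set("AGTC")  # upper-case only, so lower-case letters are not in the set

def snp_distance(a: str, b: str, keep_case: bool = False) -> int:
    """Count differing positions where both letters are upper-case A/G/T/C."""
    if not keep_case:
        a, b = a.upper(), b.upper()  # default: case is discarded first
    return sum(1 for x, y in zip(a, b) if x != y and x in VALID and y in VALID)

seqs = {"seq1": "AgTCAgTC", "seq2": "AggCAgTC", "seq3": "AgTgAgTA"}
names = list(seqs)

uppercased = [[snp_distance(seqs[r], seqs[c]) for c in names] for r in names]
masked = [[snp_distance(seqs[r], seqs[c], keep_case=True) for c in names]
          for r in names]

print(uppercased)  # [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(masked)      # [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```

With case kept, every mismatch that involves a lower-case `g` drops out, leaving only the final-column difference between seq3 and the other two sequences.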
Report bugs and give suggestions on the Issues page
- Disty by @karel-brinda
- Panito by @andrewjpage
- pairwise_snp_differences by @andergs