clusteriser

Clusteriser divides input .fasta file into small groups that are more appropriate to analysis.

FASTA sequence alignment

timestamp-folder with FASTA alignments. One file per cluster. {fs|ws|we}_{cluster id} - full sized, ws - without start, we - without end.

-h | --help
-g | --gap - number of sequences which will be put into 'without beginning' or 'without end'. Need both parameters if used.
-p | --percent - if used, the previous parameter is used in percents. By default, uses number of chars.
-c | --coeff - similarity coeff. The minimal threshold for sequences to be in the same group.
-f | --file - path to the input FASTA alignment
-r | --register - if used, lowercase letters are considered to be gaps.
-a | --alphabet - which reduced alphabet to use:
- SE_B_14
- SE_B_10
- SE_V_10
- LI_A_10
- LI_B_10
- SOLIS_D_10
- SOLIS_G_10
- MURPHY_19
- SE_B_8
- SE_B_6
- DAYHOFF_6
- REGULAR (non-reduced)
- (groups, separated by comma, e.g.: QWER,SD,F. Case insensitive.). In case there is a bug in alphabet description, the REGULAR is used.

Only -f is an obligatory parameter.

java -jar clusteriser.jar -g 10 10 -c 0.5 -f testdata.fasta

After the program has finished the job, the result folder is created. Inside there is a folder with timestamp where the results are.

Provide feedback