An implementation of hierarchical clustering.
Specify the attribute information (<attribute file>), the data file (<data file>), and the linkage method (<linkage>), and run the following command.
ruby clustering.rb -a <attribute file> -i <data file> -l <linkage> -g <ignored attributes>
where <ignored attributes> is an array of attributes ignored in clustering.
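Excluding an attribute amounts to dropping its column before computing distances. The sketch below shows this under stated assumptions. It assumes the attribute names are known in column order (the list used here is hypothetical, because the format of <attribute file> is not described). It also assumes comma-separated records, as in datasets/iris.data. It is not the code in clustering.rb.

```ruby
# Minimal sketch: drop ignored attributes from comma-separated records.
# ASSUMPTION: attribute names are known in column order (hypothetical list
# below); the real format of <attribute file> is not described here.

# Returns [kept attribute names, rows with ignored columns removed].
def drop_ignored(names, rows, ignored)
  keep = names.each_index.reject { |i| ignored.include?(names[i]) }
  kept_names = keep.map { |i| names[i] }
  kept_rows = rows.map { |row| keep.map { |i| row[i] } }
  [kept_names, kept_rows]
end

names = %w[sepal_length sepal_width petal_length petal_width class]
lines = ["5.1,3.5,1.4,0.2,Iris-setosa", "7.0,3.2,4.7,1.4,Iris-versicolor"]
rows = lines.map { |l| l.split(",") }

kept_names, kept_rows = drop_ignored(names, rows, ["class"])
numeric = kept_rows.map { |r| r.map(&:to_f) }   # remaining columns as floats
p kept_names # => ["sepal_length", "sepal_width", "petal_length", "petal_width"]
p numeric    # => [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4]]
```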
The clustering results are saved in the a_dir/ directory.
Three linkage methods are available:
- single_linkage
- complete_linkage
- average_linkage
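The three methods differ only in how they measure the distance between two clusters. Single linkage uses the closest pair of points, complete linkage the farthest pair, and average linkage the mean over all pairs. The sketch below uses a naive agglomerative loop that repeatedly merges the closest pair of clusters. It assumes Euclidean distance between points, and the function names are illustrative, so clustering.rb may use a different metric or data layout.

```ruby
# Minimal sketch of agglomerative clustering with the three linkage methods.
# ASSUMPTIONS: Euclidean distance between points; naive O(n^3) merging.
# Function names are illustrative, not taken from clustering.rb.

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Each linkage reduces the list of pairwise point distances to one value.
LINKAGES = {
  "single_linkage"   => ->(d) { d.min },
  "complete_linkage" => ->(d) { d.max },
  "average_linkage"  => ->(d) { d.sum / d.size }
}

# Distance between two clusters, given as arrays of point indices.
def cluster_distance(points, c1, c2, linkage)
  pair_dists = c1.product(c2).map { |i, j| euclidean(points[i], points[j]) }
  LINKAGES.fetch(linkage).call(pair_dists)
end

# Returns the merge history as [cluster_a, cluster_b, height] triples.
def agglomerate(points, linkage)
  clusters = points.each_index.map { |i| [i] }
  merges = []
  while clusters.size > 1
    best = nil
    clusters.combination(2).each do |a, b|
      d = cluster_distance(points, a, b, linkage)
      best = [a, b, d] if best.nil? || d < best[2]
    end
    a, b, d = best
    merges << [a, b, d]
    clusters.delete(a)
    clusters.delete(b)
    clusters << (a + b)
  end
  merges
end

points = [[0.0], [1.0], [5.0], [7.0]]
agglomerate(points, "single_linkage").each do |a, b, h|
  puts "#{a.inspect} + #{b.inspect} at #{h}"
end
```

For these four one-dimensional points, single linkage merges [0] and [1] at 1.0, then [2] and [3] at 2.0, and finally the two pairs at 4.0. Complete linkage joins the last two clusters at 7.0 and average linkage at 5.5.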
Next, we obtain a PGF/TikZ script of the dendrogram from the cophenetic matrix (a_dir/cophenetic).
ruby coph_to_tikz.rb -c a_dir/cophenetic
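The cophenetic distance between two points is the linkage height at which they first end up in the same cluster. The cophenetic matrix therefore contains everything needed to draw the dendrogram. The sketch below builds such a matrix from a merge history. The [cluster_a, cluster_b, height] input format is a hypothetical choice for illustration, and the actual file layout of a_dir/cophenetic may differ.

```ruby
# Minimal sketch: cophenetic matrix from a merge history.
# ASSUMPTION: merges are [cluster_a, cluster_b, height] triples of point
# indices (hypothetical format); the layout of a_dir/cophenetic may differ.

def cophenetic_matrix(n, merges)
  coph = Array.new(n) { Array.new(n, 0.0) }
  merges.each do |a, b, height|
    # Every point in a first joins every point in b at this height.
    a.product(b).each do |i, j|
      coph[i][j] = height
      coph[j][i] = height
    end
  end
  coph
end

merges = [[[0], [1], 1.0], [[2], [3], 2.0], [[0, 1], [2, 3], 4.0]]
cophenetic_matrix(4, merges).each { |row| puts row.join(" ") }
```

The resulting matrix is symmetric with zeros on the diagonal. Here, points 0 and 1 are at distance 1.0, points 2 and 3 at 2.0, and every cross pair at 4.0.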
By default, coph_to_tikz.rb writes its output to dendro_tikz.
Paste the contents of dendro_tikz into a figure environment of a LaTeX file.
We need \usepackage{tikz} in the preamble of the LaTeX file.
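As an alternative to pasting by hand, a small script can wrap the TikZ output in a minimal LaTeX document with the required preamble. This is only a sketch. It assumes dendro_tikz holds a complete tikzpicture environment, and the output name dendrogram.tex and the caption are hypothetical.

```ruby
# Minimal sketch: wrap the TikZ output in a figure environment of a
# standalone LaTeX document. ASSUMPTIONS: dendro_tikz holds a complete
# tikzpicture; the output name dendrogram.tex and caption are hypothetical.

def wrap_in_figure(tikz, caption)
  <<~LATEX
    \\documentclass{article}
    \\usepackage{tikz}
    \\begin{document}
    \\begin{figure}[htbp]
      \\centering
    #{tikz.chomp}
      \\caption{#{caption}}
    \\end{figure}
    \\end{document}
  LATEX
end

# Only write the document if coph_to_tikz.rb has produced its default output.
if File.exist?("dendro_tikz")
  tex = wrap_in_figure(File.read("dendro_tikz"), "Dendrogram")
  File.write("dendrogram.tex", tex)
end
```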
For iris.data, we set
<attribute file> := datasets/iris.attr
<data file> := datasets/iris.data
<ignored attributes> := class
Then run the following.
ruby clustering.rb -a datasets/iris.attr -i datasets/iris.data -l average_linkage -g class
ruby coph_to_tikz.rb -c a_dir/cophenetic
This writes the TikZ dendrogram of the iris data to dendro_tikz.