This repository has been archived by the owner on Jun 2, 2022. It is now read-only.

hcluster

This library provides Python functions for hierarchical clustering. Its features include

  • generating hierarchical clusters from distance matrices
  • computing distance matrices from observation vectors
  • computing statistics on clusters
  • cutting linkages to generate flat clusters
  • and visualizing clusters with dendrograms.

The interface is very similar to MATLAB's Statistics Toolbox API, which makes code easier to port from MATLAB to Python/NumPy. The core implementation of this library is in C for efficiency.
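The features listed above can be sketched as a short workflow. Because hcluster is described as a fork of SciPy's clustering and distance functions, this sketch uses the SciPy originals (`scipy.spatial.distance.pdist` and `scipy.cluster.hierarchy`) as a stand-in. Whether hcluster exposes the same names at its top level is an assumption to check against the installed version, and the sample data is made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, cophenet, dendrogram

# Six 2-D observation vectors forming two well-separated groups
# (illustrative data, not from the hcluster project).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

# Compute a condensed distance matrix from the observation vectors.
Y = pdist(X, metric="euclidean")

# Generate hierarchical clusters (a linkage matrix) from the distances.
Z = linkage(Y, method="average")

# Compute a statistic on the clustering: the cophenetic correlation.
c, _ = cophenet(Z, Y)

# Cut the linkage to obtain at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")

# Build the dendrogram layout without plotting (no matplotlib needed).
tree = dendrogram(Z, no_plot=True)
```

Dropping `no_plot=True` draws the dendrogram with matplotlib, if it is installed.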

It is a fork of the clustering and distance functions from SciPy that removes all dependencies on SciPy. It preserves the API of hcluster 0.2.

Part of the Dedupe.io cloud service and open source toolset for de-duplicating and finding fuzzy matches in your data.