Hierarchical Manifold Approximation and Projection (HUMAP) is a UMAP-based technique for hierarchical dimensionality reduction. HUMAP allows you to:
- Focus on important information while reducing the visual burden when exploring huge datasets;
- Drill down the hierarchy according to information demand.
The details of the algorithm can be found in our paper on arXiv. This repository also features a C++ UMAP implementation.
HUMAP is written in C++ for performance and provides an intuitive Python interface. It depends on common machine learning libraries, such as scikit-learn and NumPy. It also requires pybind11 for the interface between C++ and Python.
Requirements:
- Python 3.6 or greater
- numpy
- scipy
- scikit-learn
- pybind11
- pynndescent (for reproducible results)
- Eigen (C++)
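Before installing, it can help to confirm that the Python requirements above are importable. The following is a minimal sketch, not part of HUMAP: the helper name `missing_modules` is hypothetical, and the import names are assumed to be the usual ones for each package (scikit-learn is imported as `sklearn`). Eigen is a C++ header library, so it is not covered by this check.

```python
import importlib.util

# Import names assumed for the Python requirements listed above.
# scikit-learn is imported as "sklearn"; Eigen (C++) cannot be checked this way.
REQUIRED_MODULES = ["numpy", "scipy", "sklearn", "pybind11", "pynndescent"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing Python requirements:", ", ".join(missing))
else:
    print("All Python requirements were found.")
```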
If you have these requirements installed, use PyPI:
pip install humap
Alternatively (and preferably), you can install HUMAP with conda:
conda install humap
If using pip, note that HUMAP depends on Eigen. Make sure to place the Eigen headers in /usr/local/include on Unix or in C:\Eigen on Windows.
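To check whether the Eigen headers are where the pip build expects them, a small script such as the one below may be useful. It is only a sketch under stated assumptions: the helper names are hypothetical, and the candidate layouts it checks (`Dense`, `Eigen/Dense`, or `eigen3/Eigen/Dense` under the root directory) are guesses at common ways Eigen is unpacked, not something the HUMAP build is known to require.

```python
import sys
from pathlib import Path


def default_eigen_root():
    """Header location named in the installation notes for the current platform."""
    if sys.platform.startswith("win"):
        return Path(r"C:\Eigen")
    return Path("/usr/local/include")


def has_eigen_headers(root):
    """Return True if Eigen's `Dense` header is found under `root`.

    The three candidate layouts below are assumptions, not HUMAP requirements.
    """
    root = Path(root)
    candidates = [
        root / "Dense",
        root / "Eigen" / "Dense",
        root / "eigen3" / "Eigen" / "Dense",
    ]
    return any(path.is_file() for path in candidates)


root = default_eigen_root()
print(f"Eigen headers found under {root}: {has_eigen_headers(root)}")
```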
Manual installation:
To install HUMAP manually, download the project and proceed as follows:
python setup.py bdist_wheel
pip install dist/humap*.whl
The simplest usage of HUMAP is as follows:
Fitting the hierarchy
import humap
from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
# build a hierarchy with three levels
hUmap = humap.HUMAP([0.2, 0.2])
hUmap.fit(X, y)
# embed level 2
embedding2 = hUmap.transform(2)
Refer to notebooks/ for complete examples.
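To embed several levels of a fitted hierarchy, the `transform` call shown above can be wrapped in a small helper. This is a sketch only: `embed_levels` is a hypothetical convenience function, not part of the humap API, and it relies solely on `transform(level)` as used in the example. Which level indices are valid for a given hierarchy is an assumption the caller has to verify.

```python
def embed_levels(reducer, levels):
    """Call reducer.transform(level) for each requested level.

    `reducer` is expected to be an already fitted humap.HUMAP instance.
    This helper is illustrative and not part of the humap API.
    """
    return {level: reducer.transform(level) for level in levels}


# Usage with the fitted hierarchy from the example above (not run here);
# level 1 is assumed to be a valid index for a three-level hierarchy:
# embeddings = embed_levels(hUmap, [1, 2])
# embedding2 = embeddings[2]
```

Keeping the embeddings in a dictionary keyed by level makes it straightforward to plot or compare levels side by side.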
C++ UMAP implementation
You can also fit a one-level HUMAP hierarchy, which essentially fits a UMAP projection.
umap_reducer = humap.UMAP()
embedding = umap_reducer.fit_transform(X)
Please use the following reference to cite HUMAP in your work:
@ARTICLE{marciliojr_humap2024,
author={Marcílio-Jr, Wilson E. and Eler, Danilo M. and Paulovich, Fernando V. and Martins, Rafael M.},
journal={IEEE Transactions on Visualization and Computer Graphics},
title={HUMAP: Hierarchical Uniform Manifold Approximation and Projection},
year={2024},
volume={},
number={},
pages={1-10},
doi={10.1109/TVCG.2024.3471181}
}
HUMAP follows the 3-clause BSD license.