Correlation cache

An online correlation cache based on Links online clustering, intended for Siamese neural network training.

Usage example

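import numpy as np
from cache_wrapper import EvictingCacheWrapper

# Placeholder input stream for illustration: 100 random 5-dimensional vectors.
streaming_input_vectors = np.random.rand(100, 5)

cache = EvictingCacheWrapper(0.1, 0.05, 1.0, True, 10)
# Fill the cache; with top_n=0, presumably no similar vectors are requested.
for i in range(100):
    cache.push(new_key=i, new_vector=streaming_input_vectors[i], top_n=0)
# Push one more vector and ask for up to 10 similar cached vectors.
similar_vectors = cache.push(new_key=100, new_vector=np.array([1, 0, 0, 0, 0]), top_n=10)
...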

For more usage examples, see the tests and benchmark.
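To make the push-and-query pattern from the usage example concrete, here is a rough standalone sketch. It is not the repository's implementation (which is based on Links online clustering): the class name `BruteForceSimilarityCache`, the `max_size` parameter, the oldest-first eviction policy, and the `(key, similarity)` return format are all assumptions made for illustration.

```python
import math
from collections import OrderedDict


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class BruteForceSimilarityCache:
    """Hypothetical stand-in, not the repository's EvictingCacheWrapper."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.vectors = OrderedDict()  # key -> vector, oldest entry first

    def push(self, new_key, new_vector, top_n=0):
        # Rank the vectors already cached before storing the new one.
        scored = [
            (key, cosine_similarity(new_vector, vec))
            for key, vec in self.vectors.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        self.vectors[new_key] = list(new_vector)
        # Evict the oldest entries once the capacity is exceeded.
        while len(self.vectors) > self.max_size:
            self.vectors.popitem(last=False)
        # With top_n=0 this returns an empty list.
        return scored[:top_n]
```

A brute-force scan like this compares the new vector against every cached vector, which is the cost a clustering-based cache is presumably meant to reduce.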

Benchmark

Shuffle the input data before running the benchmark.

TODO

Stress test

Links Online Clustering

Python implementation of the Links Online Clustering algorithm: https://arxiv.org/abs/1801.10123

Title: Links: A High-Dimensional Online Clustering Method

Authors: Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno

Overview

This is a clustering algorithm for online data. That is, it will predict cluster membership for vectors that it is shown one-by-one. It does not require examining the entire dataset to predict cluster membership.

It works by maintaining a two-level hierarchy of clusters and subclusters. Each subcluster has a centroid, which is compared with each new vector using cosine similarity to make a prediction. Depending on the data seen so far, a new data point is assigned to an existing cluster and subcluster, to a new subcluster within an existing cluster, or to a new subcluster in a new cluster.
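The three possible outcomes can be illustrated with a deliberately simplified sketch. This is not the rule from the paper or the code in `links_cluster.py`: it assumes two fixed thresholds (with the subcluster threshold at least as high as the cluster threshold), never updates centroids, and ignores `pair_similarity_maximum`. The function name `assign` and the list-of-lists data layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign(vector, clusters, cluster_threshold, subcluster_threshold):
    """Simplified three-way decision (illustrative assumption, not the paper's rule).

    `clusters` is a list of clusters; each cluster is a list of subcluster centroids.
    Returns (outcome, cluster_index, subcluster_index) and may append to `clusters`.
    """
    best = None  # (similarity, cluster_index, subcluster_index)
    for ci, subclusters in enumerate(clusters):
        for si, centroid in enumerate(subclusters):
            sim = cosine_similarity(vector, centroid)
            if best is None or sim > best[0]:
                best = (sim, ci, si)

    # Close enough to a centroid: reuse that subcluster.
    if best is not None and best[0] >= subcluster_threshold:
        return ("existing_subcluster", best[1], best[2])
    # Moderately close: open a new subcluster inside the same cluster.
    if best is not None and best[0] >= cluster_threshold:
        clusters[best[1]].append(list(vector))
        return ("new_subcluster", best[1], len(clusters[best[1]]) - 1)
    # Not close to anything seen so far: open a new cluster.
    clusters.append([list(vector)])
    return ("new_cluster", len(clusters) - 1, 0)
```

Feeding vectors one at a time into `assign` with a shared `clusters` list mirrors the online setting: each decision depends only on the vectors seen so far.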

Instantiating the class requires 3 hyperparameters:

cluster_similarity_threshold
subcluster_similarity_threshold
pair_similarity_maximum

The role of each hyperparameter is best understood by reading the paper.

Installation

pip install -r requirements.txt

Usage example

from links_cluster import LinksCluster

...
links_cluster = LinksCluster(cluster_similarity_threshold, subcluster_similarity_threshold, pair_similarity_maximum)
for vector in data:
    predicted_cluster = links_cluster.predict(vector)

For more usage examples, see the tests.
