How to approach clustering? #56

OmarShehata · 2024-09-19T12:23:54Z

Hello! I'm using vectra for fast, local vector search over embeddings from OpenAI etc., and it's incredible! I love how easy, portable, and fast it is.

I know how to query for the distances to a given vector, but how would you approach clustering? That is, I define some threshold and then see which groups of vectors are most similar (my use case: I have a DB of 15k tweets that I want to cluster semantically). Would the "naive" way be to take every single vector, find the ones closest to it, and repeat that for all of them?

(Is this where you would use something like PCA or k-means? Is this out of scope for vectra, and for vector databases like it in general? Apologies for the noob question, and thank you for your time!)
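
For reference, the "naive" threshold idea described above can be sketched roughly as follows. This is only an illustration and not part of vectra: `cosineSimilarity` and `thresholdClusters` are hypothetical helper names, the vectors are assumed to be plain `number[]` arrays already pulled out of the index, and groups are merged transitively (if A is close to B and B is close to C, all three share a label). It compares every pair, so the cost grows quadratically with the number of vectors.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical helper: returns one group label per input vector.
// Two vectors end up in the same group when a chain of pairs, each with
// similarity >= threshold, connects them (single-linkage grouping).
function thresholdClusters(vectors: number[][], threshold: number): number[] {
  // Union-find: parent[i] points toward the representative of i's group.
  const parent = vectors.map((_, i) => i);
  const find = (i: number): number => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving keeps lookups short
      i = parent[i];
    }
    return i;
  };

  // Compare every pair once and merge groups that pass the threshold.
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      if (cosineSimilarity(vectors[i], vectors[j]) >= threshold) {
        parent[find(j)] = find(i);
      }
    }
  }
  return vectors.map((_, i) => find(i));
}
```

Vectors that share a label belong to the same group. The labels themselves are just indices of representative vectors, so only equality between labels is meaningful.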

OmarShehata · 2024-09-25T15:17:23Z


Figured it out: it's very simple with ml-kmeans. I made a little repo showing how to do this (it's independent of vectra; you can just extract all the vectors and cluster them yourself): https://github.com/OmarShehata/semantic-embedding-template?tab=readme-ov-file#semantic-embedding-template
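
To give a rough idea of what k-means does with the extracted vectors, here is a minimal, dependency-free sketch of the standard Lloyd iteration. It is not the ml-kmeans API and not the code from the linked repo: `simpleKMeans` and its helpers are hypothetical names, the input is assumed to be a plain `number[][]`, and the initial centroids are simply evenly spaced input points so that the result is deterministic (libraries generally offer better initialization strategies).

```typescript
// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the centroid closest to the given point.
function nearestCentroid(point: number[], centroids: number[][]): number {
  let best = 0;
  let bestDist = Infinity;
  for (let c = 0; c < centroids.length; c++) {
    const dist = squaredDistance(point, centroids[c]);
    if (dist < bestDist) {
      bestDist = dist;
      best = c;
    }
  }
  return best;
}

// Hypothetical helper: clusters[i] is the cluster index of vectors[i].
function simpleKMeans(
  vectors: number[][],
  k: number,
  maxIterations = 100
): { clusters: number[]; centroids: number[][] } {
  if (k < 1 || k > vectors.length) {
    throw new Error("k must be between 1 and vectors.length");
  }
  const dim = vectors[0].length;

  // Assumption for this sketch: deterministic init from evenly spaced inputs.
  const step = vectors.length / k;
  let centroids = Array.from({ length: k }, (_, c) => [
    ...vectors[Math.floor(c * step)],
  ]);
  let clusters = new Array<number>(vectors.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach every vector to its nearest centroid.
    const next = vectors.map((v) => nearestCentroid(v, centroids));
    const changed = next.some((label, i) => label !== clusters[i]);
    clusters = next;
    if (!changed) break; // assignments are stable, so we have converged

    // Update step: move each centroid to the mean of its members.
    const sums = Array.from({ length: k }, () => new Array<number>(dim).fill(0));
    const counts = new Array<number>(k).fill(0);
    vectors.forEach((v, i) => {
      const c = clusters[i];
      counts[c]++;
      for (let d = 0; d < dim; d++) sums[c][d] += v[d];
    });
    // An empty cluster keeps its previous centroid.
    centroids = centroids.map((old, c) =>
      counts[c] === 0 ? old : sums[c].map((s) => s / counts[c])
    );
  }
  return { clusters, centroids };
}
```

Unlike the threshold idea from the question, k-means needs the number of clusters `k` up front instead of a similarity cutoff. It uses Euclidean distance here; for unit-length embeddings that ranks neighbours the same way as cosine similarity, since the squared distance equals 2 − 2·cos.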
