add medoid method for calculation of cluster center #6

Merged on Jul 14, 2023 (11 commits)

@thorstenwagner thorstenwagner commented May 31, 2023

We already knew that taking the arithmetic average of all embeddings belonging to one cluster isn't ideal. We now calculate the medoid instead of the arithmetic average. We expect the medoid to be much less sensitive to outliers, and, because it is itself one of the embeddings, it always lies on the surface of the hypersphere.
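To make the argument concrete, here is a short sketch of the two definitions. It assumes the embeddings x_1, ..., x_n of a cluster are L2-normalized (which the hypersphere remark suggests) and uses a generic distance d, since the metric is not stated here.

```latex
% Medoid: the cluster member with the smallest total distance to all members.
m = \operatorname*{arg\,min}_{x_i \in \{x_1,\dots,x_n\}} \sum_{j=1}^{n} d(x_i, x_j)

% Arithmetic mean of unit vectors: by the triangle inequality its norm is at most 1,
% with equality only if all x_i coincide, so it generally lies inside the sphere.
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\lVert \bar{x} \rVert \le \frac{1}{n}\sum_{i=1}^{n} \lVert x_i \rVert = 1

% The medoid is one of the x_i, so \lVert m \rVert = 1 holds by construction.
```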

scipy's cdist (scipy.spatial.distance.cdist) is actually quite fast. If a cluster has more than 50k samples, we take a random sub-sample of 50k and calculate the medoid from that sub-sample.
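The approach described above could be sketched roughly as follows. This is not the code from this PR: the function name `cluster_medoid`, the parameters `max_samples`, `metric` and `seed`, and the choice of cosine distance are all assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist


def cluster_medoid(embeddings, max_samples=50_000, metric="cosine", seed=0):
    """Return the medoid of `embeddings` (shape: n_samples x n_dims).

    Hypothetical helper; the metric and parameter names are assumptions.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)

    # For large clusters, fall back to a random sub-sample (without replacement).
    if len(embeddings) > max_samples:
        idx = rng.choice(len(embeddings), size=max_samples, replace=False)
        embeddings = embeddings[idx]

    # Full pairwise distance matrix. It has n_samples squared entries,
    # so memory use grows quadratically with the (sub-)sample size.
    distances = cdist(embeddings, embeddings, metric=metric)

    # The medoid is the sample with the smallest summed distance to all others.
    return embeddings[np.argmin(distances.sum(axis=1))]
```

Because the function returns a row of the input rather than an average, the result stays on the unit sphere whenever the inputs are normalized.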

Needs some checks by @GavinR1 ...

@thorstenwagner thorstenwagner requested a review from GavinR1 May 31, 2023 14:57

codecov bot commented Jul 14, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@77a7213).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main       #6   +/-   ##
=======================================
  Coverage        ?   30.00%           
=======================================
  Files           ?        4           
  Lines           ?      170           
  Branches        ?        0           
=======================================
  Hits            ?       51           
  Misses          ?      119           
  Partials        ?        0           


@thorstenwagner thorstenwagner merged commit 1d41196 into main Jul 14, 2023
@thorstenwagner thorstenwagner deleted the medoid branch July 14, 2023 11:08
@thorstenwagner (Collaborator, Author) commented

Alright, merged. It works :-)
