Features/599 kmeans and friends #634

Merged 28 commits on Jul 22, 2020

Cdebus (Contributor) commented Jul 16, 2020

Description

Implementation of kmedians and a kmedoids variant (kmedians with each centroid snapped to the nearest data point). Along the way, kmeans was refactored by introducing a base class, _kcluster, that handles centroid initialization, the predict function, and the assignment of data points to their respective cluster centers. Kmeans, kmedians and kmedoids derive from _kcluster, each with a specific metric (euclidean, manhattan) and its own implementation of the centroid update function.
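The structure described above can be sketched as a base class with an overridable update step. This is a minimal illustration in plain NumPy, not Heat's code: the class names `KClusterSketch`, `KMeansSketch` and `KMediansSketch`, the helper `_reduce_clusters`, all signatures, the random initialization and the empty-cluster handling are assumptions made for the example.

```python
import numpy as np


def euclidean(x, c):
    # Pairwise Euclidean distances, shape (n_samples, n_centroids)
    return np.sqrt(((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2))


def manhattan(x, c):
    # Pairwise Manhattan (L1) distances, shape (n_samples, n_centroids)
    return np.abs(x[:, None, :] - c[None, :, :]).sum(axis=2)


class KClusterSketch:
    """Hypothetical base class: initialization, assignment, fit loop, predict."""

    def __init__(self, metric, n_clusters=2, max_iter=100, seed=0):
        self.metric = metric
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.seed = seed

    def _initialize_centroids(self, x):
        # Assumption: pick distinct random samples as initial centroids
        rng = np.random.default_rng(self.seed)
        idx = rng.choice(x.shape[0], size=self.n_clusters, replace=False)
        return x[idx].copy()

    def _assign_to_cluster(self, x, centroids):
        # Each sample gets the index of its closest centroid
        return self.metric(x, centroids).argmin(axis=1)

    def _reduce_clusters(self, x, labels, centroids, reducer):
        # Apply reducer per cluster; an empty cluster keeps its old centroid
        updated = centroids.copy()
        for k in range(self.n_clusters):
            members = x[labels == k]
            if len(members) > 0:
                updated[k] = reducer(members, axis=0)
        return updated

    def _update_centroids(self, x, labels, centroids):
        raise NotImplementedError  # supplied by each subclass

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        centroids = self._initialize_centroids(x)
        for _ in range(self.max_iter):
            labels = self._assign_to_cluster(x, centroids)
            updated = self._update_centroids(x, labels, centroids)
            converged = np.allclose(updated, centroids)
            centroids = updated
            if converged:
                break
        self.cluster_centers_ = centroids
        self.labels_ = self._assign_to_cluster(x, centroids)
        return self

    def predict(self, x):
        return self._assign_to_cluster(np.asarray(x, dtype=float), self.cluster_centers_)


class KMeansSketch(KClusterSketch):
    def __init__(self, **kwargs):
        super().__init__(metric=euclidean, **kwargs)

    def _update_centroids(self, x, labels, centroids):
        return self._reduce_clusters(x, labels, centroids, np.mean)


class KMediansSketch(KClusterSketch):
    def __init__(self, **kwargs):
        super().__init__(metric=manhattan, **kwargs)

    def _update_centroids(self, x, labels, centroids):
        return self._reduce_clusters(x, labels, centroids, np.median)


# Two well-separated groups of four points each
data = np.array(
    [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]],
    dtype=float,
)
for cls in (KMeansSketch, KMediansSketch):
    model = cls(n_clusters=2).fit(data)
    print(cls.__name__, model.labels_, model.cluster_centers_.tolist())
```

Only `_update_centroids` and the metric differ between the subclasses; everything else lives in the base class, which is the point of the refactoring.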

Issue/s resolved: #409 #599

Changes proposed:

  • Introduce _kcluster.py and move functions initialize_centroids, fit_to_cluster (=assign_to_cluster) and predict there
  • Implement manhattan distance in spatial.distance
  • Implement kmedians as _kcluster with manhattan distance and median calculation for updating centroids
  • Implement kmedoids as a variation of kmedians that, in each iteration, moves every updated cluster center to the data point lying closest to the actual median
  • Implement an example on spherical clusters to compare the precision of, and the differences between, the three k-clustering algorithms
  • Some minor bug fixes
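The difference between the kmedians and kmedoids updates listed above can be illustrated on a single cluster. The following is a sketch under assumptions, written in plain NumPy rather than Heat: the function names `manhattan`, `median_update` and `medoid_update` are hypothetical, and the use of the manhattan distance for the snapping step is an assumption of the example.

```python
import numpy as np


def manhattan(x, y):
    # Pairwise L1 distances between rows of x and rows of y
    return np.abs(x[:, None, :] - y[None, :, :]).sum(axis=2)


def median_update(points):
    # kmedians-style update: coordinate-wise median of the cluster members
    return np.median(points, axis=0)


def medoid_update(points):
    # kmedoids-style update as described in the list: compute the median,
    # then snap it to the cluster member closest to that median
    median = np.median(points, axis=0)
    dist = manhattan(points, median[None, :])[:, 0]
    return points[dist.argmin()]


cluster = np.array([[0.0, 0.0], [1.0, 4.0], [2.0, 1.0], [10.0, 2.0]])
print("median:", median_update(cluster))
print("medoid:", medoid_update(cluster))
```

Here the coordinate-wise median is (1.5, 1.5), which is not a member of the cluster; the snapped center is the member (2, 1), whose manhattan distance to the median is 1.0.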

Type of change

  • New feature (non-breaking change which adds functionality)

Due Diligence

  • All split configurations tested: there are no unit tests for kmedians and kmedoids on unsplit data, as these cause transient errors in the unit tests. However, the functionality was checked in applications.
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

Debus and others added 21 commits July 7, 2020 13:58
…cluster (pendant to clipping points-in-cluster to 1 in menas)
 for more samples and other datatypes); fixed Typecasting in kmeans
codecov bot commented Jul 16, 2020

Codecov Report

Merging #634 into master will decrease coverage by 0.12%.
The diff coverage is 92.70%.

@@            Coverage Diff             @@
##           master     #634      +/-   ##
==========================================
- Coverage   97.40%   97.27%   -0.13%     
==========================================
  Files          79       84       +5     
  Lines       16126    16535     +409     
==========================================
+ Hits        15708    16085     +377     
- Misses        418      450      +32     
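
The figures in the diff above can be cross-checked with a few lines of arithmetic. This is only a sketch: the helper name `coverage_percent` is hypothetical, and how Codecov rounds or truncates its displayed values is not stated in the report.

```python
def coverage_percent(hits, lines):
    # Coverage as the share of lines that were hit, in percent
    return 100.0 * hits / lines


# Values copied from the coverage diff
master = {"lines": 16126, "hits": 15708}
pr = {"lines": 16535, "hits": 16085}

master_misses = master["lines"] - master["hits"]
pr_misses = pr["lines"] - pr["hits"]
master_cov = coverage_percent(master["hits"], master["lines"])
pr_cov = coverage_percent(pr["hits"], pr["lines"])

print(f"misses: {master_misses} -> {pr_misses}")
print(f"coverage: {master_cov:.4f}% -> {pr_cov:.4f}%")
print(f"change: {pr_cov - master_cov:.4f} percentage points")
```

The unrounded drop lies between 0.12 and 0.13 percentage points, which is consistent with the report showing 0.12% in its summary line and -0.13 in the diff, presumably depending on where the rounding is applied.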

Impacted Files                        Coverage Δ
heat/core/_operations.py              92.73% <ø> (ø)
heat/cluster/kmedians.py              70.58% <70.58%> (ø)
heat/cluster/kmedoids.py              76.19% <76.19%> (ø)
heat/cluster/_kcluster.py             88.78% <88.78%> (ø)
heat/cluster/__init__.py              100.00% <100.00%> (ø)
heat/cluster/kmeans.py                96.77% <100.00%> (+5.86%) ⬆️
heat/cluster/tests/test_kmeans.py     100.00% <100.00%> (ø)
heat/cluster/tests/test_kmedians.py   100.00% <100.00%> (ø)
heat/cluster/tests/test_kmedoids.py   100.00% <100.00%> (ø)
heat/core/__init__.py                 100.00% <100.00%> (ø)
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6629499...ef92850.

Resolved review threads:

  • CHANGELOG.md (outdated)
  • examples/cluster/demo_kClustering.py (outdated, 4 threads)
  • heat/cluster/kmedians.py
  • heat/cluster/kmedoids.py
  • heat/spatial/distance.py
  • heat/cluster/_kcluster.py (outdated)
  • heat/cluster/kmeans.py (outdated)

Markus-Goetz self-assigned this Jul 17, 2020
Markus-Goetz merged commit 56083ab into master Jul 22, 2020
This was referenced Jul 23, 2020
mtar deleted the features/599-kmeans-and-friends branch May 5, 2023 14:34
Successfully merging this pull request may close these issues.

Expand kmeans tests to cover more diverse range of data
2 participants