Speed-up kmeans via improved distance calculation #469

Cdebus · 2020-01-30T07:15:30Z

Related
Experiments have shown kmeans clustering to be rather slow. The main issue is the calculation of the distance matrix, which is currently done via dimension expansion and a 3D difference calculation. We suspect this causes cache misses and adds substantial overhead to the calculation, slowing it down.
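
For illustration, here is a minimal sketch of what a distance calculation via dimension expansion typically looks like in plain PyTorch. It is not the library's actual code; the function name `pairwise_dist_expand` and the argument names are hypothetical. The point is the `(n, k, d)` intermediate tensor, which is the likely source of the memory traffic described above.

```python
import torch


def pairwise_dist_expand(x: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """Euclidean distances between rows of x (n, d) and rows of centers (k, d).

    Hypothetical helper: expands both inputs so that broadcasting builds a
    full (n, k, d) difference tensor before reducing over the feature axis.
    """
    # (n, 1, d) - (1, k, d) -> (n, k, d); this intermediate grows with n * k * d
    diff = x.unsqueeze(1) - centers.unsqueeze(0)
    # Reduce over the feature dimension to get an (n, k) distance matrix
    return diff.pow(2).sum(dim=2).sqrt()
```

For `n` samples, `k` centroids and `d` features, the intermediate holds `n * k * d` values even though the result only needs `n * k`.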

Feature functionality
Torch offers a cdist(X, Y) function to calculate pairwise distances between all samples (rows) of two tensors. There are also some alternative approaches being discussed.
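
A rough sketch of the two options, assuming 2D float tensors with samples in rows. `torch.cdist` is the real PyTorch function; the wrapper names are hypothetical. The second function shows one commonly discussed alternative, the quadratic expansion `||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2`; treating this as one of the alternatives meant here is an assumption, not something the issue confirms.

```python
import torch


def pairwise_dist_cdist(x: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """Euclidean distances between rows of x (n, d) and centers (k, d) via torch.cdist."""
    return torch.cdist(x, centers, p=2)


def pairwise_dist_quadratic(x: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """Same result via the quadratic expansion (assumed alternative, see lead-in).

    Avoids the (n, k, d) intermediate: only (n, 1), (1, k) and (n, k) tensors
    are created, and the cross term is a single matrix multiplication.
    """
    x_sq = (x * x).sum(dim=1, keepdim=True)             # (n, 1)
    c_sq = (centers * centers).sum(dim=1).unsqueeze(0)  # (1, k)
    d2 = x_sq - 2.0 * (x @ centers.T) + c_sq            # (n, k)
    # Rounding can push tiny values slightly below zero; clamp before sqrt
    return d2.clamp_min(0.0).sqrt()
```

The quadratic form can lose some precision when points are very close together, so it may be worth checking against `torch.cdist` on representative data before relying on it.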

Additional context
pytorch/pytorch#15253
https://discuss.pytorch.org/t/efficient-distance-matrix-computation/9065

Cdebus · 2020-02-11T15:05:25Z

This was taken care of with PR #470 and the extension #479.

Cdebus self-assigned this Jan 30, 2020

Cdebus added the enhancement (New feature or request) and high-level functions (High-level machine-learning algorithms) labels Jan 30, 2020

Cdebus mentioned this issue Jan 30, 2020

Features/469 kmeans rework #470

Merged

4 tasks

Cdebus closed this as completed Feb 11, 2020
