Skip to content

Code for the proposed IMCKDE clustering algorithm, developed at UFPR (Federal University of Paraná). Paper submitted for publication in the Journal of Applied Statistics.

Notifications You must be signed in to change notification settings

paulomuraroferreira/IMCKDE

Repository files navigation

README - IMCKDE Algorithm

The purpose of this notebook is to provide a step-by-step tutorial of how the new proposed clustering algorithm can be used.

As an example, we will perform clustering on the Iris dataset.

Start by installing the requirements.

!pip install -r requirements.txt

Import Libraries

from IMCKDE_algorithm import IMCKDE
from sklearn import datasets
from sklearn.metrics import silhouette_score

Loading the Iris dataset.

iris = datasets.load_iris()

The IMCKDE class takes the following parameters:

Parameters:

    dataset (array-like): Data points.  
    n_clusters (int): Number of clusters. Optional.
    alpha (float): Smoothing parameter.  
    beta (float): Threshold multiplier.  
    adam (bool): Whether to use ADAM optimizer.

So let's predict the results for the Iris dataset and display the
clustering centroids and the points classification.

imkde = IMCKDE(dataset = iris.data, alpha = 2.1, beta = 3, n_clusters = 3).predict()

import pprint

print(f"\033[1m Clustering centroids: \033[0m {imkde.centroids()}")
print()
print(f" Clustering points classification: \033[0m {imkde.clusters()}")
Clustering centroids: [array([5.02, 3.41, 1.48, 0.25]), array([5.74, 2.82, 4.19, 1.29]), array([6.63, 3.08, 5.51, 2.15])]

Clustering points classification: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 2 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
 2 2 1 2 2 2 2 2 1 2 1 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2
 2 2]

Calculating the Silhouette Coefficient:

silhouette = imkde.metrics_calculation("silhouette", iris.target)

print(f'Silhouette Coefficient for the Iris dataset: {silhouette:.3f}')
Silhouette Coefficient for the Iris dataset: 0.528

Since we have the original labels, let's calculate the precision and plot
the confusion matrix:

precision = imkde.metrics_calculation("precision", iris.target)

print(f'Precision for the Iris dataset: {precision:.3f}')
Precision for the Iris dataset: 0.927
from utils import plot_confusion_matrix
y_true = iris.target

y_pred = imkde.output_array
labels = ['Setosa', 'Versicolor', 'Virginica']
plot_confusion_matrix(y_true, y_pred, labels = labels, filename = '.\iris_imkde.jpeg', title = 'Confusion Matrix - IMCKDE')

png

Comparison with K-means

Let's perform the same calculations with k-means and compare it to IMCKDE.

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from assignment_problem import ClusterMapper
from sklearn.metrics import precision_score

iris = load_iris()
target = iris.target
n_clusters = 3
dataset = iris.data
kmeans = KMeans(n_clusters=3, random_state=0, n_init='auto').fit(iris.data)
result = (kmeans.labels_, kmeans.cluster_centers_,)
mapper = ClusterMapper(n_clusters)
maps = mapper.mapping__clusters(target, dataset, result)

precision = precision_score(target, maps, average='weighted')
silhouette = silhouette_score(dataset, kmeans.labels_)

print(f'Precision Score for k-means: {precision}')
print(f'Silhouette Coefficient for k-means: {silhouette}')
Precision Score for k-means: 0.8978562421185372
Silhouette Coefficient for k-means: 0.551191604619592
  • Conclusion:

    The IMCKDE has a lower Silhouette Coefficient but a higher precision in the Iris dataset
    when compared to the k-means.

    Below the confusion matrix is displayed for the k-means.

from utils import plot_confusion_matrix
y_true = iris.target
y_pred = maps
labels = ['Setosa', 'Versicolor', 'Virginica']
plot_confusion_matrix(y_true, y_pred, labels = labels, filename = '.\iris_kmeans.jpeg', title = 'Confusion Matrix - K-means')

png

About

Code for the proposed IMCKDE clustering algorithm, developed at UFPR (Federal University of Paraná). Paper submitted for publication in the Journal of Applied Statistics.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages