Kmeans

Package description

This package consists of R functions that implement k-means clustering from scratch. It works on any dataset with valid numerical features and includes fit, predict, and clustersummary functions, as well as elbow and silhouette methods for choosing the hyperparameter k. A high-level overview of each function is given below. See each function’s documentation for more details.

fit: Groups unlabeled data into a given number of clusters k using the simple k-means algorithm. It returns a cluster label for each data point together with the cluster centers. K-means is an unsupervised learning method.
predict: Assigns each point in a dataset to a cluster. The dataset has to be in the same format as the original dataset the model was fit on.
elbow: Creates a plot of inertia against the number of cluster centers, as per the elbow method. Calculates and returns the inertia value for each candidate number of clusters. Useful for identifying the optimal number of clusters when using the k-means clustering algorithm.
silhouette: Returns the average silhouette score over the samples in a given 2-d array and their clustering labels.
clustersummary: Provides a summary of the groups created by k-means clustering, including centroid coordinates, the number of training data points assigned to each cluster, and within-cluster distance metrics.
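
The fit step described above follows the usual assign-then-update loop of k-means. A minimal base R sketch of that loop is given here for orientation only. The name `simple_kmeans_sketch` and its `max_iter` argument are hypothetical and chosen for illustration; this is not the package's actual implementation, which may initialise centers or handle edge cases differently.

```r
# Hypothetical helper for illustration; NOT the Kmeans package's fit().
simple_kmeans_sketch <- function(X, k, max_iter = 100) {
  X <- as.matrix(X)
  # Assumed initialisation: k distinct rows picked at random
  centers <- X[sample(nrow(X), k), , drop = FALSE]
  labels <- rep(0L, nrow(X))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance of every point to every center (n x k)
    d2 <- sapply(seq_len(k),
                 function(j) rowSums(sweep(X, 2, centers[j, ])^2))
    d2 <- matrix(d2, nrow = nrow(X))
    # Assignment step: index of the nearest center for each point
    new_labels <- max.col(-d2, ties.method = "first")
    if (all(new_labels == labels)) break  # no change, so converged
    labels <- new_labels
    # Update step: move each center to the mean of its members
    for (j in seq_len(k)) {
      members <- X[labels == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(labels = labels, centers = centers)
}

set.seed(123)
X <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                x2 = c(1, 2, 3, 5, 53, 21, 43))
result <- simple_kmeans_sketch(X, 2)
result$labels
result$centers
```

Because the starting centers are random, different seeds can lead to different (locally optimal) partitions.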

There is a built-in k-means function in R. This package is not meant to add to the existing ecosystem, but is rather intended to deepen fundamental understanding of these algorithms.
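
For comparison, the built-in function mentioned above is `kmeans()` from the stats package, which ships with R. The short sketch below runs it on a small toy dataset; the choice of `nstart = 5` and the seed are arbitrary assumptions for the example.

```r
# Built-in k-means from the stats package (attached by default in R)
X <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                x2 = c(1, 2, 3, 5, 53, 21, 43))

set.seed(1)
builtin <- kmeans(X, centers = 2, nstart = 5)

builtin$cluster       # cluster label for each row of X
builtin$centers       # one row of coordinates per cluster center
builtin$tot.withinss  # total within-cluster sum of squares (inertia)
```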

Installation

To use this package, install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("UBC-MDS/Kmeans")

Dependencies

tidyverse

Usage

The following basic example walks through a typical workflow.

First, load the required packages and fit the data:

library(Kmeans)
library(tidyverse)
library(dplyr)

X <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                x2 = c(1, 2, 3, 5, 53, 21, 43))
kmeans_results <- fit(X, 2)

Use the fitted model to predict labels for new data:

X_new <- data.frame(x1 = c(1, 4),
                    x2 = c(3, 2))
predict(X_new, kmeans_results$centers)
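
Conceptually, prediction for k-means amounts to assigning each new point to its nearest center. The base R sketch below illustrates that idea under stated assumptions: `assign_to_nearest` is a hypothetical helper, not the package's predict function, and the center coordinates are made up for the example rather than taken from a real fit.

```r
# Hypothetical helper for illustration; NOT the package's predict().
assign_to_nearest <- function(X_new, centers) {
  X_new <- as.matrix(X_new)
  centers <- as.matrix(centers)
  # One column of squared Euclidean distances per center
  d2 <- sapply(seq_len(nrow(centers)),
               function(j) rowSums(sweep(X_new, 2, centers[j, ])^2))
  d2 <- matrix(d2, nrow = nrow(X_new))
  # Row-wise index of the smallest distance
  max.col(-d2, ties.method = "first")
}

# Illustrative centers (assumed values, not output of the package)
centers <- rbind(c(6.4, 6.4), c(48, 48))
X_new <- data.frame(x1 = c(1, 4, 50),
                    x2 = c(3, 2, 45))
new_labels <- assign_to_nearest(X_new, centers)
new_labels  # 1 1 2
```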

Use the clustersummary function to get information on the fitted model:

clustersummary(X, kmeans_results$centers, kmeans_results$labels)
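
To give a feel for the kind of information such a summary contains, here is a rough base R sketch that computes centroid coordinates, cluster sizes, and the mean distance of points to their centroid. The labels are assumed for illustration, and the column names `n_points` and `mean_distance` are hypothetical; the actual output format of clustersummary may differ.

```r
X <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                x2 = c(1, 2, 3, 5, 53, 21, 43))
labels <- c(1, 1, 1, 1, 2, 1, 2)  # assumed labels, for illustration only

# Centroid of each cluster: column means of its members
centers <- rbind(colMeans(X[labels == 1, ]),
                 colMeans(X[labels == 2, ]))

# Euclidean distance from each point to its own centroid
dist_to_center <- sqrt(rowSums((as.matrix(X) - centers[labels, ])^2))

summary_sketch <- data.frame(
  cluster = 1:2,
  centers,                                   # columns x1 and x2
  n_points = as.vector(table(labels)),
  mean_distance = as.vector(tapply(dist_to_center, labels, mean))
)
summary_sketch
```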

If you are unsure which value of k to choose, use the elbow and silhouette functions:

centers <- c(2, 3, 4, 5)
inertia <- elbow(X, centers)$inertia

k_vector <- c(2, 3, 4, 5)
scores <- silhouette(X, k_vector)$scores
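
As background on what these two diagnostics measure, the sketch below computes inertia and the mean silhouette score for several values of k using only base R. It substitutes the built-in `stats::kmeans()` for this package's fit function, and `mean_silhouette` is a hypothetical helper written for this example; treating singleton clusters as having a score of 0 is a common convention assumed here.

```r
# Hypothetical helper: mean silhouette score for integer cluster labels.
mean_silhouette <- function(X, labels) {
  D <- as.matrix(dist(X))  # pairwise Euclidean distances
  s <- numeric(nrow(D))
  for (i in seq_len(nrow(D))) {
    own <- labels == labels[i]
    if (sum(own) == 1) next  # singleton cluster: score stays 0
    # a: mean distance to the other members of the point's own cluster
    a <- sum(D[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to the members of any other cluster
    b <- min(tapply(D[i, !own], labels[!own], mean))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

X <- data.frame(x1 = c(1, 2, 3, 5, 53, 21, 43),
                x2 = c(1, 2, 3, 5, 53, 21, 43))
k_values <- 2:5

set.seed(42)
fits <- lapply(k_values, function(k) kmeans(X, centers = k, nstart = 10))
inertia <- sapply(fits, function(f) f$tot.withinss)
scores <- sapply(fits, function(f) mean_silhouette(X, f$cluster))

data.frame(k = k_values, inertia = inertia, silhouette = scores)
```

Plotting `inertia` against `k_values` (for example with `plot(k_values, inertia, type = "b")`) gives the elbow curve, where the bend suggests a reasonable k. For the silhouette score, higher values indicate better separated clusters.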

Tests

To test that the functions work as intended, run devtools::test() from the root of the project repo in an R console.
