Cluster analysis is an important tool in unsupervised learning, but determining the number of clusters K is a difficult problem. Our goal is to explore the methods currently used to determine the optimal number of clusters in k-means.

### Team members:

Nikhila Balaji
Katherine Brey
Hussein Koprly
Alec McDivitt
Julie Persinger
Hunter Sipe
Yi Wei

#### Mentor: Bharathkumar “Tiny” Ramachandra

### Objective:

Learn how to think through the solution to a hard problem in unsupervised learning, implement the solution, and create and execute a structured work plan
Recognize some of the challenges that arise in unsupervised learning and understand why determining the optimal number of clusters is a hard problem
Identify the steps in the code structure of the k-means clustering implementation
Generate four clustering datasets with different properties
Model the mathematical representation of K in each of the four methods:
1. Gap Statistic
2. Elbow
3. Information theoretic
4. Average Silhouette
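
As a rough illustration of the objectives above, the sketch below generates one synthetic dataset and computes the quantity behind the elbow method, the total within-cluster sum of squares W_k. It also computes a jump-style statistic as one possible reading of the "information theoretic" method. That reading is an assumption: this README does not say which information-theoretic criterion is meant. The helper `make_blobs` and all parameter values are hypothetical, and only base R is used.

```r
# Hypothetical helper (not from this repository): draw n_per_cluster
# Gaussian points around each row of `centers`.
make_blobs <- function(centers, n_per_cluster = 50, sd = 0.5) {
  d <- ncol(centers)
  pts <- lapply(seq_len(nrow(centers)), function(i) {
    noise <- matrix(rnorm(n_per_cluster * d, sd = sd), ncol = d)
    sweep(noise, 2, centers[i, ], "+")   # shift the noise to the i-th center
  })
  do.call(rbind, pts)
}

set.seed(42)
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))   # three well-separated clusters
X <- make_blobs(centers)

# Elbow method: W_k for k = 1..6, using base R's kmeans().
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(X, centers = k, nstart = 10)$tot.withinss)

# Jump-style statistic (assumed interpretation of "information theoretic"):
# distortion d_k = W_k / (n * p), transformed with power -p/2, then differenced.
n <- nrow(X)
p <- ncol(X)
distortion <- wss / (n * p)
transformed <- distortion^(-p / 2)
jump <- transformed - c(0, head(transformed, -1))
k_jump <- ks[which.max(jump)]

print(data.frame(k = ks, wss = wss, jump = jump))
```

Plotting `wss` against `ks`, for example with `plot(ks, wss, type = "b")`, shows the elbow visually. The jump statistic instead selects the k with the largest increase in transformed distortion.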

### Tasks - Implementations:

Implementing K-means
Gap Statistic
Elbow method
Information Theoretic
Average Silhouette
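
The following is a minimal sketch of how two of the tasks above might look in base R. It is not the project's actual implementation. The function names `avg_silhouette` and `gap_statistic`, the toy data, and the choices `k_max = 5` and `B = 10` are all assumptions made for illustration. The gap statistic here draws its reference data uniformly over the bounding box of the observations.

```r
set.seed(1)
# Toy data (assumed): three Gaussian blobs of 50 points each in 2D.
X <- rbind(
  cbind(rnorm(50, 0, 0.5), rnorm(50, 0, 0.5)),
  cbind(rnorm(50, 5, 0.5), rnorm(50, 5, 0.5)),
  cbind(rnorm(50, 0, 0.5), rnorm(50, 5, 0.5))
)

# Average silhouette width for a labelling with at least two clusters.
avg_silhouette <- function(X, cluster) {
  D <- as.matrix(dist(X))
  labels <- sort(unique(cluster))
  s <- numeric(nrow(X))
  for (i in seq_len(nrow(X))) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) next                      # singleton: s(i) = 0
    a <- sum(D[i, own]) / (sum(own) - 1)         # mean distance within own cluster
    b <- min(sapply(setdiff(labels, cluster[i]), # nearest other cluster
                    function(l) mean(D[i, cluster == l])))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Gap statistic: Gap(k) = mean over references of log(W_k) minus log(W_k) on X.
gap_statistic <- function(X, k_max = 5, B = 10) {
  log_w <- function(data, k) {
    log(kmeans(data, centers = k, nstart = 10)$tot.withinss)
  }
  lo <- apply(X, 2, min)
  hi <- apply(X, 2, max)
  obs <- sapply(1:k_max, function(k) log_w(X, k))
  ref <- replicate(B, {                          # k_max x B matrix
    U <- sapply(seq_len(ncol(X)), function(j) runif(nrow(X), lo[j], hi[j]))
    sapply(1:k_max, function(k) log_w(U, k))
  })
  data.frame(k = 1:k_max,
             gap = rowMeans(ref) - obs,
             s = apply(ref, 1, sd) * sqrt(1 + 1 / B))  # uses the sample sd
}

ks <- 2:6
sil <- sapply(ks, function(k) {
  avg_silhouette(X, kmeans(X, centers = k, nstart = 10)$cluster)
})
best_k_sil <- ks[which.max(sil)]

gap <- gap_statistic(X)
print(gap)
```

With the silhouette, the k that maximizes the average width is selected. With the gap statistic, a common rule is to pick the smallest k such that `gap[k] >= gap[k + 1] - s[k + 1]`.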

#### Required Libraries:

install.packages('plot3D')
install.packages('scatterplot3d')
install.packages('car')
install.packages('pracma')
