Cluster analysis is an important tool in unsupervised learning, but determining the number of clusters K is a difficult problem. Our goal is to explore the methods currently used to determine the optimal number of clusters in k-means.
###Team members:
- Nikhila Balaji
- Katherine Brey
- Hussein Koprly
- Alec McDivitt
- Julie Persinger
- Hunter Sipe
- Yi Wei
####Mentor: Bharathkumar “Tiny” Ramachandra
###Objective:
- Learn how to approach a hard problem in unsupervised learning: think through a solution, implement it, and create and execute a structured work plan
- Recognize some of the challenges that arise in unsupervised learning and understand why determining the optimal number of clusters is a hard problem
- Analyze the k-means clustering implementation and identify the steps in its code structure
- Generate four clustering datasets with different properties
- Model the mathematical representation of K in each of the four methods:
  - Gap Statistic
  - Elbow
  - Information Theoretic
  - Average Silhouette
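
As a rough guide to what "the mathematical representation of K" could look like for these four methods, the criteria are sketched below in their commonly cited forms. The notation is an assumption introduced here, not taken from the project: $C_k$ is the $k$-th cluster with centroid $\mu_k$, $n$ is the number of points, $p$ is the number of dimensions, and $B$ is the number of reference datasets. "Information Theoretic" is assumed to refer to the jump method of Sugar and James.

$$
\begin{aligned}
\text{Within-cluster dispersion:}\quad & W_K = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 \\
\text{Elbow:}\quad & \text{choose the } K \text{ after which the decrease in } W_K \text{ levels off} \\
\text{Gap statistic:}\quad & \mathrm{Gap}(K) = \frac{1}{B} \sum_{b=1}^{B} \log W_{K,b}^{*} - \log W_K, \qquad
\hat{K} = \min \{ K : \mathrm{Gap}(K) \ge \mathrm{Gap}(K+1) - s_{K+1} \} \\
\text{Average silhouette:}\quad & s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\hat{K} = \arg\max_{K} \frac{1}{n} \sum_{i=1}^{n} s(i) \\
\text{Jump method:}\quad & d_K = \frac{W_K}{n\,p}, \qquad
J_K = d_K^{-Y} - d_{K-1}^{-Y}, \qquad \hat{K} = \arg\max_{K} J_K
\end{aligned}
$$

Here $W_{K,b}^{*}$ is the dispersion of the $b$-th reference dataset drawn from a null distribution with no cluster structure, and $s_{K+1}$ is the standard deviation of $\log W_{K+1,b}^{*}$ over the reference datasets multiplied by $\sqrt{1 + 1/B}$. In the silhouette, $a(i)$ is the mean distance from point $i$ to the other points in its own cluster and $b(i)$ is the smallest mean distance from point $i$ to the points of any other cluster. In the jump method, $Y$ is a transformation power, often taken to be $p/2$, and $d_0^{-Y}$ is defined as $0$; the form of $d_K$ shown assumes an identity covariance.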
###Tasks - Implementations:
- Implementing K-means
- Gap Statistic
- Elbow method
- Information Theoretic
- Average Silhouette
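
To make two of the tasks above concrete, here is a minimal sketch of the elbow and average silhouette criteria on a synthetic dataset. It is an illustration under stated assumptions, not the team's implementation: `make_blobs` and `avg_silhouette` are hypothetical helper names introduced here, the three-cluster data are generated for the example, and clustering uses the `kmeans` function from base R's `stats` package.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical helper: draw n Gaussian points around each row of `centers`.
make_blobs <- function(centers, n = 50, sd = 0.3) {
  p <- ncol(centers)
  blocks <- lapply(seq_len(nrow(centers)), function(i) {
    noise <- matrix(rnorm(n * p, sd = sd), nrow = n, ncol = p)
    sweep(noise, 2, centers[i, ], "+")  # shift column j by centers[i, j]
  })
  do.call(rbind, blocks)
}

# Assumed example data: three well-separated clusters in two dimensions.
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- make_blobs(centers)

# Fit k-means for each candidate K, with several random restarts per K.
k_values <- 1:6
fits <- lapply(k_values, function(k) kmeans(x, centers = k, nstart = 20))

# Elbow: total within-cluster sum of squares for each K.
wss <- sapply(fits, function(fit) fit$tot.withinss)

# Hypothetical helper: average silhouette width computed from pairwise distances.
avg_silhouette <- function(x, cluster) {
  d <- as.matrix(dist(x))
  labels <- sort(unique(cluster))
  widths <- sapply(seq_len(nrow(x)), function(i) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) return(0)  # convention: a singleton has width 0
    a <- sum(d[i, own]) / (sum(own) - 1)  # mean distance within own cluster
    b <- min(sapply(setdiff(labels, cluster[i]),
                    function(l) mean(d[i, cluster == l])))
    (b - a) / max(a, b)
  })
  mean(widths)
}

# The silhouette is undefined for K = 1, so that entry is NA.
sil <- c(NA, sapply(fits[-1], function(fit) avg_silhouette(x, fit$cluster)))
best_k_sil <- k_values[which.max(sil)]  # which.max skips the NA entry
```

Plotting `wss` against `k_values` (for example with `plot(k_values, wss, type = "b")`) shows the elbow visually, while `best_k_sil` gives the K with the largest average silhouette width.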
####Required Libraries:

    install.packages('plot3D')
    install.packages('scatterplot3d')
    install.packages('car')
    install.packages('pracma')
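
If some of these packages may already be installed, a check like the following avoids reinstalling them. This is a sketch only: `missing_packages` is a hypothetical helper name introduced here, and the actual install call is left commented out so that nothing is downloaded unless you opt in.

```r
# Packages named in the list above.
required <- c("plot3D", "scatterplot3d", "car", "pracma")

# Hypothetical helper: return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```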