Cluster analysis is an important tool for unsupervised learning, but determining the number of clusters K is a difficult problem. Our goal is to explore the methods currently used to determine the optimal number of clusters in k-means.

### Team members:

  • Nikhila Balaji
  • Katherine Brey
  • Hussein Koprly
  • Alec McDivitt
  • Julie Persinger
  • Hunter Sipe
  • Yi Wei

#### Mentor: Bharathkumar “Tiny” Ramachandra

### Objective:

  • Learn how to think about the solution to a hard problem in unsupervised learning, implement the solution, and create and execute a structured work plan
  • Recognize challenges that arise in unsupervised learning and understand why determining the optimal number of clusters is a hard problem
  • Identify the steps of the k-means clustering implementation and how its code is structured
  • Generate four clustering datasets with different properties
  • Model mathematically how each of the four methods selects K:
    1. Gap Statistic
    2. Elbow
    3. Information theoretic
    4. Average silhouette
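
To make the dataset and elbow objectives above concrete, here is a minimal sketch in base R. It is an illustration under stated assumptions, not the project's implementation: the helper `make_blobs`, the cluster centers, and the spread `sd` are all hypothetical choices, and the built-in `stats::kmeans` stands in for the team's own k-means.

```r
# Minimal sketch: one synthetic dataset plus the elbow curve.
# make_blobs, the centers, and sd are illustrative assumptions.
set.seed(1)  # fixed seed so the run is reproducible

# Hypothetical helper: n Gaussian points around each row of `centers`.
make_blobs <- function(centers, n = 50, sd = 0.4) {
  do.call(rbind, lapply(seq_len(nrow(centers)), function(j) {
    cbind(rnorm(n, centers[j, 1], sd), rnorm(n, centers[j, 2], sd))
  }))
}

centers <- rbind(c(0, 0), c(5, 0), c(2.5, 4.3))
x <- make_blobs(centers)

# Total within-cluster sum of squares for K = 1..8 (built-in k-means).
ks <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

print(data.frame(K = ks, WSS = round(wss, 2)))
```

Plotting `wss` against `ks` (for example with `plot(ks, wss, type = "b")`) shows the curve in which the elbow method looks for a bend. Changing `sd`, `n`, or the centers is one simple way to obtain datasets with different properties, such as overlapping or unevenly sized clusters.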

### Tasks - Implementations:

  • Implementing K-means
  • Gap Statistic
  • Elbow method
  • Information Theoretic
  • Average silhouette
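
As an illustration of one of the tasks above, the following base R sketch computes the average silhouette width for several values of K and keeps the K with the largest value. It rests on assumptions: `avg_silhouette` is a hypothetical helper written for this example, the data are synthetic, and `stats::kmeans` is used in place of the project's own k-means. For each point, a is the mean distance to the other points in its cluster, b is the smallest mean distance to the points of any other cluster, and the silhouette is (b - a) / max(a, b).

```r
# Minimal sketch of the average silhouette criterion (base R only).
# avg_silhouette and the synthetic data are illustrative assumptions.
set.seed(42)

# Three well-separated Gaussian blobs of 50 points each.
centers <- rbind(c(0, 0), c(5, 0), c(2.5, 4.3))
x <- do.call(rbind, lapply(1:3, function(j) {
  cbind(rnorm(50, centers[j, 1], 0.4), rnorm(50, centers[j, 2], 0.4))
}))

# Hypothetical helper: mean silhouette width; needs at least two clusters.
avg_silhouette <- function(x, cluster) {
  d <- as.matrix(dist(x))            # pairwise Euclidean distances
  labels <- unique(cluster)
  s <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) next          # singleton cluster: silhouette is 0
    a <- sum(d[i, own]) / (sum(own) - 1)   # d[i, i] is 0, so self is excluded
    b <- min(vapply(setdiff(labels, cluster[i]),
                    function(l) mean(d[i, cluster == l]), numeric(1)))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

ks <- 2:6
sil <- sapply(ks, function(k) {
  avg_silhouette(x, kmeans(x, centers = k, nstart = 25)$cluster)
})
best_k <- ks[which.max(sil)]

print(data.frame(K = ks, avg_silhouette = round(sil, 3)))
print(best_k)
```

The silhouette is undefined for a single cluster, so the search starts at K = 2. The full distance matrix costs memory quadratic in the number of points, which is acceptable for small teaching datasets but not for large ones.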

#### Required Libraries:

install.packages('plot3D')
install.packages('scatterplot3d')
install.packages('car')
install.packages('pracma')