serhii-soboliev/kmeans-clustering-service

About the project

A service that implements clustering using the K-means algorithm.
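
For readers unfamiliar with K-means, the following is a minimal sketch of the standard Lloyd's iteration on 2D points. It is an illustration only and is not taken from this service's Java implementation; the function name `kmeans`, its parameters, and the random initialization are assumptions made for the example. The output shape mirrors the `[x, y, cluster_index]` triples returned by the service.

```python
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster 2D points with Lloyd's algorithm (illustrative sketch only)."""
    rng = random.Random(seed)
    # Assumption: initialize centroids from k randomly chosen input points.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            for x, y in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return [[x, y, a] for (x, y), a in zip(points, assignment)]
```

Calling `kmeans(points, 4)` on a list of `(x, y)` pairs returns one triple per input point, in input order.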

Clustering visualization example

The following screenshots show the result of running the algorithm on a set of 2D points that can be separated into 4 clusters.

Clustering visualization Clustering visualization 2

Build and run guidance

  1. Clone the project: git clone https://github.com/SergiySobolev/kmeans-clustering-service.git
  2. Go to the root of the project directory: cd kmeans-clustering-service
  3. Build the backend: ./gradlew build
  4. Run the backend: java -jar build/libs/kmeans-clustering-service.jar. By default, the backend service starts on port 11111.

Generate data

The backend provides an endpoint that generates synthetic data that can be divided into clusters:

URL /generatedata

METHOD POST

HEADERS Content-Type: application/json

DATA PARAMS

clusterNum: int

bounds: 2d array of ints

RESPONSE

data: 2d array of ints

EXAMPLE

Generate data that could be divided into 3 clusters

Request:

POST http://{host}:11111/generatedata

Content-Type: application/json

{"clusterNum": 3, "bounds": [[100, 200],[100, 200],[350, 550],[350, 550],[2000, 2500],[2000, 2500]]}

Response visualization:

Generated data response visualization
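
As a sketch of how a client might call this endpoint, the snippet below builds the request shown above using only the Python standard library. The helper names `build_generate_request` and `send` are hypothetical and not part of this project, and the shape of the decoded response is assumed to follow the RESPONSE description above.

```python
import json
import urllib.request


def build_generate_request(host, cluster_num, bounds, port=11111):
    """Build (but do not send) a POST request for /generatedata."""
    body = json.dumps({"clusterNum": cluster_num, "bounds": bounds}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/generatedata",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running locally, `send(build_generate_request("localhost", 3, bounds))` would return the parsed JSON response, where `bounds` is the array from the example request.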

Clusterize data

The backend provides an endpoint that separates points into clusters:

URL /clusterdata

METHOD POST

HEADERS Content-Type: application/json

DATA PARAMS

type: Algorithm type. The only supported value at the moment is "KMEANS". Other algorithms will be added later.

clusterNum: Number of clusters the data must be divided into

data: Array of 2D points to cluster

RESPONSE

data: array of clustered points, where each clustered point is [x, y, cluster_index]

EXAMPLE

Divide data into 3 clusters

Request:

POST http://{host}:11111/clusterdata

Content-Type: application/json

{"type": "KMEANS", "clusterNum": 3, "data": generated_data}, where generated_data is the data returned by the Generate data request

Response visualization: Clusterized data response visualization
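
A possible client for this endpoint is sketched below, again with the Python standard library only. The helper names `build_cluster_request` and `group_by_cluster` are hypothetical; the grouping helper assumes the response points arrive as `[x, y, cluster_index]` triples, as described in the RESPONSE section.

```python
import json
import urllib.request
from collections import defaultdict


def build_cluster_request(host, points, cluster_num, port=11111):
    """Build (but do not send) a POST request for /clusterdata."""
    payload = {"type": "KMEANS", "clusterNum": cluster_num, "data": points}
    return urllib.request.Request(
        url=f"http://{host}:{port}/clusterdata",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def group_by_cluster(clustered_points):
    """Group [x, y, cluster_index] triples into {cluster_index: [(x, y), ...]}."""
    groups = defaultdict(list)
    for x, y, cluster_index in clustered_points:
        groups[cluster_index].append((x, y))
    return dict(groups)
```

Grouping the points by cluster index is a convenient first step before plotting each cluster in its own color, for example in a notebook.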

Azure deployment

The entire solution can be deployed to Azure using Azure Container Instances.

Prerequisites:

  1. Get a Microsoft Azure subscription
  2. Install Terraform

Steps:

  1. git clone https://github.com/SergiySobolev/kmeans-clustering-service.git
  2. cd kmeans-clustering-service/azureiac
  3. terraform init
  4. terraform plan -target=module.backend_container
  5. terraform apply -target=module.backend_container
  6. terraform plan -target=module.frontend_container
  7. terraform apply -target=module.frontend_container
  8. Go to Jupyter Notebook
