Clustering News with Expectation Maximization

In this project we apply Expectation Maximization (EM) to cluster news articles from the BBC dataset. The goal is to identify patterns in the news without using the category labels. Our EM approach follows the work of Professor Gholamreza Haffari of Monash University.

Expectation Maximization for Document Clustering

From RapidMiner: the EM (expectation maximization) technique is similar to the K-Means technique. The basic operation of K-Means clustering algorithms is relatively simple: given a fixed number of k clusters, assign observations to those clusters so that the means across clusters (for all variables) are as different from each other as possible. The EM algorithm extends this basic approach: instead of assigning examples to clusters to maximize the differences in means for continuous variables, it computes probabilities of cluster memberships based on one or more probability distributions. The goal of the clustering algorithm is then to maximize the overall probability, or likelihood, of the data given the (final) clusters.
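To make the idea of probabilistic cluster membership concrete, here is a minimal sketch of soft EM for a mixture of multinomials over word counts. The choice of a multinomial mixture is an assumption (it is a common model for document clustering, but this README does not state which distribution the notebook uses), and the function name `em_multinomial_mixture` and its parameters are hypothetical.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis (dims kept)."""
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))


def em_multinomial_mixture(counts, k, n_iter=50, seed=0, eps=1e-10):
    """Soft EM for a mixture of multinomials (illustrative sketch only).

    counts: (n_docs, n_words) matrix of term counts.
    Returns (resp, pi, mu):
      resp: (n_docs, k) membership probabilities from the last E-step
      pi:   (k,) cluster priors
      mu:   (k, n_words) per-cluster word distributions
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    pi = np.full(k, 1.0 / k)                       # uniform priors to start
    mu = rng.dirichlet(np.ones(n_words), size=k)   # random word distributions

    for _ in range(n_iter):
        # E-step: log p(cluster) + sum over words of count * log p(word | cluster)
        log_joint = np.log(pi + eps) + counts @ np.log(mu + eps).T
        resp = np.exp(log_joint - logsumexp(log_joint, axis=1))

        # M-step: re-estimate priors and word distributions from soft counts
        pi = resp.sum(axis=0) / n_docs
        word_mass = resp.T @ counts + eps          # eps avoids zero probabilities
        mu = word_mass / word_mass.sum(axis=1, keepdims=True)

    return resp, pi, mu
```

Each row of `resp` sums to 1, so a document can belong partly to several clusters. Taking `resp.argmax(axis=1)` gives a hard assignment comparable to the output of K-Means.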

Datasets

We use a public dataset from the BBC comprising 2225 articles, each labeled with one of 5 categories: business, entertainment, politics, sport or tech. The dataset is split into 1490 records for training and 735 for testing. It is available on Kaggle.
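Before clustering, the article text has to be turned into numbers. This README does not say how the notebook does that, so the sketch below assumes a plain bag-of-words representation. The helper `build_count_matrix`, its `min_df` parameter, and the example sentences are all hypothetical.

```python
import re
from collections import Counter

import numpy as np


def build_count_matrix(texts, min_df=1):
    """Build a (n_docs, n_words) term-count matrix from raw strings.

    min_df: keep only words that appear in at least this many documents.
    Returns (counts, vocab), where vocab[j] is the word for column j.
    """
    # Lowercase and keep runs of letters only (a deliberately simple tokenizer)
    tokenized = [re.findall(r"[a-z]+", t.lower()) for t in texts]

    # Document frequency: number of documents containing each word
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    vocab = sorted(w for w, df in doc_freq.items() if df >= min_df)
    index = {w: j for j, w in enumerate(vocab)}

    counts = np.zeros((len(texts), len(vocab)), dtype=np.int64)
    for i, toks in enumerate(tokenized):
        for w in toks:
            j = index.get(w)
            if j is not None:      # skip words filtered out by min_df
                counts[i, j] += 1
    return counts, vocab


# Made-up sentences, not taken from the BBC dataset
texts = ["Shares rise as markets rally", "Markets fall, shares fall"]
counts, vocab = build_count_matrix(texts)
```

The labels are left out of this step on purpose: they are only needed afterwards, to check how well the clusters match the categories.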

Visuals / Results

The clusters found by EM show patterns similar to those of the labelled data.
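One way to quantify how closely clusters follow the labelled categories is a contingency table together with cluster purity. The sketch below is an assumption about how such a comparison could be done, not the evaluation used in the notebook. The function names and the toy inputs are hypothetical and are not results from this project.

```python
import numpy as np


def contingency_table(cluster_ids, labels):
    """Count documents for every (cluster, category) pair.

    Returns (table, clusters, classes), where table[i, j] is the number of
    documents in clusters[i] whose true category is classes[j].
    """
    clusters, c_idx = np.unique(cluster_ids, return_inverse=True)
    classes, l_idx = np.unique(labels, return_inverse=True)
    table = np.zeros((len(clusters), len(classes)), dtype=np.int64)
    np.add.at(table, (c_idx, l_idx), 1)   # accumulates repeated index pairs
    return table, clusters, classes


def purity(table):
    """Fraction of documents that fall in their cluster's majority category."""
    return table.max(axis=1).sum() / table.sum()


# Toy example with made-up assignments
cluster_ids = [0, 0, 1, 1, 1]
labels = ["sport", "sport", "tech", "tech", "sport"]
table, clusters, classes = contingency_table(cluster_ids, labels)
score = purity(table)
```

A purity of 1.0 means every cluster contains a single category. Purity rises as the number of clusters grows, so it is most informative when the number of clusters matches the number of categories.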
