How to Utilize Machine Learning to Automatically Detect Patterns in Text

In the last post, we covered topic modeling, a way to identify several topics in a corpus of documents, using Latent Dirichlet Allocation (LDA). In this article, we perform a similar task using clustering, an unsupervised machine learning method. The method differs, but the outcome is similar: several groups (or topics) of words related to each other.

For this example, we use the Wine Spectator reviews dataset from Kaggle[^KAGGLE]. It contains a little over 100,000 wine reviews covering varietals from around the world. The text variable we cluster is each wine's description, written as tasting notes; we then interpret the resulting clusters.
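To make the idea of clustering tasting notes concrete, here is a minimal from-scratch sketch, assuming TF-IDF weighting and plain k-means. The four notes, the stopword list, and the helper names (`tfidf_vectors`, `kmeans`, `top_terms`) are all invented for illustration and are not taken from the dataset or the notebook. In practice a library such as scikit-learn would typically handle both steps, so treat this only as a way to see the mechanics.

```python
import math
from collections import Counter

# Hypothetical tasting notes, invented for illustration (not from the dataset).
notes = [
    "ripe cherry and plum with soft tannins",
    "dark cherry plum and firm tannins",
    "crisp lemon and green apple with bright acidity",
    "zesty lemon apple and lively acidity",
]
STOPWORDS = {"and", "with"}  # tiny illustrative stopword list


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for the documents."""
    tokenized = [[w for w in d.lower().split() if w not in STOPWORDS] for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    # Smoothed inverse document frequency: rarer words get larger weights.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(vocab, centroid, n=3):
    """Return the n vocabulary words with the largest centroid weights."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [w for _, w in ranked[:n]]


vocab, vectors = tfidf_vectors(notes)
labels, centroids = kmeans(vectors, k=2)
for j, c in enumerate(centroids):
    print(f"cluster {j}: {top_terms(vocab, c)}")
```

The highest-weighted terms in each centroid act as that cluster's "topic" words, which is how the clusters can be interpreted. The farthest-point seeding is a choice made here to keep the toy example deterministic; random or k-means++ initialization is more common on real data.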

Read more here: https://www.dataknowsall.com/textclustering.html

