spark-streaming-topic-model

Latent Dirichlet Allocation (LDA) based topic modeling of a news corpus in Apache Spark Streaming

System Requirements

Java 1.8.x
Scala 2.10.x
Spark 1.6+
Any OS
6GB+ RAM

How it works

Dataset Download

AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004.

The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity.

The code automatically downloads the Yahoo news corpus (Ref). If the automatic download fails, you can download the news corpus file (118 MB) directly from here
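The download-if-missing behavior described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the object name `CorpusDownload`, the function `ensureCorpus`, and both of its parameters are hypothetical, and the corpus URL is deliberately left as an argument because it is not stated here.

```scala
import java.io.InputStream
import java.net.URL
import java.nio.file.{Files, Path, Paths, StandardCopyOption}

// Hypothetical helper (not from this repository): fetch the corpus once, reuse it afterwards.
object CorpusDownload {

  // Returns `dest`, downloading from `url` only when `dest` is missing or empty.
  def ensureCorpus(url: String, dest: Path): Path = {
    if (Files.exists(dest) && Files.size(dest) > 0) {
      dest // a non-empty local copy already exists, so skip the download
    } else {
      Option(dest.toAbsolutePath.getParent).foreach(p => Files.createDirectories(p))
      // Write to a temporary sibling first so an interrupted download
      // never leaves a truncated file at `dest`.
      val tmp = dest.resolveSibling(dest.getFileName.toString + ".part")
      val in: InputStream = new URL(url).openStream()
      try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
      finally in.close()
      Files.move(tmp, dest, StandardCopyOption.REPLACE_EXISTING)
      dest
    }
  }

  // Usage: pass the corpus URL and the target file on the command line.
  def main(args: Array[String]): Unit = {
    val path = ensureCorpus(args(0), Paths.get(args(1)))
    println(s"Corpus available at $path")
  }
}
```

If the automatic step fails, placing the manually downloaded file at the destination path makes the helper skip the network call on the next run.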

Core LDA model

Online LDA with minibatch processing in Apache Spark
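As a rough illustration of online LDA with minibatches, the sketch below uses Spark MLlib's `LDA` together with `OnlineLDAOptimizer`, which samples a fraction of the corpus on each iteration. It is an assumption that this project relies on the MLlib optimizer; the object name `OnlineLdaSketch`, the helpers `vectorize` and `train`, the toy documents, and the parameter values are all hypothetical and chosen only to keep the example small.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{LDA, LDAModel, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical sketch (not this repository's code): online LDA on a toy corpus.
object OnlineLdaSketch {

  // Turn tokenized documents into sparse term-count vectors over a fixed vocabulary.
  // Tokens that are not in the vocabulary are dropped.
  def vectorize(docs: RDD[(Long, Seq[String])], vocab: Map[String, Int]): RDD[(Long, Vector)] =
    docs.map { case (id, tokens) =>
      val counts = tokens.flatMap(t => vocab.get(t))
        .groupBy(identity)
        .map { case (termIndex, hits) => (termIndex, hits.size.toDouble) }
        .toSeq
      (id, Vectors.sparse(vocab.size, counts))
    }

  // Train LDA with the online variational optimizer; each iteration
  // processes a random minibatch of roughly `miniBatchFraction` of the corpus.
  def train(corpus: RDD[(Long, Vector)], k: Int, iterations: Int, miniBatchFraction: Double): LDAModel =
    new LDA()
      .setK(k)
      .setMaxIterations(iterations)
      .setOptimizer(new OnlineLDAOptimizer().setMiniBatchFraction(miniBatchFraction))
      .run(corpus)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("online-lda-sketch"))
    try {
      // Toy data standing in for the news corpus.
      val vocab = Map("market" -> 0, "stocks" -> 1, "match" -> 2, "goal" -> 3)
      val docs = sc.parallelize(Seq(
        (0L, Seq("market", "stocks", "market")),
        (1L, Seq("match", "goal", "goal")),
        (2L, Seq("stocks", "market")),
        (3L, Seq("goal", "match"))
      ))
      val model = train(vectorize(docs, vocab), k = 2, iterations = 20, miniBatchFraction = 0.5)

      // Print the top terms of each topic by vocabulary index.
      val terms = vocab.map(_.swap)
      model.describeTopics(maxTermsPerTopic = 2).zipWithIndex.foreach { case ((ids, weights), topic) =>
        val top = ids.zip(weights).map { case (i, w) => f"${terms(i)} ($w%.3f)" }.mkString(", ")
        println(s"Topic $topic: $top")
      }
    } finally {
      sc.stop()
    }
  }
}
```

The MLlib documentation advises choosing the minibatch fraction and the iteration count together so that `maxIterations * miniBatchFraction >= 1`; otherwise some documents may never be sampled.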
