Summarizing-large-text-collection-using-topic-modeling-and-clustering-based-on-MapReduce-framework1

Big Data

Paper

PPT

Python code

Author Idea

Document summarization provides a means of understanding a collection of text documents more quickly and has a number of real-life applications. Semantic similarity and clustering can be used efficiently to generate effective summaries of large text collections. Summarizing a large volume of text is a challenging and time-consuming problem, particularly when semantic similarity computation is part of the summarization process. Summarizing a text collection involves intensive text processing and computation to generate the summary. MapReduce is a proven, state-of-the-art technology for handling Big Data. In this paper, a novel MapReduce-based framework is proposed for summarizing large text collections. The proposed technique combines semantic-similarity-based clustering with topic modeling using Latent Dirichlet Allocation (LDA) and performs the summarization of large text collections over the MapReduce framework. The summarization task is performed in four stages, which provides a modular implementation of multi-document summarization. The technique is evaluated in terms of scalability, and several text summarization measures, namely compression ratio, retention ratio, ROUGE, and Pyramid score, are also measured. The experiments clearly show the advantages of the MapReduce framework and demonstrate that MapReduce provides a faster implementation for summarizing large text collections and is a powerful tool for Big Text Data analysis.
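
The abstract lists compression ratio and ROUGE among the evaluation measures. The sketch below is a minimal illustration under stated assumptions, not code from this repository: the helper names (`tokenize`, `compression_ratio`, `ngrams`, `rouge_n_recall`) are hypothetical, tokenization is a simple lowercase whitespace split, compression ratio is taken as summary length over original length in tokens (some papers report the inverse), and ROUGE-N is computed as recall with clipped n-gram counts. Retention ratio and Pyramid score are left out, since the former depends on how the paper measures retained information and the latter on human-annotated content units.

```python
# Hypothetical helpers for two of the evaluation measures named in the abstract.
# Not taken from the repository; tokenization is a deliberate simplification.
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()[]")
        if word:
            tokens.append(word)
    return tokens


def compression_ratio(summary: str, original: str) -> float:
    """Summary length divided by original length, both counted in tokens.

    Assumption: some papers report the inverse (original / summary).
    """
    original_len = len(tokenize(original))
    if original_len == 0:
        raise ValueError("original text is empty")
    return len(tokenize(summary)) / original_len


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of contiguous n-grams; empty when len(tokens) < n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: matched reference n-grams / total reference n-grams.

    Each reference n-gram is matched at most as many times as it occurs
    in the candidate (clipped counts).
    """
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat is on the mat"
    print(round(rouge_n_recall(candidate, reference, n=1), 3))  # 0.833
    print(round(rouge_n_recall(candidate, reference, n=2), 3))  # 0.6
    print(compression_ratio("one two three", "one two three four five six"))  # 0.5
```

Official ROUGE toolkits add options such as stemming, stopword removal, and multiple reference summaries, so scores from this sketch will not necessarily match theirs.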

bimalgupta150/Summarizing-large-text-collection-using-topic-modeling-and-clustering-based-on-MapReduce-framework1
