Text Summarizer for Hindi Wikipedia Articles

This is a project made by:

Prajneya Kumar
Shivansh S.
Tejasvi Chebrolu

How to Use

Clone the repository
Install all dependencies mentioned in requirements.txt
Choose the method you would like to use and go to the corresponding section below

Method I

This model generates a summary using a Document Term Matrix and frequency counts. To use it:

Go to the method_1 folder
Place your article in the valid folder, named article.txt
Run the extractive.py file using python3
The summary is written to summary.txt inside the valid folder
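
The idea behind Method I can be illustrated with a small sketch. This is not the code in extractive.py; it is a minimal example that assumes sentences end with the danda (।) or common punctuation, builds a sentence-by-word count matrix, and scores each sentence by the average article-wide frequency of its words. The function names and the 0.3 ratio default are assumptions made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Assumption: a sentence ends with the danda "।", "?", "!" or "."
    # followed by whitespace.
    return [s for s in re.split(r"(?<=[।?!.])\s+", text.strip()) if s]


def tokenize(sentence):
    # Split on whitespace and punctuation instead of using \w, because \w
    # does not match Devanagari vowel signs and would break Hindi words apart.
    return re.findall(r"[^\s।?!.,;:]+", sentence)


def summarize_by_frequency(text, ratio=0.3):
    sentences = split_sentences(text)
    if not sentences:
        return []
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted(set().union(*counts))
    # Document-term matrix: one row per sentence, one column per word.
    dtm = [[c[w] for w in vocab] for c in counts]
    # Column sums give the frequency of each word in the whole article.
    word_freq = [sum(col) for col in zip(*dtm)]
    # Score = average article-wide frequency of the words in the sentence.
    scores = [
        sum(cell * f for cell, f in zip(row, word_freq)) / max(1, sum(row))
        for row in dtm
    ]
    k = max(1, round(ratio * len(sentences)))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(best)]
```

Calling summarize_by_frequency on the contents of article.txt returns a list of sentences that could then be written to summary.txt. A real pipeline would likely also remove stop words, which this sketch does not do.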

Method II

This model generates a summary using a modified TF-IDF computed over the document dataset, with weights attached. To use it:

Go to the method_2 folder
Place your article in the valid folder
Run the code in Jupyter Notebook
Enter the name of your file, which must be inside that directory
The summary and a word cloud are written to the output folder :)
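
For readers unfamiliar with TF-IDF sentence scoring, here is a minimal sketch. It is not the notebook's code: the README does not describe how the TF-IDF is modified or which weights are attached, so this example uses plain smoothed TF-IDF, with IDF computed over a small collection of documents and TF computed over the article. All function names and the 0.3 ratio default are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split on whitespace and punctuation (including the danda "।").
    return re.findall(r"[^\s।?!.,;:]+", text)


def build_idf(corpus):
    # corpus: list of document strings used as the background dataset.
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    # Smoothed IDF; words never seen in the corpus get the highest value.
    idf = {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}
    default = math.log(1 + n_docs) + 1.0
    return idf, default


def summarize_tfidf(article, corpus, ratio=0.3):
    sentences = [s for s in re.split(r"(?<=[।?!.])\s+", article.strip()) if s]
    if not sentences:
        return []
    idf, default = build_idf(corpus)
    tf = Counter(tokenize(article))
    total = sum(tf.values()) or 1

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average TF-IDF of the words in the sentence.
        return sum((tf[t] / total) * idf.get(t, default) for t in tokens) / len(tokens)

    k = max(1, round(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the top-k sentences, restored to their original order.
    return [sentences[i] for i in sorted(ranked[:k])]
```

Averaging over the sentence length keeps long sentences from winning purely because they contain more words; any extra weighting (for example by sentence position) would be added inside score.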

Calculating Accuracy

Add the gold-standard summary as n.txt in the Gold folder in the Summaries directory, where n is the next number in the sequence in the Gold folder.
For example, if there are 7 files in the Gold folder, they must be labelled 1.txt, 2.txt, ..., 7.txt.
Repeat this process for the summaries generated by the rule-based method and the extractive method, storing them in the RuleBased and Extractive directories respectively.
You can do this from the terminal with simple output redirection.
Now, on line 15 of the accuracy.py file, change the code to for i in range(1, n+1): where n is the same number as above.
For example, if your file was saved as 9.txt, you would change the code to for i in range(1, 10):
Run the code as python accuracy.py
If you want individual accuracies for each article, you can uncomment line 62 in the Rouge_1.py file.
In that case, it is advisable to redirect the output to a new file, as in python accuracy.py > output.txt, so that it is easier to read.
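
As an alternative to editing the loop bound by hand, the range could be derived from the file names in the Gold folder. The helper below is hypothetical and not part of accuracy.py; it assumes the files follow the 1.txt, 2.txt, ... naming described above and raises an error if the sequence has a gap.

```python
def gold_range(file_names):
    # Hypothetical helper: collect the numeric stems of files named like "7.txt".
    numbers = sorted(
        int(name[:-4])
        for name in file_names
        if name.endswith(".txt") and name[:-4].isdecimal()
    )
    n = len(numbers)
    # The steps above require an unbroken sequence 1.txt, 2.txt, ..., n.txt.
    if numbers != list(range(1, n + 1)):
        raise ValueError("Gold files must be numbered 1.txt to %d.txt without gaps" % n)
    return range(1, n + 1)


# Possible usage, assuming the script runs from the repository root:
#   import os
#   for i in gold_range(os.listdir("Summaries/Gold")):
#       ...
```

With nine files in the folder, the helper returns range(1, 10), which matches the manual change described above.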

Initial Results

For Method I we got an accuracy of 74.1%.
For Method II we got an accuracy of 83.4%.

Methods of Evaluation

The evaluation was based on the ROUGE method proposed by Chin-Yew Lin. For this project, since the summarization is extractive, only ROUGE-1 was used. The gold-standard summaries were annotated manually: for any given article, the annotators were asked to pick the most important sentences. The only rule was that the number of sentences they chose had to equal 0.3N, where N is the number of sentences in the original article.
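
To make the metric concrete, here is a minimal sketch of ROUGE-1 as unigram overlap between a generated summary and a gold summary. It is not the code in Rouge_1.py, and the README does not say whether the reported accuracy is recall, precision, or F1, so all three are returned. The sentence_budget helper and its rounding rule are also assumptions, since the text only states that annotators chose 0.3N sentences.

```python
from collections import Counter


def rouge_1(candidate_tokens, reference_tokens):
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def sentence_budget(n_sentences, ratio=0.3):
    # Assumption: 0.3N is rounded to the nearest integer, with at least 1.
    return max(1, round(ratio * n_sentences))
```

For example, rouge_1("the cat sat".split(), "the cat ran fast".split()) shares two unigrams with the reference, giving a recall of 2/4 and a precision of 2/3.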

Human Evaluators

We thank the following for creating the gold standard summaries:

Abhinav Menon
Trisha Kaore
Yash Agrawal
Eshika Khandelwal
Vidushi Bhartari
Shashwat Singh
Shubhankar Kamthankar

How to Contribute

Fork this repository
Clone the forked repository to your local system
Add the upstream remote: git remote add upstream https://github.com/AurumnPegasus/Text-Summariser.git
Install all required dependencies (mentioned in requirements.txt)
Commit your changes and send PRs :)
