Propaganda in Hindi

Repository of the data and models generated by Mr. Shyam Ratan as part of his MPhil dissertation titled 'Automatic Detection Of Propaganda In Hindi On Social Media', in collaboration with UnReaL-TecE LLP. Information about the dataset, models, and results is given below:

Dataset

v0.1

This data is used for automatic detection of propaganda in Hindi and for two supporting case studies in the MPhil dissertation. This version has two phases, and each phase has two divisions: Annotated data and Raw data. Phase - 1 contains the data used for the pilot of the automatic detection work and its results. Phase - 2 data is used in two important case studies of this research. In the final stage of the research, however, the whole data of Phase - 1 and Phase - 2 is used to train and test language models for automatic detection of propaganda in Hindi.

Data Structure

Navigation: Dataset -> v0.1 -> Phase - 1 -> {1. Annotated and 2. Raw} (500 articles/documents); Phase - 2 -> {1. Annotated and 2. Raw} (399 articles/documents).
In this version the data is distributed across the two phases mentioned earlier. Phase - 1 has annotated data from 8 Hindi newspapers, viz. Aap Ki Kranti, Amar Ujala, Dainik Bhaskar, Dainik Jagran, Hindustan, Media Vigil, Saamana, tfipost, and 2 periodicals, viz. Kamal Sandesh and Panchjanya. For balance, each source has 50 annotated news articles/documents, giving 10 x 50 = 500 in total. Each directory has the same number of ann and txt files: the ann files hold the propaganda-labeled spans and sentences, while the txt files hold the text data. This phase also has the same amount of raw news articles/documents in the Raw directory. Phase - 2 has annotated data from 18 newspapers, viz. Aap Ki Kranti, Agnibaan, Amar Ujala, Dainik Bihar, Dainik Bhaskar, Dainik Jagran, Haribhoomi, Hindustan, Jansandesh Times, Janwarta, Media Vigil, Naye Samikaran, Newslaundry, Saamana, Swarajya, Swatantra Bharat, tfipost, Virarjun, and 2 periodicals, viz. Kamal Sandesh and Panchjanya. Here each source has 20 annotated news articles/documents, except Panchjanya, which has 19, giving 19 x 20 + 19 = 399 in total.
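The layout described above can be checked with a short script. This is a minimal sketch, not part of the repository: the function names and the path `Dataset/v0.1/Phase - 1/Annotated` are assumptions inferred from the navigation line, and the actual directory names on disk may differ. It only pairs each ann file with the txt file of the same name and counts the pairs per source directory; it does not parse the annotation contents.

```python
from collections import defaultdict
from pathlib import Path


def pair_annotation_files(annotated_root):
    """Map each source directory name to its (ann, txt) path pairs.

    An ann file is paired only when a txt file with the same stem
    exists next to it; unpaired ann files are skipped.
    """
    pairs = defaultdict(list)
    for ann_path in sorted(Path(annotated_root).rglob("*.ann")):
        txt_path = ann_path.with_suffix(".txt")
        if txt_path.exists():
            pairs[ann_path.parent.name].append((ann_path, txt_path))
    return dict(pairs)


def count_documents(annotated_root):
    """Return the number of paired documents per source directory."""
    paired = pair_annotation_files(annotated_root)
    return {source: len(items) for source, items in paired.items()}


if __name__ == "__main__":
    # Hypothetical path, assumed from the navigation line above.
    root = Path("Dataset/v0.1/Phase - 1/Annotated")
    if root.is_dir():
        counts = count_documents(root)
        for source, n in sorted(counts.items()):
            print(f"{source}: {n}")
        print("total:", sum(counts.values()))
    else:
        print(f"Directory not found: {root}")
```

If the layout matches the description, Phase - 1 should report 50 documents for each of its 10 sources, and Phase - 2 should report 20 per source except Panchjanya with 19.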

v0.2

This data is annotated but was not used in the MPhil work, in order to maintain the balance of the data used in automatic detection and the case studies.

v0.3

This data is still in raw form and was collected from social media. It is available to anyone interested in using it for further study in this direction.

kmi-linguistics/propaganda
