Classification Of Hindi News (COHN): this application applies transfer learning to the monsoon-nlp/hindi-bert model to classify Hindi news snippets into their respective predefined categories.
- About
- Folder Structure
- Installation
- How to run the application?
- What's what?
- Demonstration
- References
In today's digital era, a large amount of data is available, and news is one of its most versatile forms. Technological advancements let news spread widely, and it influences people to a great extent. Classifying news is an important step in processing it: it helps distinguish and organize articles by category, and it supports filtering by preference or relevance. This project proposes a Hindi news classification model that applies transfer learning to the monsoon-nlp/hindi-bert model to classify Hindi news snippets into their respective predefined categories. Text classification is challenging in Hindi because of its large set of conjuncts and letter combinations, its sentence structure, and its words with multiple senses. We trained the model on the BBC Hindi News Dataset and achieved an accuracy of 63.47%.
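The transfer-learning setup described above could be sketched as follows. This is a minimal illustration, not the project's actual training code: it assumes the Hugging Face transformers library is used to load the checkpoint, and the category names in CATEGORIES are hypothetical placeholders, since the dataset's real labels are not listed here.

```python
from typing import Dict, List, Tuple

# Hypothetical category names; replace them with the labels found in the dataset.
CATEGORIES: List[str] = ["india", "international", "sport"]


def build_label_maps(categories: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map category names to integer ids and back, as a classification head expects."""
    label2id = {name: idx for idx, name in enumerate(categories)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def load_classifier(categories: List[str], model_name: str = "monsoon-nlp/hindi-bert"):
    """Load the pretrained encoder with a fresh classification head on top.

    Assumption: transformers (and a backend such as PyTorch) is installed and
    the checkpoint can be downloaded. The import is done lazily so that the
    helper above stays usable without those dependencies.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    label2id, id2label = build_label_maps(categories)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # The pretrained weights are reused; only the new head starts untrained.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(categories),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling load_classifier(CATEGORIES) would return a tokenizer and a model ready for fine-tuning on labelled snippets; the head's outputs are meaningless until that fine-tuning has been done.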
classification-of-hindi-news
├───dataset
├───flask-app
│ ├───static
│ │ ├───css
│ │ ├───img
│ │ ├───js
│ │ └───vendor
│ └───templates
├───hindibert
├───model
│ └───hindi_bert_model
├───notebooks
└───tests
Clone the repository. Before installing the requirements, create a Python (virtualenv) or conda environment. An environment keeps the dependencies of different projects separate by isolating each project's packages from the others.
To use virtualenv, open your terminal and install it with pip:
pip install virtualenv
After virtualenv has been installed, cd from the terminal to the folder where you saved this application and run the following commands to create a virtual environment:
cd path_to_folder
virtualenv -p python3.7.10 env_name
Activate your environment (the command below is for Windows; on Linux or macOS, run source env_name/bin/activate instead):
env_name\Scripts\activate
Alternatively, to use conda, open your Anaconda prompt (Miniconda also works) and create a conda environment using the following command:
conda create -n env_name python=3.7.10 anaconda
After successfully creating your environment, activate it by running:
conda activate env_name
Once you have created an environment using either of the above methods, install the application's requirements:
pip install -r requirements.txt
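Before running the application, it can help to confirm that the interpreter really is running inside the environment you created. The snippet below is an optional sketch, not part of the repository; the function name active_environment is made up for illustration.

```python
import os
import sys


def active_environment() -> str:
    """Return 'virtualenv', 'conda', or 'none' for the running interpreter."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    if hasattr(sys, "real_prefix") or sys.prefix != base_prefix:
        return "virtualenv"
    # conda exports CONDA_DEFAULT_ENV when an environment (including base) is active.
    if os.environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    return "none"


# Show the detected environment type and the interpreter version in use.
print(active_environment(), sys.version_info[:3])
```

If this reports none, the environment was probably not activated in the current terminal session.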
Open your terminal, activate your Python or conda environment, change directory to flask-app, and run the app.py file using the following command:
python app.py
or
flask run
- This application uses Flask, HTML, CSS, JavaScript, jQuery, and Ajax.
- The app.py file contains the Flask application.
- The Flask application uses various HTML templates, which are stored in the templates folder.
- The CSS and JavaScript files used by the HTML templates are stored in the static folder.
- The main page, i.e. the index.html file, contains the basic details of this application: how it works, about the system, the system itself, about the team, etc.
- The hindibert folder contains the Python source code required to run the application, and the model folder contains our Hindi BERT model, which is trained on our dataset.
- The notebooks folder contains various notebooks for data exploration and for training and testing the Hindi BERT model.
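To make the pieces above concrete, here is a minimal sketch of how a Flask app could serve index.html and answer an Ajax request with a predicted category. It is not the repository's app.py: the /predict route, the JSON field names, and the classify callable (standing in for the code in hindibert) are all assumptions made for illustration.

```python
from typing import Callable, Dict


def make_response_payload(text: str, classify: Callable[[str], str]) -> Dict[str, str]:
    """Validate the submitted snippet and build the JSON payload for the browser."""
    snippet = text.strip()
    if not snippet:
        return {"error": "empty input"}
    return {"text": snippet, "category": classify(snippet)}


def create_app(classify: Callable[[str], str]):
    """Build a Flask app around a classifier callable (hypothetical wiring)."""
    # Imported lazily so the helper above can be used without Flask installed.
    from flask import Flask, jsonify, render_template, request

    app = Flask(__name__)

    @app.route("/")
    def index():
        # Rendered from the templates folder.
        return render_template("index.html")

    @app.route("/predict", methods=["POST"])
    def predict():
        # The page's Ajax call is assumed to post JSON such as {"text": "..."}.
        data = request.get_json(silent=True)
        if not isinstance(data, dict):
            data = {}
        payload = make_response_payload(str(data.get("text", "")), classify)
        status = 400 if "error" in payload else 200
        return jsonify(payload), status

    return app
```

Passing the classifier in as an argument keeps the web layer independent of the model, so the routes can be exercised with a stub such as lambda text: "sport" before the trained model is available.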
Click on the GIF to watch the demonstration video.
[1] Hindi BERT model
[3] Deep Learning for Hindi Text Classification: A Comparison