This project focuses on classifying news articles into various predefined categories using Natural Language Processing (NLP) and Machine Learning (ML) techniques. The main objective is to create a model that accurately categorizes articles into topics like Politics, Economics, Sports, Technology, Social, Cultural, and Miscellaneous.
- Overview
- Dataset
- Model Architecture
- Training Process
- Evaluation
- Results
- Future Improvements
- User Interface
This project leverages NLP techniques to preprocess and classify news articles. The model is designed to efficiently handle large volumes of textual data, assigning a single category to each article. The key steps are text preprocessing, feature extraction, and model training using various ML algorithms.
The dataset includes thousands of news articles labeled under seven different categories:
Politics, Economics, Sports, Technology, Social, Cultural, and Miscellaneous.
These categories were selected to cover a wide range of topics of general interest.
- Text Preprocessing: Tokenization, stopword removal, and punctuation cleaning were performed using Python's NLP libraries.
- Feature Extraction: The TF-IDF method was used to convert text into numerical features. We considered different n-gram ranges for feature extraction and experimented with the number of features to optimize model performance.
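The preprocessing and feature-extraction steps above can be sketched with scikit-learn's `TfidfVectorizer`, which handles lowercasing, tokenization, and stopword removal in one pass. This is a minimal illustration, not the project's actual code: the toy corpus is hypothetical, and the project may use a different NLP library (e.g. NLTK or spaCy) for preprocessing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the real news articles (hypothetical examples).
docs = [
    "The government passed a new budget bill today.",
    "The striker scored twice in the championship final.",
    "A new smartphone chip promises faster AI inference.",
]

# TF-IDF with unigrams+bigrams and a capped vocabulary, mirroring the
# n-gram-range and feature-count experiments described above. Lowercasing,
# tokenization, and stopword removal are handled by the vectorizer itself.
vectorizer = TfidfVectorizer(
    lowercase=True,
    stop_words="english",   # built-in stop list; the project may use another
    ngram_range=(1, 2),     # unigrams and bigrams
    max_features=5000,      # cap on vocabulary size
)
X = vectorizer.fit_transform(docs)
# X is a sparse matrix with one TF-IDF row per article.
```

Tuning `ngram_range` and `max_features` trades vocabulary richness against dimensionality, which is exactly the experimentation described above.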
The classification model is built using a neural network with the following architecture:
- Input Layer: Accepts the TF-IDF feature vectors.
- Dense Layer 1: Contains 32 units with ReLU activation.
- Dense Layer 2: Contains 32 units with ReLU activation.
- Output Layer: For single-label classification, the output layer has 7 units (one for each category) with a softmax activation function.
- Loss Function: Categorical Crossentropy.
- Optimizer: Adam optimizer was used to minimize the loss function.
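For reference, the architecture above can be mirrored with scikit-learn's `MLPClassifier` (the original project may instead use Keras/TensorFlow): two 32-unit ReLU hidden layers, with softmax output and cross-entropy loss applied automatically for multiclass targets, optimized with Adam.

```python
from sklearn.neural_network import MLPClassifier

# Sketch of the network described above, assuming a scikit-learn stand-in:
# - two hidden Dense layers of 32 units with ReLU activation,
# - softmax output + categorical cross-entropy (automatic for multiclass),
# - Adam optimizer.
model = MLPClassifier(
    hidden_layer_sizes=(32, 32),  # Dense Layer 1 and Dense Layer 2
    activation="relu",
    solver="adam",
    random_state=0,
)
```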
The model was trained on a split dataset:
- Training Data: 80% of the dataset.
- Validation Data: 20% of the dataset.
- Batch Size: 32
- Epochs: 10
The training process involved backpropagation to minimize loss and improve accuracy.
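The split and training setup above can be sketched as follows. The feature matrix here is random synthetic data standing in for the real TF-IDF vectors, and `max_iter` plays the role of the epoch limit in scikit-learn's API; this is an illustrative sketch, not the project's training script.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the TF-IDF matrix: 100 "articles" x 50 features,
# with labels drawn from the 7 categories (the real dataset is far larger).
rng = np.random.default_rng(0)
X = rng.random((100, 50))
y = rng.integers(0, 7, size=100)

# 80/20 train/validation split, as described above.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# batch_size=32 and max_iter=10 mirror the batch size and epoch count above.
model = MLPClassifier(
    hidden_layer_sizes=(32, 32),
    batch_size=32,
    max_iter=10,
    random_state=0,
)
model.fit(X_train, y_train)  # backpropagation, driven by the Adam optimizer
```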
Evaluation metrics such as accuracy, precision, recall, and F1-score were used to assess the performance of the model. In addition, a confusion matrix was used to analyze misclassifications.
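These metrics can be computed with `sklearn.metrics`. The labels below are hypothetical predictions over the seven categories (encoded 0 through 6), used only to show how the numbers are derived:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Hypothetical true/predicted labels standing in for real model output.
y_true = np.array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2])
y_pred = np.array([0, 1, 2, 3, 4, 5, 6, 1, 1, 0])

accuracy = accuracy_score(y_true, y_pred)  # 8 of 10 correct -> 0.8
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
# Rows are true labels, columns are predicted labels; off-diagonal cells
# are the misclassifications the analysis above looks for.
cm = confusion_matrix(y_true, y_pred)
```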
- Training Accuracy: 98%
- Test Accuracy: 83%
- Training Loss: 0.1
- Test Loss: 0.8

The gap between training and test performance suggests some overfitting, which the improvements below aim to address.
- Hyperparameter Tuning: Experiment with different learning rates, batch sizes, and optimizer algorithms.
- Data Augmentation: Increase dataset size by scraping more news articles to improve generalization.
- Advanced NLP Techniques: Implement models like BERT or GPT for improved classification accuracy.
A web-based user interface was created using React on the front end and Django (FastAPI) on the back end. This interface allows users to:
- Upload new articles for classification.
- View classification results instantly on the dashboard.
- Analyze model performance through real-time visualizations and feedback.
This project is licensed under the MIT License, and I’d be thrilled if you use and improve my work!