GitHub - CS-Ponkoj/Fake-News-Detection-NLP: 20800 train and 5200 test news dataset used to classify the fake and real news using Count Vectorizer and TF-IDF. Seven ML algorithms are applied to find the best model for the dataset.

dataset
Fake News Detection Using CountVectorizer.ipynb
Fake News Detection Using TF-IDF.ipynb
Fake News Detection- LSTM.ipynb
README.txt


For CountVectorizer:
1.Take the 20800-record dataset from Kaggle.
2.Preprocessing: remove special characters (using regular expressions), remove stop words, convert everything to lower case.
3.Use the bag-of-words method to build a feature matrix with at most 5000 features and n-grams of up to 3 consecutive words.
4.Train/test split with 33% test size.
5.Train seven different ML algorithms on the processed dataset.
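
Steps 2 and 3 above can be sketched in plain Python. This is only an illustration of the idea, not the notebook's code: the stop-word list, the sample headlines and all function names are hypothetical, and reading "3 consecutive words range" as n-grams of 1 to 3 words is an assumption. In scikit-learn the same idea is exposed through the CountVectorizer parameters max_features and ngram_range.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the notebook's list is not shown).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}

def preprocess(text):
    """Replace non-letters with spaces, lower-case, and drop stop words."""
    text = re.sub(r"[^a-zA-Z]", " ", text)
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def ngrams(tokens, max_n=3):
    """All n-grams of 1..max_n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def build_vocabulary(docs, max_features=5000, max_n=3):
    """Map the max_features most frequent n-grams to column indices."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(preprocess(doc), max_n))
    kept = [term for term, _ in counts.most_common(max_features)]
    return {term: idx for idx, term in enumerate(sorted(kept))}

def vectorize(doc, vocabulary, max_n=3):
    """Count vector (one row of the feature matrix) for a single document."""
    row = [0] * len(vocabulary)
    for term in ngrams(preprocess(doc), max_n):
        if term in vocabulary:
            row[vocabulary[term]] += 1
    return row

# Hypothetical sample headlines, not taken from the dataset.
docs = ["The President signs a NEW bill!", "New bill is FAKE, sources say..."]
vocab = build_vocabulary(docs)
matrix = [vectorize(d, vocab) for d in docs]
```

Each row of matrix is one document and each column one n-gram; capping the vocabulary keeps the matrix width fixed no matter how large the corpus grows.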



For TF-IDF:
1.Take the train (20800 records) and test (5200 records) data from Kaggle.
2.Preprocessing: create a new column by combining the news title, the full news text and the news author.
3.Use a TF-IDF transformer to transform the train and test data into a feature matrix.
4.Use the default train/test split.
5.Train six different ML algorithms on the processed dataset.
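
Steps 2 and 3 above can be sketched in plain Python as follows. This is a sketch under assumptions, not the notebook's code: the record keys "author", "title" and "text", the whitespace tokenizer and the sample records are hypothetical. The weighting uses the smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which is what scikit-learn documents as the default for its TfidfTransformer; whether the notebook keeps those defaults is not stated.

```python
import math
from collections import Counter

def combine_fields(record):
    """Join author, title and body into one string, skipping missing values."""
    parts = (record.get("author"), record.get("title"), record.get("text"))
    return " ".join(p for p in parts if p)

def tokenize(text):
    """Very simple tokenizer: lower-case and split on whitespace."""
    return text.lower().split()

def fit_idf(tokenized_docs):
    """Smoothed inverse document frequency, learned from the training set only."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    weights = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}

# Hypothetical records, not taken from the dataset.
train = [
    {"author": "Jane Doe", "title": "Budget vote", "text": "Senate passes budget"},
    {"author": None, "title": "Aliens land", "text": "budget ignored"},
]
test = [{"author": "Sam", "title": "Budget news", "text": "unseen words"}]

train_tokens = [tokenize(combine_fields(r)) for r in train]
idf = fit_idf(train_tokens)
train_vectors = [tfidf_vector(t, idf) for t in train_tokens]
test_vectors = [tfidf_vector(tokenize(combine_fields(r)), idf) for r in test]
```

The IDF weights are fitted on the training records and reused for the test records, so the test data does not influence the features.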


LSTM:
A sequential deep learning model using an LSTM architecture has been implemented for binary text classification: predicting whether a news item is fake or real. The dataset was collected from Kaggle and contains 20800 records. Pretrained GloVe embeddings are used as the text vectorization technique. Several classical models have also been implemented with BOW and TF-IDF text vectorization for comparison. The LSTM-based model classified the news better than these, reaching around 99% accuracy.
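
The notebook's layer configuration is not described here, so the following plain-Python sketch only illustrates the input preparation that an embedding layer in front of an LSTM typically needs: integer-encoding tokens, padding to a fixed length, and filling an embedding matrix from GloVe-format lines (one word followed by its vector components, separated by spaces). The 3-dimensional vectors, the sample documents and every function name are hypothetical.

```python
# Toy vectors in GloVe text format; the values are made up, not real GloVe weights.
GLOVE_SAMPLE = """\
news 0.1 0.2 0.3
fake 0.4 0.5 0.6
real 0.7 0.8 0.9
"""

def load_glove(lines):
    """Parse 'word v1 v2 ...' lines into a word -> vector dictionary."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if parts:
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_word_index(tokenized_docs):
    """Assign each distinct token an integer id; 0 is reserved for padding."""
    index = {}
    for tokens in tokenized_docs:
        for token in tokens:
            index.setdefault(token, len(index) + 1)
    return index

def encode_and_pad(tokens, word_index, max_len):
    """Convert tokens to ids, truncate to max_len, and left-pad with zeros."""
    ids = [word_index[t] for t in tokens if t in word_index][:max_len]
    return [0] * (max_len - len(ids)) + ids

def embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with id i; unknown words stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])
    return matrix

# Hypothetical tokenized documents.
docs = [["fake", "news", "spreads"], ["real", "news"]]
glove = load_glove(GLOVE_SAMPLE.splitlines())
word_index = build_word_index(docs)
sequences = [encode_and_pad(d, word_index, max_len=4) for d in docs]
weights = embedding_matrix(word_index, glove, dim=3)
```

In a deep learning framework, weights would initialize the embedding layer and sequences would be the model input; the LSTM and the final sigmoid output for the fake/real decision sit on top of that.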
