Fake News Detection in Python
This project uses natural language processing techniques such as lemmatization, tokenization, and vectorization, together with machine learning algorithms from Python's scikit-learn library, to classify news articles as fake or real.
In this project, we applied the following NLP cleaning techniques to prepare and understand the data:
- Removing non-alphabetic characters and converting text to lowercase.
- Tokenizing the text.
- Stemming and lemmatization of tokens.
- Stopword filtering to remove common words that carry little meaning.
- Building n-grams and weighting them with TF-IDF vectorization so that distinctive keywords receive higher weights.
- Used the bag-of-words technique to find the frequency of each word.
- Applied TF-IDF vectorization to n-grams of length 1 to 4 to capture the frequency and importance of words.
- Used part-of-speech (POS) tagging and word embeddings (glove.3b.50d) to relate words.
- Employed PassiveAggressiveClassifier, Logistic Regression, Random Forest, KNN, Decision Tree, and XGBoost for classification.
- Plotted the confusion matrix and achieved an accuracy of 61% with XGBoost.
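
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names (`clean_and_tokenize`, `remove_stopwords`, `preprocess`) and the tiny `STOPWORDS` set are assumptions made for the example, and stemming and lemmatization are left out because they would need an external library such as NLTK.

```python
import re

# A tiny illustrative stopword list (an assumption for this sketch; a real
# project would more likely use a fuller list, e.g. NLTK's English stopwords).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and", "that"}


def clean_and_tokenize(text):
    """Replace non-alphabetic characters with spaces, lowercase, and split into tokens."""
    text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()
    return text.split()


def remove_stopwords(tokens):
    """Drop common words that carry little meaning."""
    return [token for token in tokens if token not in STOPWORDS]


def preprocess(text):
    """Run the cleaning steps and return a space-joined string ready for vectorization."""
    return " ".join(remove_stopwords(clean_and_tokenize(text)))


if __name__ == "__main__":
    print(preprocess("BREAKING: The Senate passed 3 new bills in 2020!"))
    # prints: breaking senate passed new bills
```

Returning a single cleaned string, rather than a token list, keeps the output directly usable as input to a text vectorizer.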
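
The vectorization and classification steps might look something like the sketch below, which assumes scikit-learn is installed. It combines TF-IDF over 1- to 4-grams with Logistic Regression, one of the classifiers named above. The toy headlines and labels are made up purely for illustration, and the pipeline layout is an assumption rather than the repository's actual implementation, so the numbers it prints say nothing about the 61% result reported for XGBoost.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline

# Toy data invented for this example only; label 1 = fake, 0 = real.
train_texts = [
    "aliens secretly control the world government",
    "miracle pill cures every disease overnight",
    "celebrity replaced by clone says anonymous insider",
    "city council approves budget for road repairs",
    "central bank keeps interest rates unchanged",
    "local school opens new science laboratory",
]
train_labels = [1, 1, 1, 0, 0, 0]

test_texts = [
    "anonymous insider says miracle pill is controlled by aliens",
    "council approves new budget for local school",
]
test_labels = [1, 0]

# TF-IDF over unigrams up to 4-grams, followed by a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)

# Rows are true labels and columns are predicted labels, in the order [0, 1].
cm = confusion_matrix(test_labels, predictions, labels=[0, 1])
accuracy = accuracy_score(test_labels, predictions)
print(cm)
print("accuracy:", accuracy)
```

Swapping `LogisticRegression` for another scikit-learn estimator only requires changing the last step of the pipeline, which makes it easy to compare several classifiers on the same features.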
Start by forking this repository to your GitHub account. This creates a copy of the project that you can freely experiment with.
Clone the repository to your local machine using the following command (when working from a fork, use your fork's URL instead):
git clone https://github.com/YashSachan2/Fake_news_detection.git
Create a new branch for your changes:
```
git checkout -b feature-name
```
Stage and commit your changes:
```
git add .
git commit -m "Your clear and concise message"
```
Push the branch to your fork:
```
git push origin feature-name
```
Once you've submitted a pull request, I will review your changes. Be patient during this process, and be ready to address any feedback or questions. Once your changes are approved, they will be merged into the main project.