Amazon Product Review Classification

This project was focused on the development of a machine learning-based classifier designed to accurately categorize Amazon product reviews.
By leveraging advanced algorithms such as Naive Bayes, SVM, Logistic Regression, XGBoost, and Random Forest, the classifier is able to achieve 89% accuracy and an 87% F1-score.
In this project, I have used TF-IDF vectorizer for feature extraction to interpret and classify vast quantities of text data effectively.

Objective: To accurately classify Amazon product reviews using machine learning algorithms.
Algorithms Used: Naive Bayes, SVM, Logistic Regression, XGBoost, and Random Forest.
Feature Extraction: Utilized TF-IDF.
Validation Method: 5-fold cross-validation.
Dataset: 220,909 Amazon product reviews (Link).

Python: The primary programming language used for model development.
Jupyter Notebook: For documenting the development process and performing data analysis.
Scikit-learn: Utilized for model training, feature extraction, and evaluation.
XGBoost: For implementing the XGBoost algorithm.
Pandas & NumPy: For data manipulation and numerical calculations.
NLTK: For text preprocessing and natural language processing tasks such as tokenization, stemming, lemmatization and parsing.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
AMAZON_FASHION_sample.json.gz		AMAZON_FASHION_sample.json.gz
Amazon-Product-Review-Classification.ipynb		Amazon-Product-Review-Classification.ipynb
README.md		README.md

Provide feedback