This project is designed to efficiently extract customer reviews from the Trustpilot website using Scrapy. It automates the process of collecting key data points such as ratings, review dates, and business details.
The structured dataset it produces is particularly useful for tasks like sentiment analysis, customer feedback analysis, and training machine learning models on real-world customer experiences. With built-in data cleaning and pagination handling, it delivers clean, complete output that is ready for analysis or further processing.
- Easy to Set Up
- Handles Pagination (see the sketch after this list)
- Cleans Data into a Proper Format
- Uses Middlewares to Handle Browser Headers
- Uses Delays to Avoid Overloading the Server
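The sketch below is a hypothetical illustration of how pagination handling, basic data cleaning, and request delays can look in a Scrapy spider. The spider name, start URL, and CSS selectors are assumptions for demonstration only and are not taken from the project's actual code.

```python
import scrapy


class ExampleTrustpilotSpider(scrapy.Spider):
    """Illustrative spider -- not the project's real 'tp' spider."""

    name = "tp_example"
    start_urls = ["https://www.trustpilot.com/review/example.com"]

    # a polite delay between requests to avoid overloading the server
    custom_settings = {"DOWNLOAD_DELAY": 2}

    def parse(self, response):
        for review in response.css("article"):
            yield {
                # strip stray whitespace so exported fields are clean
                "rating": (review.css("img::attr(alt)").get() or "").strip(),
                "date": review.css("time::attr(datetime)").get(),
                "text": (review.css("p::text").get() or "").strip(),
            }

        # follow the "next page" link until there are no more pages
        next_page = response.css('a[rel="next"]::attr(href)').get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```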
- Python
- Scrapy
- API Requests
From your command line, first clone the project:
# Clone the project
$ git clone "https://github.com/mibrahimbashir/customer_reviews.git"
# Go into the top-level project directory
$ cd customer_reviews
# Install dependencies
$ pip install -r requirements.txt
# Go into the spider directory
$ cd customer_reviews
# Run the spider
$ scrapy crawl tp
Executing the above commands will start the Scrapy spider, which will collect customer reviews and save them in two files: `reviews.csv` and `reviews.json`.
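As a rough sketch of how dual output like this can be produced, Scrapy's built-in FEEDS setting can export scraped items to CSV and JSON at the same time. Whether the project uses FEEDS, command-line `-o` flags, or a custom pipeline is not stated here, so treat this as one possible configuration:

```python
# settings.py -- one possible way to export to both files at once
FEEDS = {
    "reviews.csv": {"format": "csv", "overwrite": True},
    "reviews.json": {"format": "json", "overwrite": True},
}
```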
To stop execution, press Ctrl + C in the command line, or simply close the terminal.
If you want to attach fresh browser headers to each request, head over to the ScrapeOps website and create a free account. Once signed up, a free API key will be created for your account.
Create a new `.env` file in the project directory and paste the following line into it. Make sure the file name is exactly `.env`. Once done, save the file and you are good to go.
SCRAPEOPS_API_KEY=YOUR_API_KEY_HERE
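The following is a minimal sketch of how that key might be picked up and used, assuming the python-dotenv package and the ScrapeOps fake browser headers endpoint. The class name, settings, and selectors shown are illustrative and not necessarily what the project ships with.

```python
# settings.py -- load the key from .env (assumes python-dotenv is installed)
import os
from dotenv import load_dotenv

load_dotenv()
SCRAPEOPS_API_KEY = os.getenv("SCRAPEOPS_API_KEY")

# middlewares.py -- hypothetical downloader middleware that attaches
# realistic browser headers fetched from the ScrapeOps headers API
import random
import requests


class FakeBrowserHeadersMiddleware:
    def __init__(self, api_key):
        # fetch a batch of header sets once, when the crawler starts
        response = requests.get(
            "https://headers.scrapeops.io/v1/browser-headers",
            params={"api_key": api_key, "num_results": 10},
        )
        self.headers_list = response.json().get("result", [])

    @classmethod
    def from_crawler(cls, crawler):
        return cls(api_key=crawler.settings.get("SCRAPEOPS_API_KEY"))

    def process_request(self, request, spider):
        # pick a random header set for each outgoing request
        if self.headers_list:
            request.headers.update(random.choice(self.headers_list))
```

If a custom middleware like this is used, it also needs to be registered under DOWNLOADER_MIDDLEWARES in settings.py.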
If you like this project, give it a GitHub star by pressing the Star button ⭐
- Ibrahim Bashir LinkedIn