A simple Amazon scraper to extract product details and prices from Amazon.com using Python Requests and Selectorlib.
Full article at ScrapeHero Tutorials
There are five simple scrapers in this project.
- Product Detail Page Scraper:
bin/product_detail.py
- Product Detail Page Spider Scraper:
bin/product_detail_spider.py
- Product Reviews Page Scraper:
bin/product_reviews.py
- Product Reviews Page Spider Scraper:
bin/product_reviews_spider.py
- Search Results Page Scraper:
bin/product_search_results.py
Step 1: Clone repo.
$ git clone https://github.com/adrianmarino/amazon-scraper.git
$ cd amazon-scraper
Step 2: Create the Conda environment.
$ conda env create -f environment.yml
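For reference, a minimal environment.yml for this kind of scraper might look like the following sketch (the Python version and exact dependency list are assumptions, not the repo's shipped file):

```yaml
name: amazon-scraper
channels:
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
      - requests
      - selectorlib
      - pyyaml
```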
Step 1: Activate the project environment.
$ conda activate amazon-scraper
Step 2: Configure the fields to scrape in the config files:
- config/product_detail_selectors.yml: Maps CSS/XPath selectors to JSON fields for product detail scraping.
- config/product_detail_urls: URLs used by the bin/product_detail.py scraper.
- config/product_reviews_selectors.yml: Maps CSS/XPath selectors to JSON fields for product review scraping.
- config/product_reviews_urls: URLs used by the bin/product_reviews.py scraper.
- config/product_search_results_selectors.yml: Maps CSS/XPath selectors to JSON fields for product search result scraping.
- config/product_search_results_urls: URLs used by the bin/product_search_results.py and bin/product_detail_spider.py scrapers.
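As an illustration of the selector file format, a hypothetical config/product_detail_selectors.yml might map fields like this (the field names and selectors are examples in Selectorlib's YAML syntax, not the repo's shipped config):

```yaml
# Each top-level key becomes a field in the scraped JSON output.
name:
    css: 'span#productTitle'
    type: Text
price:
    css: 'span.a-price span.a-offscreen'
    type: Text
image:
    css: 'img#landingImage'
    type: Attribute
    attribute: src
```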
Notes
- bin/product_detail_spider.py reads the URLs specified in config/product_search_results_urls and uses both config/product_search_results_selectors.yml and config/product_detail_selectors.yml to scrape product details. The result is one file per product in the output path.
- bin/product_reviews_spider.py reads the URLs specified in the output/[PRODUCT_ID | PRODUCT_ID_variant_PRODUCT_ID].json files and uses config/product_reviews_selectors.yml to scrape product reviews. The result is one file per product in the output path.
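The chaining between the two spiders can be sketched as follows. The field name `reviews_url` and the flat JSON-per-product layout are assumptions about what bin/product_detail_spider.py writes; the real key depends on config/product_detail_selectors.yml:

```python
import json
from pathlib import Path

def collect_review_urls(output_dir: str = "output") -> list:
    """Gather review-page URLs from the product JSON files that a
    product-detail spider wrote into the output path."""
    urls = []
    for product_file in sorted(Path(output_dir).glob("*.json")):
        with open(product_file) as f:
            product = json.load(f)
        # 'reviews_url' is a hypothetical field name for illustration.
        url = product.get("reviews_url")
        if url:
            urls.append(url)
    return urls
```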
Step 3: From a terminal, execute any of the following commands:
$ python bin/product_detail.py
$ python bin/product_reviews.py
$ python bin/product_search_results.py
$ python bin/product_detail_spider.py
$ python bin/product_reviews_spider.py
Notes
- bin/product_reviews_spider.py requires running bin/product_detail_spider.py first: it generates the product review files from the bin/product_detail_spider.py result files.
Step 4: Scraped data is written to the output
directory: one file per product detail page and one file per search results page.
- Proxy lists: set up proxies in src/scrapper/scrapper_factory.py.
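For example, a Requests session can be routed through a proxy like this (the proxy address and helper name are placeholders; adapt them to however src/scrapper/scrapper_factory.py builds its sessions):

```python
from typing import Optional

import requests

def build_session(proxy_url: Optional[str] = None) -> requests.Session:
    """Return a Session that optionally routes traffic through a proxy.
    proxy_url is a placeholder such as 'http://user:pass@host:port'."""
    session = requests.Session()
    session.headers.update({"User-Agent": "Mozilla/5.0"})
    if proxy_url:
        # requests applies per-scheme proxies to every request on the session.
        session.proxies.update({"http": proxy_url, "https": proxy_url})
    return session
```

Rotating through a list of proxies is then just a matter of building a session per proxy (or updating `session.proxies` between requests).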