Skip to content

atehe/stores_scrapers

Repository files navigation

STORE SCRAPER

Web scrapers to extract products data for:

Major info extracted for a product are: Name, Category, Subcategory, Product URL and ID. Other metas that maybe included are Price, Brand etc

Requirements

  • Python 3.6+
  • Chrome Version 101.0.4951.64

Installation

git clone https://github.com/atehe/stores_scraper.git
cd store_scrapers
pip install -r requirements.txt

Usage

  • Tesco Scraper
scrapy crawl tesco -o <output_csv>
  • Aldi Scraper
python store_scrapers/scripts/aldi.py <output_csv>
  • Loblaw Scraper
python store_scrapers/scripts/lowblaw.py <output_csv>
  • Wholefoodsmarket Scraper(US)
python store_scrapers/scripts/wholefoodsmarket.py <output_csv>
  • Woolworths Scraper
python store_scrapers/scripts/woolworths.py <output_csv>
  • Kroger Scraper
python store_scrapers/scripts/kroger.py <output_csv>

Wholefoodsmarket and Loblaws store are region restrictive and will require the region IP address of US and Canada respectively using a VPN or proxy service.

Continuous loading of pages crashes the browser in loblaw scraper due to high ram usage so its preferable to run in a system with enough RAM. Uncommenting some code in the script can as well limit the amount of pages loaded to prevent the browser from crashing.

Kroger store blocks scraper after sometime but scraper is made to close automatically when blocked and resumes from where it stopped when restarted(manually). For better performance a rotating proxy can be used. To run kroger scraper from the beginning delete the kroger folder in utils

Scrapers (browser) maybe sensitive to external input(keyboard or mouse movements and clicks) for the first few minutes while extracting the
subcategories due to action needed like hover, clicking etc

About

Scripts to scrape various stores

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages