
rohanrvpatil/scraping_concepts


This GitHub repository contains web scraping projects built with Scrapy, Selenium, BeautifulSoup, and httpx.

basic_scrapy_rugshop:

beautiful_soup_proxies:

dynamic_hidden_api_json:

  • Website: https://www.petsathome.com/
  • Purpose: Extracts product data for pet toys, accessories, and food essentials
  • Fields extracted: 28 columns of product details
  • Scraping tool: Fetch/XHR filter in the Network tab of the browser DevTools (JSON extracted directly from the site's API)
  • Libraries/Methods used: requests
  • Exported data: products_data.xlsx, response_data.json
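A minimal sketch of the hidden-API approach. It assumes that the endpoint found under Fetch/XHR returns a JSON object with the product list under a `products` key. That key, the helper names, and the request headers are hypothetical and are not taken from this repository. Nested fields are flattened into dotted column names, which is one way a single API response can become a wide table.

```python
from __future__ import annotations

import json
from typing import Any


def fetch_json(url: str, params: dict[str, Any] | None = None) -> dict[str, Any]:
    """Call the JSON endpoint spotted in the Network tab (URL/params are assumptions)."""
    import requests  # imported lazily so the parsing helpers work without it

    headers = {"User-Agent": "Mozilla/5.0", "Accept": "application/json"}
    resp = requests.get(url, params=params, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"price": {"value": 5}} -> {"price.value": 5}."""
    row: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def extract_rows(payload: dict[str, Any], items_key: str = "products") -> list[dict[str, Any]]:
    """One flat row per product; the "products" key is a hypothetical placeholder."""
    return [flatten(item) for item in payload.get(items_key, [])]


def save_json(payload: dict[str, Any], path: str = "response_data.json") -> None:
    """Keep the raw API response alongside the tabular export."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False, indent=2)
```

If pandas and openpyxl are installed, `pandas.DataFrame(extract_rows(payload)).to_excel("products_data.xlsx", index=False)` would write the flattened rows to a spreadsheet.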

httpx_scraping:

  • Website: https://www.rei.com/c/downhill-ski-boots
  • Purpose: Extracts product data for downhill ski boots
  • Fields extracted: link, name, product_id, price, rating
  • Scraping tool: python-httpx
  • Libraries/Methods used: selectors, urljoin, HTMLParser, dataclasses, export functions for csv/xlsx/json
  • Exported data: data.csv, data.json, data.xlsx
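A minimal sketch of the httpx workflow: fetch, parse into a dataclass, resolve links with `urljoin`, and export. The README does not say which `HTMLParser` the project uses. This sketch substitutes the standard library's `html.parser.HTMLParser`. The tile markup it expects is hypothetical and is not REI's actual HTML: an `li.tile` with `data-product-id`, an `a.link`, a `span.price`, and a `span.rating`. The XLSX export is omitted.

```python
from __future__ import annotations

import csv
import json
import re
from dataclasses import asdict, dataclass, fields
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class Product:
    link: str
    name: str
    product_id: str
    price: float | None
    rating: float | None


def to_float(text: str | None) -> float | None:
    """Pull the first number out of text such as "$1,299.95" or "4.5 out of 5"."""
    match = re.search(r"\d+(?:\.\d+)?", (text or "").replace(",", ""))
    return float(match.group()) if match else None


class TileParser(HTMLParser):
    """Collects one Product per hypothetical <li class="tile"> element."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.products: list[Product] = []
        self._tile: dict | None = None  # fields of the tile being parsed
        self._field: str | None = None  # field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "li" and "tile" in classes:
            self._tile = {"product_id": attr.get("data-product-id", "")}
        elif self._tile is not None:
            if tag == "a" and "link" in classes:
                # urljoin turns a relative href into an absolute product link
                self._tile["link"] = urljoin(self.base_url, attr.get("href", ""))
                self._field = "name"
            elif "price" in classes:
                self._field = "price"
            elif "rating" in classes:
                self._field = "rating"

    def handle_data(self, data):
        if self._tile is not None and self._field and data.strip():
            self._tile.setdefault(self._field, data.strip())

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "li" and self._tile is not None:
            t = self._tile
            self.products.append(Product(
                link=t.get("link", ""),
                name=t.get("name", ""),
                product_id=t["product_id"],
                price=to_float(t.get("price")),
                rating=to_float(t.get("rating")),
            ))
            self._tile = None


def parse_products(html: str, base_url: str) -> list[Product]:
    parser = TileParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.products


def fetch(url: str) -> str:
    import httpx  # imported lazily so parsing works without httpx installed

    resp = httpx.get(url, headers={"User-Agent": "Mozilla/5.0"},
                     follow_redirects=True, timeout=30)
    resp.raise_for_status()
    return resp.text


def export(products: list[Product], stem: str = "data") -> None:
    """Write data.json and data.csv from the dataclass instances."""
    rows = [asdict(p) for p in products]
    with open(f"{stem}.json", "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2)
    with open(f"{stem}.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(Product)])
        writer.writeheader()
        writer.writerows(rows)
```

The dataclass fixes the column order once, so the CSV header and the JSON keys stay consistent with the listed fields.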

dynamic_scrapy_splash_beerwulf (test project):

selenium_amazon_products:

  • Website: Amazon search results for "dell i7 laptop"
  • Purpose: Extracts product details of the laptops returned by that search
  • Fields extracted: link, title, price, brand, model name, screen size, about this item, technical details: summary, rating (out of 5)
  • Scraping tool: Selenium
  • Libraries/Methods used: user agent rotation, chrome_options
  • Exported data: laptop_details.xlsx
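A minimal sketch of user-agent rotation through Chrome options, assuming Selenium 4 and a local Chrome install. The user-agent strings, the `www.amazon.com` domain, and the helper names are illustrative assumptions. The README does not show the project's actual pool or selectors.

```python
from __future__ import annotations

import random
from urllib.parse import quote_plus

# Hypothetical pool; a real scraper would keep a larger, current list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def pick_user_agent(rng: random.Random | None = None) -> str:
    """Choose a user agent at random (pass a seeded Random for repeatable runs)."""
    return (rng or random).choice(USER_AGENTS)


def chrome_arguments(user_agent: str, headless: bool = True) -> list[str]:
    """Command-line switches handed to Chrome via ChromeOptions.add_argument."""
    args = [f"--user-agent={user_agent}", "--window-size=1920,1080"]
    if headless:
        args.append("--headless=new")
    return args


def search_url(query: str, domain: str = "www.amazon.com") -> str:
    """Build a search-results URL; the domain is an assumption."""
    return f"https://{domain}/s?k={quote_plus(query)}"


def open_search(query: str):
    """Start Chrome with a freshly chosen user agent and load the search page."""
    from selenium import webdriver  # imported lazily; needs Selenium 4

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(pick_user_agent()):
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    driver.get(search_url(query))
    return driver  # caller is responsible for driver.quit()
```

Chrome fixes its user agent when the browser starts. Rotating it with this approach therefore means creating a new driver, for example one per batch of product pages.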

About

This project covers a range of web scraping concepts.
