This GitHub repository contains web scraping projects built with Scrapy, scrapy-splash, Selenium, BeautifulSoup, requests, and httpx.
- Website: https://www.therugshopuk.co.uk/rugs-by-room/bedroom-rugs.html
- Purpose: Extracts product data for bedroom rugs
- Fields extracted: name, price, link
- Scraping tool: Scrapy
- Libraries/Methods used: Scrapy selectors
- Exported data: output.csv
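
Scraped price text usually arrives as a string such as "£1,299.99". Below is a minimal sketch of a helper that could normalise the `price` field before it reaches output.csv. It assumes prices use `,` as the thousands separator and `.` for decimals, and `parse_price` is a hypothetical addition, not code from this project.

```python
import re
from decimal import Decimal
from typing import Optional

# First number in the text, allowing thousands separators and an optional
# decimal part, e.g. "£1,299.99" -> "1,299.99" or "From £49" -> "49".
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Convert scraped price text such as '£1,299.99' to Decimal('1299.99')."""
    if not text:
        return None
    match = _PRICE_RE.search(text)
    if match is None:
        return None  # no digits at all, e.g. "Sold out"
    # Drop thousands separators; Decimal avoids float rounding on money values.
    return Decimal(match.group().replace(",", ""))
```

Inside a spider's `parse` method it might be applied as `parse_price(card.css(".price::text").get())`, where `.price` is a placeholder for the site's real markup; Scrapy's feed export (`scrapy crawl <spider_name> -o output.csv`) then writes the CSV. The regex keeps only the first number, so text like "Was £59.99 Now £39.99" would yield 59.99.
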
- Website: https://free-proxy-list.net/
- Purpose: Extracts free proxies and verifies that they work
- Fields extracted: proxy (IP address with port)
- Scraping tool: BeautifulSoup
- Libraries/Methods used: requests
- Exported data: verified_proxies.csv
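
Verification can be done by sending a test request through each proxy and keeping the ones that answer. Below is a minimal sketch, assuming proxies are plain `host:port` strings and that `TEST_URL` (a hypothetical choice) is any endpoint that returns HTTP 200. It uses the standard library's `urllib` instead of `requests` so it runs without extra packages; with `requests` the equivalent call is `requests.get(TEST_URL, proxies={"http": ..., "https": ...}, timeout=5)`.

```python
import csv
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical test endpoint; any stable URL that returns HTTP 200 will do.
TEST_URL = "https://httpbin.org/ip"


def check_proxy(proxy: str, timeout: float = 5.0) -> bool:
    """Return True if a request routed through `proxy` ("host:port") succeeds."""
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(TEST_URL, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:  # refused, timed out, broken tunnel, HTTP error, ...
        return False


def verify_proxies(
    proxies: Iterable[str],
    check: Callable[[str], bool] = check_proxy,
    workers: int = 20,
) -> List[str]:
    """Check proxies concurrently and keep those that pass, in input order."""
    candidates = list(proxies)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(check, candidates))
    return [p for p, ok in zip(candidates, results) if ok]


def save_proxies(proxies: Iterable[str], path: str = "verified_proxies.csv") -> None:
    """Write one proxy per row under a single 'proxy' header."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["proxy"])
        writer.writerows([p] for p in proxies)
```

The checks run in a thread pool because each one spends most of its time waiting on the network, and the `check` parameter makes it easy to swap in a different test.
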
dynamic_hidden_api_json:
- Website: https://www.petsathome.com/
- Purpose: Extracts product data for pet toys, accessories, and food essentials
- Fields extracted: 28 columns of product details
- Scraping tool: Fetch/XHR filter in the browser DevTools Network tab (JSON extracted directly from the site's API)
- Libraries/Methods used: requests
- Exported data: products_data.xlsx, response_data.json
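
Once an endpoint has been spotted under Fetch/XHR, it can be requested directly and its nested JSON flattened into one column per field. Below is a minimal standard-library sketch; the endpoint URL, query parameters, and the `products` key mentioned afterwards are hypothetical and have to be read off the real request in DevTools.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def fetch_json(url: str, params: Optional[Dict[str, Any]] = None, timeout: float = 10.0) -> Any:
    """GET an API endpoint found under Fetch/XHR and decode the JSON body."""
    if params:
        url = f"{url}?{urlencode(params)}"
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into dotted column names.

    {"price": {"value": 1}} becomes {"price.value": 1}; lists are kept as-is.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))  # recurse into nested objects
        else:
            flat[column] = value
    return flat
```

Rows built with `[flatten(p) for p in data["products"]]` could then be saved with `pandas.DataFrame(rows).to_excel("products_data.xlsx", index=False)` (this needs openpyxl installed); `pandas.json_normalize` performs the same dotted-key flattening.
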
- Website: https://www.rei.com/c/downhill-ski-boots
- Purpose: Extracts product data of downhill ski-boots
- Fields extracted: link, name, product_id, price, rating
- Scraping tool: httpx
- Libraries/Methods used: CSS selectors, urljoin, HTMLParser, dataclasses, export functions for CSV/XLSX/JSON
- Exported data: data.csv, data.json, data.xlsx
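
Declaring the five fields as a dataclass keeps every exporter in sync with a single schema. Below is a minimal standard-library sketch; `Product`, `absolute_link`, `export_csv`, and `export_json` are hypothetical names, and `urljoin` turns relative hrefs found on the category page into full product links.

```python
import csv
import json
from dataclasses import asdict, dataclass, fields
from typing import Iterable, Optional
from urllib.parse import urljoin

BASE_URL = "https://www.rei.com/c/downhill-ski-boots"


@dataclass
class Product:
    link: str
    name: str
    product_id: str
    price: Optional[str] = None
    rating: Optional[float] = None


def absolute_link(href: str, base: str = BASE_URL) -> str:
    """Resolve a relative product href against the category page URL."""
    return urljoin(base, href)


def export_csv(products: Iterable[Product], path: str) -> None:
    """Write products to CSV with one column per dataclass field."""
    columns = [fld.name for fld in fields(Product)]
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(asdict(p) for p in products)


def export_json(products: Iterable[Product], path: str) -> None:
    """Write products as a JSON array of objects."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([asdict(p) for p in products], fh, indent=2)
```

The standard library has no XLSX writer; one option is `pandas.DataFrame([asdict(p) for p in products]).to_excel("data.xlsx", index=False)`.
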
- Website: https://www.beerwulf.com/en-gb/c/mixedbeercases
- Purpose: Extracts beer product data
- Fields extracted: name, price
- Scraping tool: scrapy-splash
- Libraries/Methods used: None
- Exported data: None
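
scrapy-splash routes requests through a Splash rendering service so that content built by JavaScript is present in the HTML Scrapy receives. To show what happens underneath, the sketch below calls Splash's `render.html` HTTP endpoint directly with `urllib`. It assumes a local Splash instance on the default port 8050 and is an illustration, not the project's own code.

```python
import urllib.request
from urllib.parse import urlencode

# Assumes Splash is running locally, e.g. started with
# `docker run -p 8050:8050 scrapinghub/splash`.
SPLASH_URL = "http://localhost:8050"


def render_url(target: str, wait: float = 2.0, splash: str = SPLASH_URL) -> str:
    """Build a render.html request; `wait` gives the page's JS time to run."""
    query = urlencode({"url": target, "wait": wait})
    return f"{splash}/render.html?{query}"


def fetch_rendered_html(target: str, wait: float = 2.0, timeout: float = 60.0) -> str:
    """Fetch the JavaScript-rendered HTML of `target` through Splash."""
    with urllib.request.urlopen(render_url(target, wait), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Inside a Scrapy spider the same request is usually made with `scrapy_splash.SplashRequest(url, self.parse, args={"wait": 2})`, once the scrapy-splash middlewares and `SPLASH_URL` are configured in settings.py.
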
- Website: Amazon (search results for "dell i7 laptop")
- Purpose: Extracts details of laptops returned by the "dell i7 laptop" search
- Fields extracted: link, title, price, brand, model name, screen size, about this item, technical details (summary), rating (out of 5)
- Scraping tool: Selenium
- Libraries/Methods used: user-agent rotation, Chrome options (chrome_options)
- Exported data: laptop_details.xlsx
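
With Selenium, user-agent rotation generally means choosing a different user-agent string each time a browser session starts and passing it to Chrome through its options. Below is a minimal sketch, assuming Selenium 4 and Chrome. The user-agent strings are examples only, and `pick_user_agent`, `chrome_arguments`, and `make_driver` are hypothetical helpers. Selenium is imported inside `make_driver` so the other helpers can be tried without a browser.

```python
import random
from typing import List, Optional, Sequence

# Example desktop user-agent strings; replace with current ones before use.
USER_AGENTS: List[str] = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]


def pick_user_agent(agents: Sequence[str] = USER_AGENTS, rng: Optional[random.Random] = None) -> str:
    """Choose a user agent at random for the next browser session."""
    return (rng or random).choice(agents)


def chrome_arguments(user_agent: str, headless: bool = True) -> List[str]:
    """Command-line switches to pass to ChromeOptions.add_argument."""
    args = [f"--user-agent={user_agent}", "--window-size=1920,1080"]
    if headless:
        args.append("--headless=new")
    return args


def make_driver(headless: bool = True):
    """Start Chrome with a freshly chosen user agent (requires selenium)."""
    from selenium import webdriver  # lazy import: helpers above work without Selenium

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(pick_user_agent(), headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling `make_driver()` for each new session (and `driver.quit()` when finished) rotates the agent; `driver.execute_script("return navigator.userAgent")` shows which one is active.
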