This project is a web scraper built with Selenium that extracts product information from Alibaba for user-defined search queries. To refine the results, the scraper applies filters derived from keywords in the search input.
- Scrapes product name, price, company, MOQ (Minimum Order Quantity), rating, image link, and product link.
- Dynamically applies filters based on keywords in the user's search input.
- Supports pagination to scrape multiple pages of results.
- Excludes products with missing or invalid data.
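As an illustration of keyword-driven filtering, here is a minimal sketch. The `FILTER_KEYWORDS` table and the `select_filters` helper are hypothetical names used only for this example; the actual script may map keywords to Alibaba's filters differently.

```python
import re
from typing import List

# Hypothetical keyword-to-filter table; the real script's filters may differ.
FILTER_KEYWORDS = {
    "men": "Gender: Men",
    "women": "Gender: Women",
    "black": "Color: Black",
    "cotton": "Material: Cotton",
}


def select_filters(query: str) -> List[str]:
    """Return the filter labels whose keyword appears as a whole word in the query."""
    # Split on anything that is not a letter, so "men's" yields "men" and "s".
    words = set(re.findall(r"[a-z]+", query.lower()))
    return [label for keyword, label in FILTER_KEYWORDS.items() if keyword in words]


if __name__ == "__main__":
    print(select_filters("men's black t-shirt"))
```

Matching on whole words rather than substrings avoids treating "women" as a match for the "men" keyword.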
To run this project, you need to have the following installed:
- Python 3.x 🐍
- Microsoft Edge 🌐
- Clone the repository (if applicable):
      git clone https://github.com/20101301-Alina-Hasan/Alibaba-Selenium-Scraper.git
      cd Alibaba-Selenium-Scraper
- Install required Python packages:
      pip install selenium
- Run the script:
      python alibaba-selenium-scraper.py
- Input your search criteria:
  - Enter the product you want to search for (e.g., "men's black t-shirt").
  - Enter the maximum price you want to filter by (e.g., "5").
- View the output:
  - The scraped data will be saved in alibaba_results.json 📊.
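Once a run has finished, the output file can be inspected from Python. The sketch below assumes that alibaba_results.json holds a JSON array of product records; that layout is an assumption, and `load_results` is a hypothetical helper that is not part of the script.

```python
import json
from pathlib import Path


def load_results(path="alibaba_results.json"):
    """Load the scraper output; return an empty list if the file does not exist."""
    file = Path(path)
    if not file.exists():
        return []
    with file.open(encoding="utf-8") as handle:
        # Assumption: the file contains a JSON array of product records.
        return json.load(handle)


if __name__ == "__main__":
    results = load_results()
    print(f"Loaded {len(results)} products")
```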
The scraper applies the following restrictions:
- If the product name is missing, the product is skipped. 🚫
- If the price is missing, the product is skipped. 🚫
- If the product link is missing, the product is skipped. 🚫
- If the rating is missing, the product is skipped. 🚫
- If the image link is missing, the product is skipped. 🚫
- If the price is given as a range, the product is skipped. 🚫
- Add a filter option based on search input. ⚙️
- Ensure that your selectors are up to date with Alibaba's current HTML structure.
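The skip rules listed above can be sketched as a small validation step. The field names in `REQUIRED_FIELDS` and the helpers `parse_single_price` and `is_valid_product` are hypothetical and chosen for this example; the script's own data layout and price parsing may differ.

```python
import re
from typing import Optional

# Hypothetical field names; the real script may use different keys.
REQUIRED_FIELDS = ("name", "price", "product_link", "rating", "image_link")


def parse_single_price(text) -> Optional[float]:
    """Return the price as a float, or None if it is missing, a range, or not numeric."""
    if not text:
        return None
    # Drop thousands separators, then collect every number in the text.
    numbers = re.findall(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if len(numbers) != 1:
        # Zero numbers means no usable price; two or more means a range.
        return None
    return float(numbers[0])


def is_valid_product(product: dict) -> bool:
    """Apply the skip rules: every required field present and a single price."""
    if any(product.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False
    return parse_single_price(product["price"]) is not None


if __name__ == "__main__":
    sample = {
        "name": "Men's black t-shirt",
        "price": "$1.20 - $3.50",
        "product_link": "https://example.com/product",
        "rating": "4.8",
        "image_link": "https://example.com/image.jpg",
    }
    print(is_valid_product(sample))  # the price is a range, so this product is skipped
```

Keeping the checks in one function makes it easy to add or relax a rule without touching the Selenium code that collects the fields.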