Weaby is a program that collects data from multiple websites. It is built with FastAPI and uses Selenium to extract data from web pages. The project is containerized with Docker Compose. Inside the container, Undetected Chrome Driver downloads the most recent Chrome Driver release so that it matches the Chrome version installed in the Docker image.
To install the project, you need Python 3, Docker, and Docker Compose installed on your machine. Each can be downloaded from its official website.
After installing Docker and Docker Compose, you can clone the project by running the following command:
git clone https://github.com/arman-bd/weaby-the-extractor.git
After cloning the project, you need to create a .env file in the project directory. You can copy the .env.example file and rename it to .env.
cp .env.example .env
You may adjust the .env file to suit your needs: open it in a text editor and change the values of the variables.
Run the following command to start the project:
docker compose up --build -d
After running the command, you can access the project by visiting http://localhost:8081 in your browser.
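Because the -d flag starts the containers in the background, the service may need a few seconds before it accepts connections. The following is a minimal sketch of a readiness check, assuming only the address given above; the function name wait_until_up, the timeouts, and the injectable opener parameter are illustrative and not part of the project.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_until_up(
    url: str = "http://localhost:8081",
    timeout: float = 60.0,
    interval: float = 2.0,
    opener: Callable[..., object] = urllib.request.urlopen,
) -> bool:
    """Poll `url` until the server answers or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            response = opener(url, timeout=5)
            # Close the response if the opener returned a closable object.
            if hasattr(response, "close"):
                response.close()
            return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except OSError:
            # Connection refused or reset: the container is probably still starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_until_up() after docker compose up returns True once the address responds and False if it stays unreachable for the whole timeout.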
Currently, Weaby supports the following websites for data extraction:
To add support for a website, you need to follow the steps below:
- Create a Service Method in app/services/extract.py.
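# Imports used by this method (skip any the module already has).
import time
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By

async def website_data(driver: uc.Chrome, id: str, wait: int = 5):
    # Load the page for the requested id.
    driver.get(f"https://YOUR_WEBSITE_HERE/{id}")
    # Give the page a fixed number of seconds to finish rendering.
    time.sleep(wait)
    # Locate each field by its XPath and read its text.
    title = driver.find_element(By.XPATH, "/html/body/div[3]/h1/span").text
    description = driver.find_element(By.XPATH, "/html/body/div[3]/div[3]/div[5]/div[1]/p[2]").text
    return {
        "title": title,
        "description": description
    }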
- Create a Controller Method in app/controllers/extract.py.
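async def website_data(id: str):
    try:
        # Start a browser session and hand it to the service method.
        driver = wd.create_driver()
        return await ExtractService.website_data(driver, id, 5)
    except Exception as e:
        # Report any failure to the caller as an error message.
        return {"error": str(e)}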
- Create a Router Method in app/routers/extract.py.
@router.get("/website/{id}", response_model=WebsiteData)
async def website_data(id: str):
    return await ExtractController.website_data(id)
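The router refers to a WebsiteData response model that is not shown in these steps. Below is a minimal sketch of what it might look like, assuming it simply mirrors the two keys returned by the service method. The project may well define it as a Pydantic model instead; a standard dataclass is used here only to keep the example dependency-free, and FastAPI accepts dataclasses as a response_model.

```python
from dataclasses import asdict, dataclass


@dataclass
class WebsiteData:
    """Hypothetical response shape: the two fields the service method returns."""

    title: str
    description: str


# The dictionary produced by the service method maps directly onto the model.
example = WebsiteData(**{"title": "Example title", "description": "Example text"})
```

If the service later returns more fields, the model needs matching attributes, otherwise FastAPI drops them from the response.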
Now you can access the data from the website by sending a GET request to http://localhost:8081/extract/website/{id}.
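As an illustration of such a request, here is a small client sketch using only the Python standard library. The helper names (extract_url, parse_payload, fetch_website_data) and the 60-second timeout are assumptions made for this example. Whether the controller's {"error": ...} dictionary actually reaches the client depends on how WebsiteData is defined, so that branch is a precaution rather than documented behavior.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8081"


def extract_url(item_id: str, site: str = "website", base_url: str = BASE_URL) -> str:
    """Build the endpoint URL, percent-encoding the id so it stays one path segment."""
    return f"{base_url}/extract/{quote(site, safe='')}/{quote(item_id, safe='')}"


def parse_payload(body: bytes) -> dict:
    """Decode the JSON body and turn an {"error": ...} payload into an exception."""
    data = json.loads(body)
    if isinstance(data, dict) and "error" in data:
        raise RuntimeError(data["error"])
    return data


def fetch_website_data(item_id: str) -> dict:
    """Send the GET request; Selenium page loads are slow, so allow a long timeout."""
    with urllib.request.urlopen(extract_url(item_id), timeout=60) as response:
        return parse_payload(response.read())
```

With the containers running, fetch_website_data("some-id") should return a dictionary containing the title and description keys.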
The project is still in development and is not ready for production. It has not been tested thoroughly and may contain bugs. It is intended for educational purposes only: its purpose is to demonstrate how to use Selenium to interact with websites. Use it at your own risk. I am not responsible for any misuse of this project.
This project is licensed under the MIT License - see the LICENSE file for details.