ScrapeGoat

ScrapeGoat is an advanced web scraping tool that leverages artificial intelligence and browser automation to meet diverse scraping needs. Whether you're comparing prices across shopping websites, tracking Instagram user posts, or automating social media uploads, ScrapeGoat provides a powerful and flexible solution.

Introduction

Scrape the web like a GOAT! ScrapeGoat helps you collect and analyze web data efficiently, turning raw pages into actionable insights faster than you can say "baa". It combines AI-driven decision-making with robust browser automation, so it stays powerful without giving up ease of use. Get ready to become the GOAT of web scraping!

Features

  • AI-Powered Scraping: Uses machine learning to adapt to website changes and optimize scraping strategies.
  • Browser Automation: Mimics human browsing behavior to navigate websites and extract data seamlessly.
  • Multi-Purpose Functionality: Suitable for a wide range of applications, including e-commerce price comparison, social media monitoring, and content aggregation.
  • Customizable Scraping Workflows: Create and save custom scraping recipes for repeated tasks.
  • Data Export: Export scraped data in various formats (CSV, JSON, XML) for easy integration with other tools and platforms.
  • Scheduling: Set up automated scraping tasks to run at specified intervals.
  • Proxy Support: Rotate through proxy servers to avoid IP blocks and maintain anonymity.
  • CAPTCHA Handling: Advanced CAPTCHA solving capabilities to bypass common anti-bot measures.
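
To make the Data Export feature concrete, here is a minimal sketch of writing the same records as JSON, CSV, and XML. It uses only the Python standard library and is not ScrapeGoat's own export code: the `records` sample, its field names, and the `to_json`, `to_csv`, and `to_xml` helpers are all hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"product": "kettle", "price": "24.99"},
    {"product": "toaster", "price": "31.50"},
]


def to_json(rows):
    """Serialize the rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV, using the first row's keys as the header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    """Serialize the rows as <records><record>...</record></records>.

    Assumes every key is a valid XML element name.
    """
    root = ET.Element("records")
    for row in rows:
        item = ET.SubElement(root, "record")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(records))
    print(to_csv(records))
    print(to_xml(records))
```

Each helper returns a string, so the result can be written to a file or passed to another tool unchanged.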

Dependencies

ScrapeGoat relies on the following key dependencies:

  • Python 3.10+
  • Selenium
  • Requests
  • Portable Google Chrome browser 131.0.6724.0+
  • ChromeDriver 131.0.6724.0+

A complete list of dependencies can be found in the requirements.txt file.
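
The version minimums above can be checked before a run. The sketch below is one possible way to do that and is not part of ScrapeGoat: `parse_version` and `meets_minimum` are hypothetical helpers, and the sample strings stand in for what the Chrome and ChromeDriver binaries print when called with `--version`.

```python
import re
import sys

# Minimums taken from the dependency list above.
MIN_PYTHON = (3, 10)
MIN_BROWSER = (131, 0, 6724, 0)


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_output, minimum=MIN_BROWSER):
    """Return True if the reported version is at least `minimum`."""
    return parse_version(version_output) >= minimum


if __name__ == "__main__":
    print("Python OK:", sys.version_info[:2] >= MIN_PYTHON)
    # Sample strings for illustration; real values would come from
    # running each binary with --version.
    print("Chrome OK:", meets_minimum("Google Chrome 131.0.6778.85"))
    print("ChromeDriver OK:", meets_minimum("ChromeDriver 130.0.6723.116"))
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where "99.0" would sort above "131.0".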

Usage

For a comprehensive guide on how to use ScrapeGoat, please refer to our User Manual.

Contributing

We welcome contributions from the community! If you'd like to contribute to ScrapeGoat, please follow these steps:

  1. Fork the repository
  2. Create a new branch for your feature or bug fix
  3. Make your changes and commit them with clear, descriptive messages
  4. Push your changes to your fork
  5. Submit a pull request to the main repository

Please ensure that your code adheres to our coding standards and includes appropriate tests. For more information, see our Contribution Guidelines.

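To build and run the development container on Linux with X11:

    # Install the X11 dependencies if they are missing
    sudo apt install x11-xserver-utils xorg
    # Let the container open GUI windows on the host display.
    # Note: `xhost +` disables X access control for every host; revoke it with `xhost -` when done.
    xhost +
    # Build the development image from GoatFile
    docker build -t scrapegoat . -f GoatFile
    # Open an interactive shell inside the container
    docker run --rm -it scrapegoat:latest /bin/sh
    # Or run the container with access to the host's X11 display
    docker run -e DISPLAY="$DISPLAY" -v /tmp/.X11-unix:/tmp/.X11-unix scrapegoat:latest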

License

ScrapeGoat is released under the MIT License. See the LICENSE file for details.

Support

If you encounter any issues or have questions, please file an issue on our GitHub Issues page.

For additional support and community discussions, join our Discord server.


Disclaimer: Please use ScrapeGoat responsibly and in accordance with the terms of service of the websites you are scraping. The developers of ScrapeGoat are not responsible for any misuse of the tool or violations of website policies.