QScrapper

QScrapper is a customizable web scraping tool designed to fetch and process data efficiently from various web sources. It leverages Go's powerful concurrency model and supports configurable proxies, delay between requests, and dynamic JSON path parsing for targeted data extraction.

Project Structure

/qscrapper
    /cmd
        main.go          # Application entry point.
    /config
        config.go        # Manages loading of configuration settings.
    /scraper
        scraper.go       # Implements the scraping logic.
    /proxy
        proxy.go         # Handles proxy server rotation.
    /logger
        logger.go        # Provides logging functionality.
    /storage
        storage.go       # Manages data storage.
    /parser
        parser.go        # Parses JSON data based on configurable paths.
    config.json          # Stores configuration settings such as JSON paths, delays, and proxies.
    Makefile             # Simplifies build and run processes.
    README.md            # Documentation.

Configuration

Edit config.json to specify your scraping parameters:

{
    "Path": "entities.#.content.entity",
    "Delay": 5,
    "Proxies": []
}

Path: JSON path for targeted data extraction.
Delay: Time (in seconds) to wait between each request.
Proxies: List of proxy servers to use for requests.

Setup

Clone the repository:

git clone https://github.com/HritikR/QScrapper
cd QScrapper

Make sure Go is installed on your system, then fetch the dependencies:

go mod tidy

Building and Running

Use the provided Makefile for building and running the application:

Build the application:

make build

Run the application:

make run

Clean build artifacts:

make clean

Makefile Commands

build: Compiles the application and places the binary in the ./build directory.
run: Builds (if necessary) and runs the compiled application.
clean: Removes the ./build directory and cleans up build artifacts.

Usage

QScrapper is designed to be flexible, allowing you to specify various parameters directly from the command line to tailor the scraping process to your needs. Here's how to use the available flags:

start: Specifies the starting page number for the scraping process. Defaults to 1 if not provided.
end: Specifies the ending page number (inclusive). Defaults to 1, so only a single page is scraped unless you override it.
url: The base URL to scrape, with a placeholder for the page number. This parameter is required and does not have a default value.
out: Sets the path for the output file where the scraped data will be stored. Defaults to output.json if not specified.
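
The flags and defaults listed above could be wired up with Go's standard flag package roughly as follows. This is a sketch, not the code in cmd/main.go: the Options struct, parseFlags, and the validation rules (url required, end not before start) are assumptions made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// Options holds the four command-line values described above.
type Options struct {
	Start int
	End   int
	URL   string
	Out   string
}

// parseFlags reads the flags from args and applies the documented defaults.
// The validation at the end is an assumption made for this sketch.
func parseFlags(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("qscrapper", flag.ContinueOnError)
	fs.IntVar(&o.Start, "start", 1, "starting page number")
	fs.IntVar(&o.End, "end", 1, "ending page number")
	fs.StringVar(&o.URL, "url", "", "base URL containing a {page} placeholder (required)")
	fs.StringVar(&o.Out, "out", "output.json", "output file path")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	if o.URL == "" {
		return Options{}, errors.New("the url flag is required")
	}
	if o.End < o.Start {
		return Options{}, fmt.Errorf("end (%d) is before start (%d)", o.End, o.Start)
	}
	return o, nil
}

func main() {
	// Fixed sample arguments keep the sketch runnable on its own;
	// a real program would pass os.Args[1:] instead.
	args := []string{"--start=1", "--end=5", "--url=http://example.com/pages?page={page}", "--out=myData.json"}
	opts, err := parseFlags(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: pages 1-5 of http://example.com/pages?page={page} -> myData.json
	fmt.Printf("pages %d-%d of %s -> %s\n", opts.Start, opts.End, opts.URL, opts.Out)
}
```

The flag package accepts both the single-dash and double-dash forms, so -start=1 and --start=1 are equivalent.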

Running QScrapper

To run QScrapper with custom parameters, navigate to the project directory and execute the following command, adjusting the flags as needed:

go run cmd/main.go --start=1 --end=5 --url="http://example.com/pages?page={page}" --out="myData.json"

This example command will scrape pages 1 through 5 of http://example.com/pages?page={page}, replacing {page} with the actual page number, and save the results to myData.json.
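
The placeholder substitution described above can be sketched in a few lines of Go. The function name pageURLs and its error handling are hypothetical; the sketch only shows one plausible way to expand a {page} template over an inclusive page range.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const pagePlaceholder = "{page}"

// pageURLs returns one URL per page in [start, end], substituting the page
// number for every {page} placeholder in the template.
func pageURLs(template string, start, end int) ([]string, error) {
	if !strings.Contains(template, pagePlaceholder) {
		return nil, fmt.Errorf("url %q has no %s placeholder", template, pagePlaceholder)
	}
	if end < start {
		return nil, fmt.Errorf("end (%d) is before start (%d)", end, start)
	}
	urls := make([]string, 0, end-start+1)
	for page := start; page <= end; page++ {
		urls = append(urls, strings.ReplaceAll(template, pagePlaceholder, strconv.Itoa(page)))
	}
	return urls, nil
}

func main() {
	urls, err := pageURLs("http://example.com/pages?page={page}", 1, 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints the five URLs, from ...?page=1 through ...?page=5.
	for _, u := range urls {
		fmt.Println(u)
	}
}
```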

Running the Compiled Binary

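The compiled binary accepts the same flags. Since make build places the binary in the ./build directory, adjust the path below to match where your binary lives:

./qscrapper --start=1 --end=5 --url="http://example.com/pages?page={page}" --out="myData.json"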

Customization

Adjust the config.json for different scraping needs. The application supports dynamic changes to the scraping path, request delays, and proxy configurations without code modifications.
