crawling-scale-up

Installation

You will need Redis and Python 3 installed. Then install the required libraries with pip:

pip install requests beautifulsoup4 playwright "celery[redis]"
playwright install

Configure the Redis connection in the repo file (repo.py) and the Celery settings in the tasks file (tasks.py).
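
As a rough idea of what that configuration could look like, here is a minimal sketch. It is not the project's actual code: the helper names (redis_url, make_redis_connection, make_celery_app), the REDIS_HOST and REDIS_PORT environment variables, and the database numbers are all assumptions made for illustration.

```python
import os


def redis_url(db: int = 0) -> str:
    """Build a Redis URL from environment variables, defaulting to a local server."""
    host = os.environ.get("REDIS_HOST", "localhost")  # hypothetical variable name
    port = int(os.environ.get("REDIS_PORT", "6379"))  # 6379 is the Redis default port
    return f"redis://{host}:{port}/{db}"


def make_redis_connection():
    """Roughly what the repo file might create (assumption)."""
    import redis  # imported lazily; installed as a dependency of "celery[redis]"

    # The client connects on first use, so no server is contacted here.
    return redis.Redis.from_url(redis_url(db=0), decode_responses=True)


def make_celery_app():
    """Roughly what the tasks file might create (assumption)."""
    from celery import Celery  # imported lazily so redis_url works without Celery

    # A separate database number keeps the task queue apart from crawl data.
    return Celery("tasks", broker=redis_url(db=1))
```

With no environment variables set, redis_url() points at a Redis server on localhost, which is usually enough for a local run.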

Start the Celery worker, then run the main script, which starts queueing pages to crawl.

celery -A tasks worker

python3 main.py
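
To illustrate what queueing pages means in practice, the sketch below shows one way a main script could hand URLs to a running worker. It is an assumption, not the contents of main.py: queue_pages, the crawl task name, and the seed URL are hypothetical. The only Celery API relied on is the .delay() method that every Celery task exposes.

```python
from typing import Callable, Iterable


def queue_pages(urls: Iterable[str], enqueue: Callable[[str], object]) -> int:
    """Send each distinct URL to the queue once and return how many were queued."""
    seen = set()
    for url in urls:
        if url in seen:
            continue  # skip duplicates so a page is not crawled twice
        seen.add(url)
        enqueue(url)
    return len(seen)


def main() -> None:
    # Hypothetical task name; use whichever task the tasks file defines.
    from tasks import crawl

    # .delay() places the call on the broker for a worker to pick up.
    queue_pages(["https://example.com/"], crawl.delay)
```

Passing the enqueue function as an argument keeps the deduplication logic independent of Celery, so it can be exercised without a broker.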

Pull requests are welcome. For significant changes, please open an issue first to discuss what you would like to change.
