A distributed web crawler written in Python.
See my post for more info.
Requirement: Docker for Desktop.
git clone this repo.
Run docker-compose build on your machine. Once the process is finished, run docker-compose up to start the service.
URLs can be added to the frontier-queue via localhost:5000, and workers can be monitored via localhost:5001. Finally, the corpus of documents can be monitored via localhost:5002.
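After starting the stack, it can be handy to confirm that the three ports mentioned above are actually accepting connections before opening them in a browser. The following is a minimal sketch, not part of this repo: the helper names (is_port_open, check_services) and the service labels are hypothetical, and it only tests TCP reachability. It assumes nothing about what each service serves on its port.

```python
import socket

# Ports are taken from the text above; the labels are illustrative only.
SERVICES = {
    "frontier queue": 5000,
    "worker monitor": 5001,
    "document corpus": 5002,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_services(host="localhost", services=None):
    """Map each service label to whether its port accepts connections."""
    if services is None:
        services = SERVICES
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

If a port is reported as not reachable, the containers may still be starting, or docker-compose up may have exited with an error.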
- Fork it (https://github.com/yourname/yourproject/fork)
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request