Web Scraping Engine

Usage

To run:

stack exec example -- --cache-dir cache -a user-agents.txt -o output.csv

During testing/development, you can run the scraper from within GHCi:

To run the scraper with anonymization:

cd example
bash build-proxies.sh > torrc-file
tor -f torrc-file & (wait until logs report success)
stack exec example -- --cache-dir cache -a user-agents.txt --torrc torrc-file -o outdata.csv -m 8111 +RTS -N15

where
* 8111 is the port of an EKG monitor on localhost
* -N15 sets how many cores the GHC runtime uses (15 here)
After a long time you will need to kill the process manually.
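
The "wait until logs report success" step can be scripted rather than watched by hand. The following is a minimal sketch, not part of this repository: the helper name `wait_for_log`, the log file name `tor.log`, and the 120-second timeout are hypothetical, and it assumes Tor reports readiness with a log line containing "Bootstrapped 100%". Check the output of your Tor version before relying on that string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until FILE contains PATTERN (fixed string),
# or fail once TIMEOUT seconds have passed.
# Usage: wait_for_log FILE PATTERN TIMEOUT
wait_for_log() {
  local file=$1 pattern=$2 timeout=$3 waited=0
  # A missing file is treated the same as "pattern not seen yet".
  until grep -qF -- "$pattern" "$file" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example use, commented out so that sourcing this file has no side effects.
# The readiness string is an assumption about Tor's log output.
#
#   tor -f torrc-file > tor.log 2>&1 &
#   tor_pid=$!
#   wait_for_log tor.log "Bootstrapped 100%" 120 || { kill "$tor_pid"; exit 1; }
#   stack exec example -- --cache-dir cache -a user-agents.txt \
#     --torrc torrc-file -o outdata.csv -m 8111 +RTS -N15
#   kill "$tor_pid"
```

Keeping the Tor process ID in a variable also makes the final cleanup explicit: once the scraper has been stopped, the background `tor` process can be killed by ID instead of being looked up by hand.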

Develop with one of:

Build with one of:
