Collection of crawlers used by the ahmia search engine

MuzahidGithub/ahmia-crawler

https://ahmia.fi/

Ahmia is a search engine for .onion domains on the Tor anonymity network. It is led by Juha Nurmi and is based in Finland. This repository contains the crawlers used by the Ahmia search engine.

Prerequisites

Ahmia-index should be installed and running

Installation guide

Install requirements in a virtual environment

python3 -m virtualenv venv3
source venv3/bin/activate
pip install -r requirements.txt

Prefer your own Python HTTP proxy

See the fleet installation instructions here.

Configuration

ahmia/ahmia/example.env contains some default values that should work out of the box. Copy this to .env to create your own instance of environment settings:

cp ahmia/ahmia/example.env ahmia/ahmia/.env
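
As an illustration only, the copy step can be sketched in Python so that it never overwrites an existing .env, together with a small reader for checking which settings are in effect. The function names `ensure_env` and `read_env` are hypothetical, and the parser assumes the file holds plain KEY=VALUE lines with optional # comments, which this README does not confirm.

```python
import shutil
from pathlib import Path


def ensure_env(config_dir: Path) -> Path:
    """Copy example.env to .env unless .env already exists; return the .env path."""
    example = config_dir / "example.env"
    target = config_dir / ".env"
    if not target.exists():
        shutil.copyfile(example, target)
    return target


def read_env(path: Path) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```

Calling `read_env(ensure_env(Path("ahmia/ahmia")))` from the repository root would return the current settings as a dictionary, which may help confirm the defaults before starting a crawl.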

Usage

To run the crawler continuously:

source venv3/bin/activate
./run.sh &> crawler.log

Specific run examples

scrapy crawl ahmia-tor -s DEPTH_LIMIT=1 -s LOG_LEVEL=DEBUG
or
scrapy crawl ahmia-tor -s DEPTH_LIMIT=1 -O items.json:json
or
scrapy crawl ahmia-tor -s DEPTH_LIMIT=3
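
The second example writes the scraped items to items.json using Scrapy's JSON feed export, which produces a single JSON array of objects. A minimal sketch for inspecting that file after a short test crawl follows. The helper name `summarize_items` is hypothetical, and the field names it reports depend on the crawler's item definition, which is not shown in this README.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_items(path):
    """Return (item count, Counter of field names) for a Scrapy JSON feed export."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    fields = Counter()
    for item in items:
        # Count how many items carry each field name.
        fields.update(item.keys())
    return len(items), fields
```

Running `summarize_items("items.json")` after a crawl with DEPTH_LIMIT=1 gives a quick check that items were produced and which fields they carry.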

Crontabs

# Every day
PATH=/usr/local/bin:/usr/bin:/bin:/home/juha/.local/bin
37 09 * * * cd /home/juha/ahmia-crawler/ && bash run_daily.sh > ./daily.log 2>&1
# First day of each month
PATH=/usr/local/bin:/usr/bin:/bin:/home/juha/.local/bin
30 01 01 * * cd /home/juha/ahmia-crawler/ && bash run.sh > ./monthly.log 2>&1
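
Each entry above starts with the five standard cron schedule fields: minute, hour, day of month, month, and day of week. The first entry therefore runs at 09:37 every day, and the second at 01:30 on the first day of each month. The sketch below only labels those fields for a given line. The helper `split_cron_line` is hypothetical and is not part of this repository.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_line(line):
    """Split a crontab entry into its five schedule fields and the command."""
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]
```

For the monthly entry, the returned schedule maps minute to "30", hour to "01", and day_of_month to "01", with the remaining fields left as "*".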
