maksdma edited this page Apr 10, 2022 · 2 revisions

Stand With Ukraine

TorBot - A Python Tor Crawler


What is TorBot?

TorBot is a Python web crawler for the Deep and Dark Web.

Working Procedure/Basic Plan

The basic procedure executed by the web crawling algorithm takes a list of seed URLs as its input and repeatedly executes the following steps:

  1. Remove a URL from the URL list.
  2. Check that the page exists.
  3. Download the corresponding page.
  4. Check the relevance of the page.
  5. Extract any links contained in it.
  6. Check the cache to see whether the links are already in it.
  7. Add the unique links back to the URL list.
  8. After all URLs are processed, return the most relevant page.
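
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not TorBot's actual code: the names `crawl`, `extract_links`, `fetch`, and `score` are hypothetical, the relevance measure is whatever scoring function the caller supplies, and fetching (including any routing through Tor) is left to the caller's `fetch` function, which is assumed to return the page HTML or `None` when the page does not exist.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for every link found in the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(seeds, fetch, score, max_pages=100):
    """Hypothetical crawl loop; fetch(url) returns HTML or None."""
    queue = deque(seeds)
    seen = set(seeds)  # cache of URLs that were already queued
    best_url, best_score = None, float("-inf")
    processed = 0
    while queue and processed < max_pages:
        url = queue.popleft()                   # step 1
        html = fetch(url)                       # steps 2-3
        if html is None:                        # page does not exist
            continue
        processed += 1
        page_score = score(url, html)           # step 4
        if page_score > best_score:
            best_url, best_score = url, page_score
        for link in extract_links(url, html):   # step 5
            if link not in seen:                # step 6
                seen.add(link)
                queue.append(link)              # step 7
    return best_url                             # step 8
```

A caller would supply its own `fetch`, for example a function that downloads pages through a local Tor proxy, and a `score` function such as a keyword count. The `max_pages` limit is an added safeguard so the loop terminates on large link graphs.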

Features

  1. Crawls Tor links (.onion). (Partially Completed)
  2. Returns page title and address with a short description of the site. (Partially Completed)
  3. Save links to database. (Not Started)
  4. Get emails from site. (Completed)
  5. Save crawl info to file. (Completed)
  6. Crawl custom domains. (Completed)
  7. Check if the link is live. (Completed)
  8. Built-in Updater. (Completed) ...(will be updated)

Contributions

Contributions to this project are always welcome. To add a new feature, fork the dev branch and open a pull request once your feature is tested and complete. If it's a new module, put it inside the modules directory and import it into the main file. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. For example, Feature_FasterCrawl_1.0. Your name will be added to the contributors list. 😃
