# Web_Crawler

A scalable, open-source web crawler that writes each page's data to file as it crawls.

## Installation

Clone this repository:

```shell
$ git clone https://github.com/Boomslet/Web_Crawler
```

## Usage

1. Install the package with `setup.py`:

   ```shell
   $ python setup.py install
   ```

2. Run `controller.py`:

   ```shell
   $ python controller.py
   ```

3. Call `crawl(*urls)` with your desired URL(s):

   ```python
   >>> crawl('https://github.com/')
   ```

4. Crawl!

   ```
   Successfully crawled https://github.com/
   Successfully crawled https://github.com/#start-of-content
   Successfully crawled https://github.com/features
   Successfully crawled https://github.com/business
   Successfully crawled https://github.com/pricing
   Successfully crawled https://github.com/dashboard
   ```
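The loop behind `crawl(*urls)` can be sketched roughly as below. This is a minimal illustration under assumptions, not the actual `controller.py` implementation: the `LinkParser` and `extract_links` helpers and the `crawl_output.txt` filename are hypothetical, and the real crawler may differ in how it queues, deduplicates, and persists pages.

```python
# Minimal breadth-first crawl sketch (hypothetical, stdlib only).
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href target of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for every <a href> in the given HTML."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(*urls):
    """Visit each URL breadth-first, writing page data to file as we go."""
    seen, queue = set(), list(urls)
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip unreachable pages and keep crawling
        # Hypothetical output file; the real crawler's destination may differ.
        with open("crawl_output.txt", "a", encoding="utf-8") as f:
            f.write(html)
        print(f"Successfully crawled {url}")
        queue.extend(extract_links(url, html))
```

The `seen` set is what lets the crawler scale without revisiting pages; each newly discovered link is appended to the queue, which produces output like the log above.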