# Web_Crawler

A scalable, open-source web crawler that writes each page's data to file as it crawls.

## Installation

Clone this repository:

```shell
$ git clone https://github.com/Boomslet/Web_Crawler
```

## Usage

1. Install the package with `setup.py`:

   ```shell
   $ python setup.py install
   ```

2. Run `controller.py`:

   ```shell
   $ python controller.py
   ```

3. Call `crawl(*urls)` with your desired URL(s):

   ```python
   >>> crawl('https://github.com/')
   ```

4. Crawl!

   ```
   Successfully crawled https://github.com/
   Successfully crawled https://github.com/#start-of-content
   Successfully crawled https://github.com/features
   Successfully crawled https://github.com/business
   Successfully crawled https://github.com/pricing
   Successfully crawled https://github.com/dashboard
   ```
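The loop behind `crawl(*urls)` can be sketched roughly as below. This is a minimal illustration under assumptions, not the actual `controller.py` implementation: the `LinkParser` and `extract_links` helpers and the `crawl_output.txt` filename are hypothetical, and the real crawler may differ in how it queues, deduplicates, and persists pages.

```python
# Minimal breadth-first crawl sketch (hypothetical, stdlib only).
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href target of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for every <a href> in the given HTML."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(*urls):
    """Visit each URL breadth-first, writing page data to file as we go."""
    seen, queue = set(), list(urls)
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip unreachable pages and keep crawling
        # Hypothetical output file; the real crawler's destination may differ.
        with open("crawl_output.txt", "a", encoding="utf-8") as f:
            f.write(html)
        print(f"Successfully crawled {url}")
        queue.extend(extract_links(url, html))
```

The `seen` set is what lets the crawler scale without revisiting pages; each newly discovered link is appended to the queue, which produces output like the log above.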