Spidey-v2

A multithreaded web crawler written in Go.

This is an improved version of Spidey.

Results

Spidey took 1 minute to crawl 16,572 links.

Usage

  1. Clone this repository
git clone https://github.com/anirudhsudhir/Spidey-v2.git
cd Spidey-v2
  2. Create a "seeds.txt" file and add the seed links in quotes, one per line

    Sample seeds.txt

"http://example.com"
"https://abcd.com"
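A file in this format can be read by splitting on the quote characters. The following is a hypothetical sketch of such a parser, not Spidey's actual implementation; the function name parseSeeds is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSeeds extracts the quoted links from the contents of a seeds.txt
// file. This is an illustrative sketch based on the documented format
// (links in quotes, listed consecutively), not Spidey's own parser.
func parseSeeds(contents string) []string {
	var seeds []string
	fields := strings.Split(contents, "\"")
	for i, f := range fields {
		// After splitting on '"', quoted values land at odd indices.
		if i%2 == 1 && strings.TrimSpace(f) != "" {
			seeds = append(seeds, f)
		}
	}
	return seeds
}

func main() {
	sample := "\"http://example.com\"\n\"https://abcd.com\"\n"
	fmt.Println(parseSeeds(sample))
}
```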
  3. Build the project and run Spidey.

    Pass the crawl time, request delay and worker count as arguments.

    • Crawl Time: the duration, in seconds, during which Spidey adds new links to the crawl queue (positive integer)
    • Request Delay: the minimum delay, in seconds, between requests to links of the same domain (positive integer)
    • Worker Count: the number of crawl workers to run concurrently (positive integer)
go build
./spidey 10 1 5
# Here, the crawl time is 10s, the request delay is 1s and the worker count is 5
