# How To Crawl Locally
Crawling on your local machine using Wgit is possible, but with a caveat. Consider the following:
```ruby
require 'wgit'

url = Wgit::Url.new 'http://localhost:3000'
url.valid? # => false

url = Wgit::Url.new 'http://127.0.0.1:3000'
url.valid? # => true
```
Only URLs that are `#valid?` are crawl-able. As you can see from above, `localhost` isn't regarded as a valid URL by Wgit, whereas `127.0.0.1` is, because it's an IP address. Obviously, the two URLs are equivalent in that they both reference your local machine. So don't use `localhost`; use `127.0.0.1` for crawling content locally, as shown in the sketch below.
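For example, here's a minimal sketch of crawling a locally running app with `Wgit::Crawler`, assuming a server is listening on port 3000 (the port and the block's logic are illustrative):

```ruby
require 'wgit'

url     = Wgit::Url.new 'http://127.0.0.1:3000'
crawler = Wgit::Crawler.new

# Each crawled page is yielded (and the last is returned) as a Wgit::Document.
doc = crawler.crawl(url) do |d|
  puts d.title
end
```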
So why isn't `localhost` valid? Because technically... it isn't. You can't `curl http://example` successfully, but you can `curl http://example.com`, because the latter is a valid host. Wgit applies the same principle to `localhost`; it doesn't get special treatment.
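You can see the same principle at work with `#valid?` directly (the `example` host here is just an illustration, assuming the validation behaviour described above):

```ruby
require 'wgit'

# A bare hostname without a domain suffix isn't a valid host.
Wgit::Url.new('http://example').valid?     # => false
Wgit::Url.new('http://example.com').valid? # => true
```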