
How To Crawl Locally


Crawling on your local machine using Wgit is possible, but with a caveat. Consider the following:

require 'wgit'

url = Wgit::Url.new 'http://localhost:3000'
url.valid? # => false

url = Wgit::Url.new 'http://127.0.0.1:3000'
url.valid? # => true

Only URLs that are #valid? are crawlable. As you can see above, localhost isn't regarded as a valid URL by Wgit, whereas 127.0.0.1 is, because it's an IP address. The two URLs are, of course, equivalent in that they both reference your local machine.

So don't use localhost; use 127.0.0.1 for crawling content locally.
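
For example, a minimal local crawl might look like the sketch below. It assumes you have an app already serving on port 3000; swap in whatever port your local server actually uses:

require 'wgit'

crawler = Wgit::Crawler.new
url = Wgit::Url.new 'http://127.0.0.1:3000' # Not 'http://localhost:3000'.

doc = crawler.crawl url # Returns a Wgit::Document of the crawled page.
doc.title # => The <title> of your local app's page.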

So why isn't localhost valid? Because, technically, it isn't. You can't curl http://example successfully, but you can curl http://example.com, because the latter is a valid host. Wgit applies the same principle to localhost; it doesn't get special treatment.
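
You can see the same rule directly with #valid? (example.com is used purely for illustration; the outputs follow from the behaviour described above):

require 'wgit'

Wgit::Url.new('http://example').valid?        # => false - a bare name, no domain suffix.
Wgit::Url.new('http://example.com').valid?    # => true  - a full, valid host.
Wgit::Url.new('http://localhost:3000').valid? # => false - treated just like http://example.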