A basic web crawler / scraper that can be used to map a site.
- Clone this repo, then run the following from its root directory:
    bundle install
    rake db:create db:migrate
    rails s
- Visit localhost:3000/pages/new to get started
- Type in a URL, for example http://www.makersacademy.com
- Be sure to include http:// and remove any trailing /
- Set the number of pages to crawl
- Search!
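The URL format asked for above (http:// included, no trailing /) can be checked before searching. The following is only a minimal sketch: `prepare_start_url` is a hypothetical helper, not something taken from this repo, and defaulting to http:// when no scheme is typed is an assumption made for the example.

```ruby
require "uri"

# Hypothetical helper (not part of this repo): returns the URL in the form
# the steps above ask for, i.e. with a scheme and without a trailing slash.
def prepare_start_url(input)
  url = input.strip
  # Assumption: fall back to http:// when no scheme was typed.
  url = "http://#{url}" unless url.match?(%r{\Ahttps?://}i)
  # Remove any trailing slashes.
  url = url.sub(%r{/+\z}, "")
  host = URI.parse(url).host
  raise ArgumentError, "not a valid web address: #{input}" if host.nil? || host.empty?
  url
end
```

For example, `prepare_start_url("www.makersacademy.com/")` would give back `http://www.makersacademy.com`, which matches the example URL in the steps above.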
In order to keep our crawling and scraping under control, we decided to set some restrictions.
- Only a single domain can be searched at a time
- Only x pages within that domain are crawled, where x is the number of pages you set (this is the links limit)
- Only links that are part of the domain are scraped, i.e. any external links will not be scraped.
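One way these three restrictions could fit together is sketched below. This is an illustration under stated assumptions, not the code used in this project: `crawl`, `same_domain_url` and the `fetch_links` callable are hypothetical names, and the breadth-first order is a choice made for the example. Passing the page fetcher in as a callable keeps the sketch runnable without network access.

```ruby
require "set"
require "uri"

# Hypothetical helper: resolves href against the page it was found on and
# returns it as a string only if it is an http(s) link on the given domain.
def same_domain_url(base, href, domain)
  uri = URI.join(base, href)
  return nil unless uri.is_a?(URI::HTTP) && uri.host == domain
  uri.fragment = nil
  uri.to_s.chomp("/") # keep URLs in the "no trailing slash" form
rescue URI::Error
  nil # unparsable links are skipped
end

# Hypothetical sketch of a crawl limited to one domain and links_limit pages.
# fetch_links is any callable that takes a URL and returns the hrefs on it.
def crawl(start_url, links_limit, fetch_links)
  domain  = URI.parse(start_url).host
  visited = Set.new
  queue   = [start_url]

  # Stop once the links limit is reached or there is nothing left to visit.
  while visited.size < links_limit && !queue.empty?
    url = queue.shift
    next if visited.include?(url)

    visited << url
    fetch_links.call(url).each do |href|
      link = same_domain_url(url, href, domain)
      next if link.nil? # external or invalid link: not scraped
      queue << link unless visited.include?(link) || queue.include?(link)
    end
  end

  visited.to_a
end
```

In a real run, `fetch_links` would download the page and extract its links; for experimenting, a lambda backed by a hash of URL to hrefs is enough.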
This project has been deployed to Heroku, and can be accessed here.