This tool lets you crawl a website and get a DOM object for every URL it finds. We use it to crawl our own site pages, whether their content is rendered on the server, on the client, or both, by using the Prerender.io client. The resulting data can be used to build a full site search or to improve SEO for single-page applications.
- Ships with Prerender & Prerender.io clients, uses Goutte by default
- Supports any Symfony BrowserKit client
- Supports both whitelisting and blacklisting of URLs
- Supports URL normalization, which lets you prevent duplicates caused by minor URL differences
- Implements the PSR-3 Logger Interface
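To make URL normalization and whitelisting/blacklisting more concrete, here is a minimal standalone sketch in plain PHP. It does not use this library's API: `normalizeUrl()` and `shouldCrawl()` are hypothetical helpers written for illustration, and the normalization rules shown (lowercased scheme and host, dropped fragment, stripped trailing slash, sorted query parameters) are assumptions about what "minor URL differences" might cover, not a description of what the crawler actually does.

```php
<?php
// Hypothetical helpers for illustration only; NOT part of the crawler's API.

/**
 * Normalize an absolute URL so trivially different spellings compare equal:
 * lowercase the scheme and host, drop the fragment, strip trailing slashes
 * from the path, and sort the query parameters.
 */
function normalizeUrl($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave unparseable or relative URLs untouched
    }

    $normalized = strtolower($parts['scheme']) . '://' . strtolower($parts['host']);
    if (isset($parts['port'])) {
        $normalized .= ':' . $parts['port'];
    }

    // Paths are case-sensitive, so only the trailing slash is removed.
    if (isset($parts['path'])) {
        $normalized .= rtrim($parts['path'], '/');
    }

    if (isset($parts['query']) && $parts['query'] !== '') {
        $pairs = explode('&', $parts['query']);
        sort($pairs, SORT_STRING);
        $normalized .= '?' . implode('&', $pairs);
    }

    return $normalized; // the fragment is intentionally dropped
}

/**
 * Decide whether a URL should be crawled. Blacklist patterns win over
 * whitelist patterns; an empty whitelist allows everything not blacklisted.
 */
function shouldCrawl($url, array $whitelist, array $blacklist)
{
    foreach ($blacklist as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }
    if (empty($whitelist)) {
        return true;
    }
    foreach ($whitelist as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return true;
        }
    }
    return false;
}

// Example input: two spellings of the same page, an admin page, an external page.
$found = [
    'http://Example.com/about/?b=2&a=1#team',
    'http://example.com/about?a=1&b=2',
    'http://example.com/admin/login',
    'http://other.org/',
];
$whitelist = ['~^http://example\.com/~'];
$blacklist = ['~^http://example\.com/admin~'];

$queue = [];
foreach ($found as $url) {
    $key = normalizeUrl($url);
    if (shouldCrawl($key, $whitelist, $blacklist)) {
        $queue[$key] = true; // array keys deduplicate normalized URLs
    }
}
$urls = array_keys($queue);
```

In this sketch the first two URLs collapse into a single entry, the admin page is rejected by the blacklist, and the external page fails the whitelist, so `$urls` ends up holding one URL. Giving the blacklist precedence is a design choice of the sketch; see the /doc folder for how the library itself configures these features.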
Documentation and examples can be found in the /doc folder.
To use the library, you need:
- PHP >= 5.5.0
Install this package with Composer:
$ composer require mediamonks/crawler
If you discover any security-related issues, please email devmonk@mediamonks.com instead of using the issue tracker.
The MIT License (MIT). Please see License File for more information.