
This repository wraps Guzzle and several Symfony components to provide an easy way to spider websites.


  • PHP >= 5.5
  • Guzzle >= 6.0
  • Doctrine ORM >= 2.2
  • Symfony Components >= 2.7


Add fievel/webspider as a required dependency in your composer.json file:

composer require fievel/webspider


Extend the WebSpiderAbstract class and implement these methods as needed:

getDataFromResponse: extracts the data from the response; the default behaviour treats the body as plain text;

protected function getDataFromResponse(ResponseInterface $response)
{
    return (string) $response->getBody();
}

parseData: extracts the information you need from the data; the Symfony DomCrawler can be initialized here if needed;

protected function parseData($data)
{
    // Assumes $this->crawler has already been initialized with $data
    $node = $this->crawler->filter('input');

    $value = null;
    if ($node->count() > 0) {
        $value = $node->first()->attr('value');
    }

    return $value;
}

handleException: handles the exceptions thrown by Guzzle.

protected function handleException(\Exception $e)
{
    return null;
}
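
To see what a parseData implementation boils down to without any framework wiring, the sketch below performs the same extraction as the example above (the value of the first input element) using PHP's built-in DOM extension rather than DomCrawler. The function name firstInputValue is hypothetical and is not part of fievel/webspider; it only illustrates the kind of logic parseData is expected to contain.

```php
<?php

/**
 * Hypothetical helper (not part of fievel/webspider): returns the value
 * attribute of the first <input> element found in an HTML string, or null
 * when there is no such element or it has no value attribute.
 */
function firstInputValue($html)
{
    if (trim($html) === '') {
        return null;
    }

    // Real-world HTML is rarely valid, so parser warnings are collected
    // internally instead of being emitted.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $nodes = $xpath->query('//input');
    if ($nodes->length === 0) {
        return null;
    }

    $input = $nodes->item(0);

    return $input->hasAttribute('value') ? $input->getAttribute('value') : null;
}
```

In a real spider the same steps would live inside parseData, with DomCrawler's filter('input') replacing the XPath query.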

All that remains is to launch the spider you created, which you can do through the SpiderManager service.

$manager = $this->container->get('fievel_web_spider.manager.spider');

$response = null;
try {
    $response = $manager->runSpider([
        AppBundle\Spiders\CustomSpider::class,  // Spider class created
        'http://localhost/test-spider',         // URL to spider
        'post',                                 // HTTP method supported by Guzzle
        ['cookies' => true],                    // Custom config supported by Guzzle Client
        [                                       // Custom options supported by Guzzle Client
            RequestOptions::FORM_PARAMS => [
                'full_name' => 'John Doe'
            ]
        ]
    ]);
} catch (\Exception $e) {
    // Handle the failure as appropriate for your application
}


A storage object can be shared between subsequent spider calls.

$storage = new SpiderStorage();

$response = $manager->runSpider([
    AppBundle\Spiders\CustomSpider::class,  // Spider class created
    'http://localhost/test-spider',         // URL to spider
    'post',                                 // HTTP method supported by Guzzle
    ['cookies' => true],                    // Custom config supported by Guzzle Client
    [                                       // Custom options supported by Guzzle Client
        RequestOptions::FORM_PARAMS => [
            'full_name' => 'John Doe'
        ]
    ],
    $storage                                // Shared storage
]);

It's even possible to create queues and leave the entire execution to the manager.

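$queue = new SpiderCallQueue();

// Add one spider call per request to the queue here. Each call carries its
// own Guzzle config and, optionally, its own request options, for example:
//     ['cookies' => true],
//     [RequestOptions::FORM_PARAMS => ['full_name' => 'John Doe']]
// for the first call, and only ['cookies' => true] for the second.

$response = $manager->runSpiderQueue($queue);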

Finally, the SpiderManager handles retries on failure through a custom Guzzle middleware.
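
The middleware shipped with this package is not shown here, so the following is only a sketch of how retry-on-failure is commonly wired into Guzzle 6 using its stock Middleware::retry helper. The function names retryDecider, retryDelay and buildRetryingClient, the limit of three retries and the 200 ms base delay are all assumptions made for illustration, not the package's actual behaviour. The sketch also assumes Composer's autoloader has already been loaded.

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\Exception\ConnectException;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;

/**
 * Hypothetical decider: retry on connection failures and 5xx responses,
 * up to $maxRetries attempts. Guzzle calls it with the retry count, the
 * request, the response (if any) and the failure reason (if any).
 */
function retryDecider($maxRetries = 3)
{
    return function ($retries, $request, $response = null, $exception = null) use ($maxRetries) {
        if ($retries >= $maxRetries) {
            return false;
        }
        if ($exception instanceof ConnectException) {
            return true;
        }

        return $response !== null && $response->getStatusCode() >= 500;
    };
}

/**
 * Hypothetical delay: exponential backoff in milliseconds
 * (200, 400, 800, ...), where $retries starts at 1 for the first retry.
 */
function retryDelay($baseMs = 200)
{
    return function ($retries) use ($baseMs) {
        return (int) ($baseMs * pow(2, $retries - 1));
    };
}

/**
 * Builds a Guzzle client whose handler stack includes the retry middleware.
 */
function buildRetryingClient(array $config = [])
{
    $stack = HandlerStack::create();
    $stack->push(Middleware::retry(retryDecider(), retryDelay()), 'retry');
    $config['handler'] = $stack;

    return new Client($config);
}
```

Calling buildRetryingClient(['cookies' => true]) would return a client that transparently repeats failed requests. Keeping the decider and the delay as separate closures makes each policy easy to test and replace on its own.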

