A smart, browser-like scraper built to extract search results from Google and Bing. Based on Athlon1600/SerpScraper

dminustin/serpscrapper-ext

SerpScraper

Based on https://github.com/Athlon1600/SerpScraper

Changes by D. Minustin

Installation

The recommended way to install this is via Composer:

composer require dminustin/serpscraper-ext

--- Bing search results (May 2019)

Added Bing search snippets: each result now contains the title, snippet and URL instead of just the URL.

Regular-expression parsing was replaced with DOM-based parsing.
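The switch from regular expressions to DOM parsing can be illustrated with PHP's built-in DOM extension (DOMDocument and DOMXPath). This is a minimal sketch, not the library's actual parser. The function name parseBingResults and the markup it expects (an li element with class "b_algo", containing an h2 > a title link and a div.b_caption > p snippet) are assumptions based on a simplified, hypothetical sample page. Bing's real HTML may differ and changes over time.

```php
<?php
// Hypothetical sketch (not the library's code): extract title, URL and snippet
// from Bing-style result markup using DOM/XPath instead of regular expressions.
// ASSUMPTION: results look like li.b_algo > h2 > a, with div.b_caption > p snippets.

function parseBingResults(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];

    // Match <li> elements whose class list contains the token "b_algo".
    $items = $xpath->query('//li[contains(concat(" ", normalize-space(@class), " "), " b_algo ")]');
    foreach ($items as $item) {
        $link = $xpath->query('.//h2/a', $item)->item(0);
        if ($link === null) {
            continue; // skip result blocks without a title link
        }
        $snippetNode = $xpath->query('.//div[contains(@class, "b_caption")]//p', $item)->item(0);

        $results[] = [
            'title'   => trim($link->textContent),
            'url'     => $link->getAttribute('href'),
            'snippet' => $snippetNode !== null ? trim($snippetNode->textContent) : '',
        ];
    }

    return $results;
}

// Usage with a small hypothetical sample page (the b_ad item is ignored).
$sample = <<<HTML
<html><body><ol id="b_results">
  <li class="b_algo">
    <h2><a href="https://example.com/a">First result</a></h2>
    <div class="b_caption"><p>Snippet for the first result.</p></div>
  </li>
  <li class="b_algo">
    <h2><a href="https://example.org/b">Second result</a></h2>
  </li>
  <li class="b_ad"><h2><a href="https://ads.example">Ad</a></h2></li>
</ol></body></html>
HTML;

$parsed = parseBingResults($sample);
print_r($parsed);
```

Unlike a regular expression, XPath queries keep working when attribute order, whitespace or extra classes change. They still break if Bing renames its classes.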

--- Bing - added adult filter

The filter level can be OFF, MEDIUM or STRICT.
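When the filter level comes from configuration or user input, you can validate it before passing it to setAdult(), which the image search example uses. The sketch below is a hypothetical helper and is not part of the library. The only facts taken from this README are the three allowed values. The assumption that setAdult() expects them in upper case is based on the 'OFF' used in that example.

```php
<?php
// Hypothetical helper (not part of the library): normalize and validate an
// adult-filter level before handing it to setAdult().
// Allowed values are the ones listed in this README.

const ADULT_FILTER_LEVELS = ['OFF', 'MEDIUM', 'STRICT'];

function normalizeAdultFilter(string $level): string
{
    // Accept " medium " or "Strict" by trimming and upper-casing first.
    $normalized = strtoupper(trim($level));
    if (!in_array($normalized, ADULT_FILTER_LEVELS, true)) {
        throw new InvalidArgumentException(sprintf(
            'Unknown adult filter "%s"; expected one of: %s',
            $level,
            implode(', ', ADULT_FILTER_LEVELS)
        ));
    }
    return $normalized;
}

// Usage, e.g. with a value read from user input:
$level = normalizeAdultFilter(' medium ');
echo $level, PHP_EOL;
// $bing->setAdult($level);
```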

--- Bing Images search

require 'vendor/autoload.php'; // Composer autoloader

$bing = new \SerpScraper\Engine\BingImageSearch();
$bing->setAdult('OFF'); // adult filter: OFF, MEDIUM or STRICT
$results = array();

for($page = 1; $page < 10; $page++){

    $response = $bing->search("cats", $page);
    if($response->error == false){
        $results = array_merge($results, $response->results);
    }

    if($response->has_next_page == false){
        break;
    }
}

var_dump($results);

--- Note

I have tested only the Bing results, and they work correctly. I have not tested Google yet.

Original changes

--- reCAPTCHA V2 -- Feb 10, 2018 -- Fixed on March 3, 2018

Google Search no longer uses its image-based captcha.
It has moved to reCAPTCHA v2, which makes it very difficult for robots and scripts to bypass.
We're looking for a solution. Stay tuned.

The purpose of this library is to provide an easy, undetectable and captcha-resistant way to extract data from major search engines such as Google and Bing.

Extracting Search Results From Google

require 'vendor/autoload.php'; // Composer autoloader

use SerpScraper\Engine\GoogleSearch;

$page = 1;

$google = new GoogleSearch();

// all available preferences for Google
$google->setPreference('results_per_page', 100);
//$google->setPreference('google_domain', 'google.lt');
//$google->setPreference('date_range', 'hour');

$results = array();

do {

	$response = $google->search("how to scrape google", $page);

	// the error field must be empty, otherwise the query failed
	if($response->error == false){

		$results = array_merge($results, $response->results);
		$page++;

	} else if($response->error == 'captcha'){

		// assumes you have a subscription to this captcha-solving service: http://www.deathbycaptcha.com
		$status = $google->solveCaptcha("dbc_username", "dbc_password");

		if($status){
			$page++;
		}

		continue;

	}

} while ($response->has_next_page);

Extracting Search Results From Bing

require 'vendor/autoload.php'; // Composer autoloader

use SerpScraper\Engine\BingSearch;

$bing = new BingSearch();
$results = array();

for($page = 1; $page < 10; $page++){
	
	$response = $bing->search("search bing using php", $page);
	if($response->error == false){
		$results = array_merge($results, $response->results);
	}
	
	if($response->has_next_page == false){
		break;
	}
}

var_dump($results);
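Both Bing examples above repeat the same pagination loop. The sketch below factors that loop into a hypothetical helper, collectResults(), which is not part of the library. It works with any engine whose search($query, $page) returns an object with the error, results and has_next_page fields used in these examples. It does not handle Google's captcha case. A stub engine stands in for a real one so the sketch runs without network access.

```php
<?php
// Hypothetical helper (not part of the library): page through results until the
// engine reports no next page or $maxPages is reached. Pages with an error are
// skipped, as in the README's Bing examples; captchas are not handled.

function collectResults(object $engine, string $query, int $maxPages = 9): array
{
    $results = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $response = $engine->search($query, $page);
        if ($response->error == false) {
            $results = array_merge($results, $response->results);
        }
        if ($response->has_next_page == false) {
            break;
        }
    }
    return $results;
}

// Stub engine: two results per page, three pages in total.
$stub = new class {
    public function search(string $query, int $page): object
    {
        return (object) [
            'error'         => '',
            'results'       => ["$query #" . (2 * $page - 1), "$query #" . (2 * $page)],
            'has_next_page' => $page < 3,
        ];
    }
};

$all = collectResults($stub, 'cats');
print_r($all);
// With the library installed, e.g.: collectResults(new \SerpScraper\Engine\BingSearch(), 'cats');
```

The default of 9 pages matches the `$page < 10` bound used in the Bing loops.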
