Google parser is a lightweight yet powerful Google Search result scraper/parser built on an HTTP client, designed to send browser-like requests out of the box. This is essential in web scraping, as it helps requests blend in with regular website traffic.
- Does this work with serverless functions? Yes, it works with serverless platforms such as AWS Lambda. It has not been tested on other platforms, but it should work on them as well.
- Are more features coming? Yes, more features such as pagination are in the works.
- I'm stuck, what should I do? You can create an issue on GitHub; pull requests are also welcome.
- Proxy support ✅︎
- Custom Headers support ✅︎
Install with pnpm:
pnpm add @nrjdalal/google-parser
Or with yarn or npm:
yarn add @nrjdalal/google-parser
npm install @nrjdalal/google-parser
Usage:
import { browserInfo } from '@nrjdalal/google-parser'
const response = await browserInfo()
Response:
{
method: 'GET',
// IP address of the client
clientIp: '182.69.180.111',
// country code of the client
countryCode: 'US',
bodyLength: 0,
headers: {
'x-forwarded-for': '182.69.180.111',
'x-forwarded-proto': 'https',
'x-forwarded-port': '443',
host: 'api.apify.com',
// random user agent client hint
'sec-ch-ua': '"Google Chrome";v="113", "Chromium";v="113", "Not-A.Brand";v="24"',
// devices: ['Desktop']
'sec-ch-ua-mobile': '?0',
// operatingSystems: ['windows', 'linux', 'macos']
'sec-ch-ua-platform': '"macOS"',
'upgrade-insecure-requests': '1',
// random user agent
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
accept: '*/*',
'sec-fetch-site': 'same-site',
'sec-fetch-mode': 'cors',
'sec-fetch-user': '?1',
'sec-fetch-dest': 'empty',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.5',
'alt-used': 'www.google.com',
referer: 'https://www.google.com/'
}
}
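The generated headers can also be inspected programmatically. As a sketch, the helper below (hypothetical, not a library export) reads the platform client hint out of a headers object shaped like the one above:

```typescript
// Hypothetical helper (not part of @nrjdalal/google-parser):
// extract the platform from the generated client-hint headers.
type HeaderMap = Record<string, string>

function platformFromHeaders(headers: HeaderMap): string | undefined {
  const raw = headers['sec-ch-ua-platform']
  // The client hint value is quoted, e.g. '"macOS"' — strip the quotes.
  return raw?.replace(/^"|"$/g, '')
}

// Sample values as seen in the response above.
const sampleHeaders: HeaderMap = {
  'sec-ch-ua-platform': '"macOS"',
  'sec-ch-ua-mobile': '?0',
}

console.log(platformFromHeaders(sampleHeaders)) // macOS
```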
Usage:
import { googleSearch } from '@nrjdalal/google-parser'
const response = await googleSearch({ query: 'nrjdalal' })
Output:
{
code: 200,
status: 'success',
message: 'Found 5 results in 1s',
query: 'nrjdalal',
data: {
results: [
{
title: 'Neeraj Dalal nrjdalal',
link: 'https://github.com/nrjdalal',
description: 'Web Developer & Digital Strategist. Follow their code on GitHub.',
...
}
]
},
}
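In practice you will usually map over `data.results`. The snippet below is illustrative (the `mapLinks` helper is not a library export); the response literal mirrors the sample output above:

```typescript
// Illustrative helper (not a library export): collect the links
// out of a googleSearch-style response.
type SearchResult = { title: string; link: string; description: string }
type SearchResponse = {
  code: number
  status: string
  data?: { results: SearchResult[] }
}

function mapLinks(response: SearchResponse): string[] {
  return response.data?.results.map((r) => r.link) ?? []
}

// Shape mirrors the sample output above.
const sampleResponse: SearchResponse = {
  code: 200,
  status: 'success',
  data: {
    results: [
      {
        title: 'Neeraj Dalal nrjdalal',
        link: 'https://github.com/nrjdalal',
        description: 'Web Developer & Digital Strategist.',
      },
    ],
  },
}

console.log(mapLinks(sampleResponse)) // [ 'https://github.com/nrjdalal' ]
```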
Error:
- This error is thrown when the request is blocked by Google, which can happen for various reasons, such as too many requests or a captcha challenge triggered for the same IP address.
{
code: 429,
status: 'error',
message: 'Captcha or too many requests.',
query: 'nrjdalal'
}
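One way to handle this error is to retry with a backoff. The wrapper below is a sketch, not library behavior: it assumes only that the wrapped call returns a response with a numeric `code` field, and the mock stands in for a real `googleSearch` call.

```typescript
// Sketch only: retry a search-like call when it reports HTTP 429.
// `search` stands in for googleSearch; swap in the real call.
type Resp = { code: number; status: string }

async function withRetry(
  search: () => Promise<Resp>,
  attempts = 3,
  delayMs = 1000,
): Promise<Resp> {
  let last: Resp = { code: 429, status: 'error' }
  for (let i = 0; i < attempts; i++) {
    last = await search()
    if (last.code !== 429) return last
    // Back off before retrying (captcha / rate limit).
    await new Promise((r) => setTimeout(r, delayMs * (i + 1)))
  }
  return last
}

// Demo with a mock that is rate-limited once, then succeeds.
let calls = 0
const mockSearch = async (): Promise<Resp> =>
  ++calls === 1
    ? { code: 429, status: 'error' }
    : { code: 200, status: 'success' }

withRetry(mockSearch, 3, 10).then((r) => console.log(r.code)) // 200
```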
Why? Changing headers on every request can make traffic easier to detect, so it is recommended to reuse the same headers for all requests made from a single IP.
Usage:
import { getHeaders, googleSearch } from '@nrjdalal/google-parser'
const headers = getHeaders()
// same headers for same IP
console.log(await googleSearch({ query: 'facebook', options: { headers } }))
console.log(await googleSearch({ query: 'apple', options: { headers } }))
// regeneration of headers for new IP if needed
console.log(
await googleSearch({ query: 'netflix', options: { headers: getHeaders() } })
)
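If requests go out through several IPs (for example, several proxies), one sketch of the advice above is to cache one header set per IP so each IP always presents the same fingerprint. `headersFor` is hypothetical, and the counting generator merely stands in for the library's `getHeaders()`:

```typescript
// Sketch: keep one stable header set per IP, per the advice above.
// `generate` stands in for getHeaders() from the library.
type HeaderMap = Record<string, string>

const headerCache = new Map<string, HeaderMap>()

function headersFor(ip: string, generate: () => HeaderMap): HeaderMap {
  let h = headerCache.get(ip)
  if (!h) {
    h = generate()
    headerCache.set(ip, h)
  }
  return h
}

// Demo with a counting stand-in generator.
let generated = 0
const fakeGetHeaders = (): HeaderMap => ({ 'x-gen': String(++generated) })

const a = headersFor('1.2.3.4', fakeGetHeaders)
const b = headersFor('1.2.3.4', fakeGetHeaders) // cached, not regenerated
const c = headersFor('5.6.7.8', fakeGetHeaders) // new IP, fresh headers

console.log(a === b, a === c) // true false
```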
Usage:
import { googleSearch } from '@nrjdalal/google-parser'
console.log(
await googleSearch({
query: 'microsoft',
options: {
proxyUrl: 'http://username:password@host:port',
},
})
)
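The `proxyUrl` string above follows the common `scheme://username:password@host:port` shape. As a sketch, it can be assembled with the standard WHATWG `URL` API so that credentials are percent-encoded; this helper is illustrative, not part of the library:

```typescript
// Illustrative: build a proxyUrl of the shape shown above,
// letting URL percent-encode the credentials.
function buildProxyUrl(
  host: string,
  port: number,
  username: string,
  password: string,
): string {
  const u = new URL(`http://${host}:${port}`)
  u.username = username
  u.password = password
  // Drop the trailing slash URL adds, to match the host:port shape above.
  return u.toString().replace(/\/$/, '')
}

console.log(buildProxyUrl('proxy.example.com', 8080, 'user', 'p@ss'))
// http://user:p%40ss@proxy.example.com:8080
```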