@promptapi/scraper-pkg is a simple JavaScript wrapper for scraper-api.
- You need to sign up for Prompt API
- You need to subscribe to scraper-api; the test drive is free!
- You need to set the PROMPTAPI_TOKEN environment variable after subscribing.
Then:
$ npm install @promptapi/scraper-pkg
Or install from the GitHub registry:
$ npm install @promptapi/scraper-pkg@0.1.6
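Because the wrapper relies on the PROMPTAPI_TOKEN environment variable, a script can check for it before making any request. The following is a minimal sketch; `requireToken` is a hypothetical helper and not part of the package.

```javascript
// Hypothetical helper (not part of @promptapi/scraper-pkg):
// fail early when PROMPTAPI_TOKEN is missing or empty.
function requireToken(env = process.env) {
  const token = env.PROMPTAPI_TOKEN
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('PROMPTAPI_TOKEN environment variable is not set')
  }
  return token
}
```

Calling `requireToken()` at the top of a script gives a clear error message instead of a failed request later on.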
Basic scrape feature:
const promptapi = require('@promptapi/scraper-pkg')

const params = {}
promptapi.scraper('https://pypi.org/classifiers/', params).then(result => {
  if (result.error) {
    console.log(result.error)
  } else {
    console.log(result.data) // your scraped data...
    console.log(result.headers)
    console.log(result.url)
    promptapi.save('/tmp/data.html', result.data) // save result
  }
})
Output:
// result.data
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="defaultLanguage" content="en">
<meta name="availableLanguages" content="en, es, fr, ja, pt_BR, uk, el, de, zh_Hans, ru, he">
:
:
:
// result.headers
{ 'Content-Length': '322126', ...
// result.url
https://pypi.org/classifiers/
/tmp/data.html saved successfully, written 322126 bytes
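The promise-based call above can also be written with async/await. The sketch below wraps any function shaped like `promptapi.scraper` (an assumption based on the example: it returns a promise resolving to an object with either `error` or `data`, `headers`, and `url`) and turns `result.error` into a thrown exception. `scrapeOrThrow` is a hypothetical name, not part of the package.

```javascript
// Hypothetical wrapper: scraperFn is assumed to behave like promptapi.scraper,
// resolving to { data, headers, url } on success or { error } on failure.
async function scrapeOrThrow(scraperFn, url, params = {}) {
  const result = await scraperFn(url, params)
  if (result.error) {
    throw new Error(String(result.error))
  }
  return result
}

// Usage inside an async function, assuming the package is installed:
//   const promptapi = require('@promptapi/scraper-pkg')
//   const scrape = (url, params) => promptapi.scraper(url, params)
//   const result = await scrapeOrThrow(scrape, 'https://pypi.org/classifiers/')
```

Taking the scraper function as an argument keeps the wrapper easy to exercise with a stub, without touching the network.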
You can add URL parameters for extra operations. Valid parameters are:
- auth_password: HTTP Realm auth password
- auth_username: HTTP Realm auth username
- cookie: URL-encoded cookie header
- country: 2-character country code, if you wish to scrape from an IP address of a specific country
- referer: HTTP referer header
- selector: CSS-style selector path such as a.btn div li. If selector is set, the returned result will be a collection of data and the saved file will be in .json format.
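Since only the parameter names listed above are valid, a typo such as `contry` is easy to miss. Here is a minimal sketch of a pre-flight check; `VALID_PARAMS` and `checkParams` are hypothetical names, and the package itself may or may not perform such validation.

```javascript
// Parameter names copied from the list above.
const VALID_PARAMS = [
  'auth_password',
  'auth_username',
  'cookie',
  'country',
  'referer',
  'selector',
]

// Hypothetical helper: report any keys that are not in the documented list.
function checkParams(params) {
  const unknown = Object.keys(params).filter(key => !VALID_PARAMS.includes(key))
  return { ok: unknown.length === 0, unknown }
}
```

For example, `checkParams({ contry: 'EE' })` reports `contry` as unknown, while `checkParams({ country: 'EE' })` passes.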
const promptapi = require('@promptapi/scraper-pkg')

const params = {country: 'EE', selector: 'ul li button[data-clipboard-text]'}
promptapi.scraper('https://pypi.org/classifiers/', params).then(result => {
  if (result.error) {
    console.log(result.error)
  } else {
    console.log(result.data) // your scraped data...
    console.log(result.headers)
    console.log(result.url)
    promptapi.save('/tmp/data.json', result.data) // save result as JSON
  }
})
Output:
// result.data
[ '<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" data-clipboard-text="Development Status :: 1 - Planning" data-tooltip-label="Copy to clipboard" type="button">\n Copy\n</button>\n',
'<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" data-clipboard-text="Development Status :: 2 - Pre-Alpha" data-tooltip-label="Copy to clipboard" type="button">\n Copy\n</button>\n',
'<button class="button button--small margin-top margin-bottom copy-tooltip copy-tooltip-w" data-clipboard-text="Development Status :: 3 - Alpha" data-tooltip-label="Copy to clipboard" type="button">\n Copy\n</button>\n',
:
:
:
// result.headers
{ 'Content-Length': '322126', ...
// result.url
https://pypi.org/classifiers/
/tmp/data.json saved successfully, written 174182 bytes
If you have the jq tool:
$ cat /tmp/data.json | jq 'length'
736
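With a selector, `result.data` is a collection of HTML fragments, so a little post-processing is usually needed to reach the values themselves. The sketch below assumes each item is a string like the `<button>` fragments in the output above and pulls out the `data-clipboard-text` attribute with a regular expression. `clipboardTexts` is a hypothetical helper; a regex is adequate for this fixed markup, but it is not a general HTML parser.

```javascript
// Hypothetical helper: extract data-clipboard-text values from HTML fragments.
// Assumes items is an array of strings shaped like the selector output above.
function clipboardTexts(items) {
  const pattern = /data-clipboard-text="([^"]*)"/
  return items
    .map(html => {
      const match = pattern.exec(html)
      return match ? match[1] : null
    })
    .filter(text => text !== null) // drop fragments without the attribute
}
```

Passing the array from `result.data` (or the parsed contents of /tmp/data.json) returns plain strings such as `Development Status :: 1 - Planning`.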
You can also add extra X- headers to your request. Read more about HTTP headers at Mozilla’s website.
const promptapi = require('@promptapi/scraper-pkg')

const params = {}
const headers = {'X-Referer': 'https://www.google.com'}
// JavaScript has no keyword arguments; pass headers as the third argument
promptapi.scraper('https://pypi.org/classifiers/', params, headers).then(result => {
  if (result.error) {
    console.log(result.error)
  } else {
    console.log(result.data) // your scraped data...
    console.log(result.headers)
    console.log(result.url)
    promptapi.save('/tmp/data.html', result.data) // save result
  }
})
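The text above mentions extra X- headers specifically. Assuming that only header names starting with `X-` are intended here (the source does not say what happens to other names), a script could flag anything else before sending. `nonXHeaders` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: list header names that do not start with "X-".
// The comparison is case-insensitive because HTTP header names are.
function nonXHeaders(headers) {
  return Object.keys(headers).filter(name => !/^x-/i.test(name))
}
```

An empty array means every header follows the X- convention; otherwise the returned names can be logged or removed before the request is made.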
All you need is node and npm...
This project is licensed under MIT
- Prompt API - Creator, maintainer
All PRs are welcome!
- fork (https://github.com/promptapi/scraper-pkg/fork)
- Create your branch (git checkout -b my-feature)
- commit yours (git commit -am 'Add awesome features...')
- push your branch (git push origin my-feature)
- Then create a new Pull Request!
This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.