Oxylabs’ Web Scraper API is a data scraper API designed to collect real-time data from websites at scale. It is built for gathering information from complicated targets and simplifies the crawling process. Web Scraper API is best suited to use cases such as website change monitoring, fraud protection, and travel fare monitoring.
In this guide, we’ll explain how Web Scraper API works and walk you through the process of getting started with this tool without hassle.
For a detailed explanation, see our blog post.
Web Scraper API uses basic HTTP authentication, which requires a username and password:
curl --user "USERNAME:PASSWORD" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal", "url": "https://ip.oxylabs.io"}'
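As a rough Python equivalent of the curl command above, the sketch below builds the same request with the standard library only. `curl --user` sends the credentials as an HTTP Basic `Authorization` header, which the code constructs by hand. The helper name `build_query_request` is illustrative and not part of any Oxylabs client, and `USERNAME` and `PASSWORD` are placeholders.

```python
import base64
import json
import urllib.request


def build_query_request(username, password, payload,
                        endpoint="https://realtime.oxylabs.io/v1/queries"):
    """Return a POST request with HTTP Basic credentials and a JSON body.

    Hypothetical helper for illustration; the endpoint default is the one
    used in the curl example.
    """
    # Basic auth is base64("username:password") prefixed with "Basic ".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {"source": "universal", "url": "https://ip.oxylabs.io"}
request = build_query_request("USERNAME", "PASSWORD", payload)
# To actually send it: urllib.request.urlopen(request, timeout=60)
```

The request is only built here, not sent, so the snippet can be run without valid credentials.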
Example of a single query request:
curl --user "USERNAME:PASSWORD" 'https://data.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal", "url": "https://ip.oxylabs.io", "geo_location": "United States"}'
Sample of the initial response output:
{
"callback_url": null,
"client_id": 1,
"created_at": "2021-09-30 12:40:32",
"domain": "io",
"geo_location": "United States",
"id": "6849322054852825089",
"limit": 10,
"locale": null,
"pages": 1,
"parse": false,
"parser_type": null,
"render": null,
"url": "https://ip.oxylabs.io",
"query": "",
"source": "universal",
"start_page": 1,
"status": "pending",
"storage_type": null,
"storage_url": null,
"subdomain": "ip",
"content_encoding": "utf-8",
"updated_at": "2021-09-30 12:40:32",
"user_agent_type": "desktop",
"session_info": null,
"statuses": [],
"_links": [
{
"rel": "self",
"href": "http://data.oxylabs.io/v1/queries/6849322054852825089",
"method": "GET"
},
{
"rel": "results",
"href": "http://data.oxylabs.io/v1/queries/6849322054852825089/results",
"method": "GET"
}
]
}
To check whether the job has reached "status": "done", we can use the link from ["_links"][0]["href"], which is http://data.oxylabs.io/v1/queries/6849322054852825089.
Example of how to check a job status:
curl --user "USERNAME:PASSWORD" 'http://data.oxylabs.io/v1/queries/6849322054852825089'
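The status check is usually repeated until the job is done. Below is a minimal Python sketch of such a polling loop, under a few assumptions: `link_for` and `wait_until_done` are hypothetical helper names, the polling interval and check limit are arbitrary, and `fetch_job` is any callable that performs the authenticated GET and returns the decoded JSON. Links are looked up by their `rel` value rather than by list position. The usage part substitutes a stub for the real HTTP call so the snippet runs offline.

```python
import time


def link_for(job, rel):
    """Return the href of the `_links` entry whose `rel` matches."""
    for link in job.get("_links", []):
        if link.get("rel") == rel:
            return link["href"]
    raise KeyError(f"no link with rel={rel!r}")


def wait_until_done(job, fetch_job, interval=2.0, max_checks=30, sleep=time.sleep):
    """Poll the job's `self` link until its status is "done".

    `interval` and `max_checks` are illustrative defaults, not documented limits.
    """
    status_url = link_for(job, "self")
    checks = 0
    while job.get("status") != "done":
        if checks >= max_checks:
            raise TimeoutError(f"job not done after {max_checks} checks")
        sleep(interval)
        job = fetch_job(status_url)
        checks += 1
    return job


# Usage with a stub standing in for a real authenticated GET request.
initial = {
    "status": "pending",
    "_links": [
        {"rel": "self", "method": "GET",
         "href": "http://data.oxylabs.io/v1/queries/6849322054852825089"},
        {"rel": "results", "method": "GET",
         "href": "http://data.oxylabs.io/v1/queries/6849322054852825089/results"},
    ],
}
canned = iter([dict(initial), dict(initial, status="done")])
requested = []


def stub_fetch(url):
    requested.append(url)
    return next(canned)


finished = wait_until_done(initial, stub_fetch, sleep=lambda seconds: None)
results_url = link_for(finished, "results")
```

Passing the fetcher in as a parameter keeps the loop independent of any particular HTTP library.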
The response will contain the same data as the initial response. If the job is "status": "done", we can retrieve the contents using the link from ["_links"][1]["href"], which is http://data.oxylabs.io/v1/queries/6849322054852825089/results.
Example of how to retrieve data:
curl --user "USERNAME:PASSWORD" 'http://data.oxylabs.io/v1/queries/6849322054852825089/results'
Sample of the response data output:
{
"results": [
{
"content": "24.5.203.132\n", # Actual content from https://ip.oxylabs.io
"created_at": "2021-09-30 12:40:32",
"updated_at": "2021-09-30 12:40:35",
"page": 1,
"url": "https://ip.oxylabs.io",
"job_id": "6849322054852825089",
"status_code": 200
}
]
}
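Once the response body has been decoded from JSON, the scraped content can be pulled out of the `results` list. The sketch below assumes the payload has the shape shown in the sample above; `successful_pages` is a hypothetical helper name, and treating only `status_code` 200 as a success is an assumption made for illustration.

```python
def successful_pages(payload):
    """Map page number to content for every result with status_code 200."""
    return {
        item["page"]: item["content"]
        for item in payload.get("results", [])
        if item.get("status_code") == 200
    }


# Same data as the sample response, written as a Python dict.
payload = {
    "results": [
        {
            "content": "24.5.203.132\n",
            "created_at": "2021-09-30 12:40:32",
            "updated_at": "2021-09-30 12:40:35",
            "page": 1,
            "url": "https://ip.oxylabs.io",
            "job_id": "6849322054852825089",
            "status_code": 200,
        }
    ]
}

pages = successful_pages(payload)
ip_address = pages[1].strip()  # drop the trailing newline from the content
```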
With the Realtime method (realtime.oxylabs.io), you can send your request and receive the data back on the same open HTTPS connection straight away.
Sample request:
curl --user "USERNAME:PASSWORD" 'https://realtime.oxylabs.io/v1/queries' -H "Content-Type: application/json" -d '{"source": "universal", "url": "https://ip.oxylabs.io", "geo_location": "United States"}'
Example response body that will be returned on the open connection:
{
"results": [
{
"content": "24.5.203.132\n", // Actual content from https://ip.oxylabs.io
"created_at": "2021-09-30 12:40:32",
"updated_at": "2021-09-30 12:40:35",
"page": 1,
"url": "https://ip.oxylabs.io",
"job_id": "6849322054852825089",
"status_code": 200
}
]
}
Instead of parameters such as domain and search query, SuperAPI only takes completely formed URLs.
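Because SuperAPI takes a complete URL, a request to it looks like an ordinary request sent through a proxy. The following Python sketch shows one way this might be set up with the standard library, assuming the proxy address `realtime.oxylabs.io:60000` and the `X-Oxylabs-Geo-Location` header used in this guide's cURL sample. The helper names are hypothetical, and disabling certificate verification mirrors curl's `-k` flag rather than a recommendation.

```python
import ssl
import urllib.parse
import urllib.request

PROXY_HOST = "realtime.oxylabs.io:60000"  # proxy address from the cURL sample


def build_proxy_url(username, password):
    """Return an http:// proxy URL with percent-encoded credentials."""
    user = urllib.parse.quote(username, safe="")
    secret = urllib.parse.quote(password, safe="")
    return f"http://{user}:{secret}@{PROXY_HOST}"


def build_superapi_opener(username, password):
    """Return an opener that routes HTTPS requests through the proxy."""
    # Equivalent of curl's -k: skip TLS certificate verification.
    insecure = ssl.create_default_context()
    insecure.check_hostname = False
    insecure.verify_mode = ssl.CERT_NONE
    proxy = urllib.request.ProxyHandler({"https": build_proxy_url(username, password)})
    return urllib.request.build_opener(
        proxy, urllib.request.HTTPSHandler(context=insecure)
    )


# The target is passed as a fully formed URL; options travel as headers.
request = urllib.request.Request(
    "https://ip.oxylabs.io",
    headers={"X-Oxylabs-Geo-Location": "United States"},
)
opener = build_superapi_opener("USERNAME", "PASSWORD")
# To actually send it: body = opener.open(request, timeout=60).read().decode("utf-8")
```

The opener and request are only constructed here; sending them requires valid credentials.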
SuperAPI code sample using cURL:
curl -k -x realtime.oxylabs.io:60000 -U USERNAME:PASSWORD -H "X-Oxylabs-Geo-Location: United States" "https://ip.oxylabs.io"
To learn more about getting started with Web Scraper API, see our blog post.