This is a generic connector to get data from any HTTP API (REST-style APIs).
This type of data source combines the features of Python’s requests library, to get data from any API, with the filtering language jq, for flexible transformations of the responses. The connector can retrieve data in JSON or XML format, depending on the responsetype defined. If responsetype is set to 'xml', an XPath expression can be provided in the xpath field to parse the response; the jq filter can then be applied to get the data in tabular format.
Please see our complete tutorial for an example of advanced use of this connector.
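To make the XML path more concrete, here is a minimal Python sketch of the parse-then-select step, using only the standard library. The sample payload, the element names and the `select_rows` helper are hypothetical, and the connector's own implementation may well differ; note also that `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML payload standing in for an API response.
SAMPLE_XML = """
<library>
  <book><title>First</title><loans>12</loans></book>
  <book><title>Second</title><loans>7</loans></book>
</library>
""".strip()


def select_rows(xml_text, xpath):
    """Parse an XML string and turn each element matched by `xpath` into a flat dict."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.findall(xpath):
        # One column per child element, keyed by tag name; values stay as strings.
        rows.append({child.tag: child.text for child in element})
    return rows


rows = select_rows(SAMPLE_XML, "./book")
```

Each matched element becomes one row, which is the shape a jq filter (or a data frame) can then work with.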
type: "HttpAPI"
name: str, required
baseroute: str, required
auth: {type: "basic|digest|oauth1|oauth2_backend|custom_token_server", args: [...], kwargs: {...}}, cf. requests auth and requests oauthlib doc.
template: dict. See below.
responsetype: str, default to 'json'
DATA_PROVIDERS: [
  type: 'HttpAPI'
  name: '<name>'
  baseroute: '<baseroute>'
  auth: '<auth>'
  template:
    <headers|json|params>:
      <header key>: '<header value>'
  responsetype: '<responsetype>'
,
  ...
]
You can use the template object to avoid repetition in data sources. The values of its attributes will be used or overridden by all data sources using this provider.
json: dict
headers: dict
params: dict
proxies: dict
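The "used or overridden" behaviour described above can be sketched in Python as a per-attribute merge. This is only an illustration under an assumption: that for each template attribute, keys set on the data source win over keys set on the provider template. The `apply_template` helper and the `accept` header are hypothetical, not part of the connector.

```python
def apply_template(template, data_source):
    """Return a copy of `data_source` completed with the provider's template values."""
    merged = dict(data_source)
    for key, template_value in template.items():
        source_value = data_source.get(key)
        if source_value is None:
            # The data source does not set this attribute: use the template as-is.
            merged[key] = dict(template_value)
        else:
            # Both define it: keys from the data source override the template's.
            merged[key] = {**template_value, **source_value}
    return merged


template = {"headers": {"requested-by": "toucantoco"}}
data_source = {
    "url": "records/1.0/search/",
    "headers": {"accept": "application/json"},  # hypothetical extra header
}
merged = apply_template(template, data_source)
```

With this merge, a header defined once on the provider is sent by every data source, while a data source can still replace it by setting the same key.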
domain: str, required
name: str, required
url: str, required
method: Method, default to GET
json: dict
proxies: dict, cf. requests doc
headers: dict
params: dict
data: str or dict
filter: str, jq filter, default to "."
auth: Auth
parameters: dict
xpath: str, xpath, default to ""
DATA_SOURCES: [
  domain: '<domain>'
  name: '<name>'
  url: '<url>'
  method: '<method>'
  headers: '<headers>'
  params: '<params>'
  data: '<data>'
  filter: '<filter>'
  auth: '<auth>'
  parameters: '<parameters>'
  xpath: '<xpath>'
,
  ...
]
A complete example of an HttpAPI provider and a matching entry in DATA_SOURCES is as follows:
DATA_PROVIDERS: [
  name: "open-data-paris"
  type: "HttpAPI"
  baseroute: 'https://opendata.paris.fr/api/'
  template:
    headers:
      requested-by: 'toucantoco'
]
DATA_SOURCES: [
  domain: "books"
  type: "HttpAPI"
  name: "open-data-paris"
  method: "GET"
  url: "records/1.0/search/"
  params:
    dataset: 'les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret'
    facet: 'auteur'
  filter: ".records[].fields"
]
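As a rough illustration of the request this configuration describes, the Python sketch below builds the final URL from the provider's baseroute, the data source's url and its params. It assumes the two path parts are simply joined with a single slash, which the text above does not spell out; `build_url` is a hypothetical helper, and the `requested-by` header from the template would be sent alongside.

```python
from urllib.parse import urlencode


def build_url(baseroute, url, params=None):
    """Join a provider baseroute and a data source url, then append the query string."""
    full = baseroute.rstrip("/") + "/" + url.lstrip("/")
    if params:
        full += "?" + urlencode(params)
    return full


request_url = build_url(
    "https://opendata.paris.fr/api/",
    "records/1.0/search/",
    {
        "dataset": "les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
        "facet": "auteur",
    },
)
```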
The JSON response looks like this:
{
  "nhits": 1000,
  "parameters": { ... },
  "records": [
    {
      "datasetid": "les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
      "recordid": "4b950c1ac5459379633d74ed2ef7f1c7f5cc3a10",
      "fields": {
        "nombre_de_reservations": 1094,
        "url_de_la_fiche_de_l_oeuvre": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1009613",
        "url_de_la_fiche_de_l_auteur": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1009613",
        "support": "indéterminé",
        "auteur": "Enders, Giulia",
        "titre": "Le charme discret de l'intestin [Texte imprimé] : tout sur un organe mal aimé"
      },
      "record_timestamp": "2017-01-26T11:17:33+00:00"
    },
    {
      "datasetid": "les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
      "recordid": "3df76bd20ab5dc902d0c8e5219dbefe9319c5eef",
      "fields": {
        "nombre_de_reservations": 746,
        "url_de_la_fiche_de_l_oeuvre": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1016593",
        "url_de_la_fiche_de_l_auteur": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1016593",
        "support": "Bande dessinée pour adulte",
        "auteur": "Sattouf, Riad",
        "titre": "L'Arabe du futur [Texte imprimé]. 2. Une jeunesse au Moyen-Orient, 1984-1985"
      },
      "record_timestamp": "2017-01-26T11:17:33+00:00"
    },
    ...
  ]
}
We apply the filter .records[].fields, which means that for every entry in the records property, it extracts all the properties of the fields object. We end up with a table of results looking like this (some columns are skipped in this example):
nombre_de_reservations | auteur | skipped columns... |
---|---|---|
1094 | Enders, Giulia | ... |
746 | Sattouf, Riad | ... |
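For readers less familiar with jq, the effect of `.records[].fields` on the response above can be mimicked in plain Python. This is only an equivalent sketch, not how the connector evaluates the filter; the `response` dict is a trimmed copy of the sample (only two columns kept), and `records_fields` is a hypothetical helper.

```python
# Trimmed copy of the sample response: only two of the columns are kept.
response = {
    "nhits": 1000,
    "records": [
        {
            "recordid": "4b950c1ac5459379633d74ed2ef7f1c7f5cc3a10",
            "fields": {"nombre_de_reservations": 1094, "auteur": "Enders, Giulia"},
        },
        {
            "recordid": "3df76bd20ab5dc902d0c8e5219dbefe9319c5eef",
            "fields": {"nombre_de_reservations": 746, "auteur": "Sattouf, Riad"},
        },
    ],
}


def records_fields(payload):
    """Python equivalent of the jq filter `.records[].fields`: one dict per record."""
    return [record["fields"] for record in payload["records"]]


rows = records_fields(response)
# The union of keys across rows gives the column names of the resulting table.
columns = sorted({key for row in rows for key in row})
```

Each dict in `rows` corresponds to one line of the table, and `columns` to its header.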
Note: the reason to have a filter option is to allow you to take any API response and transform it into something that fits into a column-based data frame. jq is designed to be concise and easy to use for simple tasks, but if you dig a little deeper you'll find a featureful functional programming language hiding underneath.