HttpAPI connector

This is a generic connector to get data from any HTTP API (REST-style APIs).

This type of data source combines the features of Python's requests library, which can query any API, with the jq filtering language, which provides flexible transformations of the responses. The connector can retrieve data in JSON or XML format, depending on the responsetype defined. If responsetype is set to 'xml', an XPath expression can be provided in the xpath field to parse the response; the jq filter is then applied to get the data in tabular format.
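As a rough illustration of the XML path described above, here is a minimal sketch that uses only Python's standard library. The XML payload, the path `./books/book` and the flattening into dictionaries are all hypothetical; this document does not describe how the connector converts XML internally, and `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response, invented for this illustration only.
XML_RESPONSE = """
<library>
  <books>
    <book><title>Book A</title><reservations>12</reservations></book>
    <book><title>Book B</title><reservations>7</reservations></book>
  </books>
</library>
"""

# Parse the response body into an element tree.
root = ET.fromstring(XML_RESPONSE)

# Select the repeated elements with a path expression (the role of `xpath`).
selected = root.findall("./books/book")

# Flatten each selected element into one row: tag name -> text value.
# In the connector, a jq filter would play this shaping role.
rows = [{child.tag: child.text for child in book} for book in selected]

for row in rows:
    print(row)
```

Note that every value extracted this way is a string; converting `reservations` to a number would be an extra step.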

Please see our complete tutorial for an example of advanced use of this connector.

Data provider configuration

  • type: "HttpAPI"
  • name: str, required
  • baseroute: str, required
  • auth: {type: "basic|digest|oauth1|oauth2_backend|custom_token_server", args: [...], kwargs: {...}} cf. requests auth and requests oauthlib doc.
  • template: dict. See below.
  • responsetype: str, defaults to 'json'
DATA_PROVIDERS: [
  type:    'HttpAPI'
  name:    '<name>'
  baseroute:    '<baseroute>'
  auth:    '<auth>'
  template:
    <headers|json|params>:
        <header key>: '<header value>'
  responsetype: '<responsetype>'
,
  ...
]

Template

You can use this object to avoid repetition in data sources. The values of these attributes are used as defaults by every data source that uses this provider, and a data source can override them.

  • json: dict
  • headers: dict
  • params: dict
  • proxies: dict
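The sketch below illustrates one way template defaults could combine with a data source. It assumes a shallow, per-attribute merge in which keys set on the data source win over keys set on the template; the helper name `apply_template` is hypothetical and is not part of the connector's API.

```python
from copy import deepcopy

# Attributes for which the template may provide defaults (see the list above).
TEMPLATE_KEYS = ("json", "headers", "params", "proxies")


def apply_template(template: dict, data_source: dict) -> dict:
    """Return a copy of data_source with template defaults filled in.

    Assumption: for each attribute, keys set on the data source override
    keys set on the template (shallow, per-attribute merge).
    """
    merged = deepcopy(data_source)
    for key in TEMPLATE_KEYS:
        defaults = template.get(key)
        if not defaults:
            continue  # the template sets nothing for this attribute
        # Template values first, then data source values on top.
        merged[key] = {**defaults, **(data_source.get(key) or {})}
    return merged


template = {"headers": {"requested-by": "toucantoco"}}
data_source = {
    "url": "records/1.0/search/",
    "headers": {"accept": "application/json"},
    "params": {"facet": "auteur"},
}

merged = apply_template(template, data_source)
print(merged["headers"])
```

With these inputs, the merged headers contain both `requested-by` from the template and `accept` from the data source, while `params` is left untouched because the template does not define it.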

Data source configuration

  • domain: str, required
  • name: str, required
  • url: str, required
  • method: Method, defaults to GET
  • json: dict
  • proxies: dict, cf. requests doc
  • headers: dict
  • params: dict
  • data: str or dict
  • filter: str, jq filter, defaults to "."
  • auth: Auth
  • parameters: dict
  • xpath: str, XPath expression, defaults to ""
DATA_SOURCES: [
  domain:    '<domain>'
  name:    '<name>'
  url:    '<url>'
  method:    '<method>'
  headers:    '<headers>'
  params:    '<params>'
  data:    '<data>'
  filter:    '<filter>'
  auth:    '<auth>'
  parameters:    '<parameters>'
  xpath: '<xpath>'
,
  ...
]

Complete example:

A complete HttpAPI configuration, with its DATA_PROVIDERS entry and the matching DATA_SOURCES entry, looks like this:

    DATA_PROVIDERS: [
        name: "open-data-paris"
        type: "HttpAPI"
        baseroute: 'https://opendata.paris.fr/api/'
        template:
            headers:
                requested-by: 'toucantoco'
    ]
    DATA_SOURCES: [
      domain: "books"
      type: "HttpAPI"
      name: "open-data-paris"
      method: "GET"
      url: "records/1.0/search/"
      params:
        dataset: 'les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret'
        facet: 'auteur'
      filter: ".records[].fields"
    ]
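To make the request behind this configuration concrete, the following sketch rebuilds the URL from the provider's baseroute and the data source's url and params. It assumes the two path parts are joined with a single slash and that params are sent as a query string; the helper `build_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_url(baseroute: str, url: str, params: dict) -> str:
    """Join baseroute and url with exactly one slash, then append params."""
    # Assumption: the connector concatenates the provider's baseroute and
    # the data source's url; this document does not spell out the rule.
    full = baseroute.rstrip("/") + "/" + url.lstrip("/")
    if params:
        full += "?" + urlencode(params)
    return full


full_url = build_url(
    "https://opendata.paris.fr/api/",
    "records/1.0/search/",
    {
        "dataset": "les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
        "facet": "auteur",
    },
)
print(full_url)
```

The `requested-by` header from the provider template would be sent along with this GET request.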

The JSON response looks like this:

{
  "nhits": 1000,
  "parameters": { ... },
  "records": [
    {
      "datasetid": "les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
      "recordid": "4b950c1ac5459379633d74ed2ef7f1c7f5cc3a10",
      "fields": {
        "nombre_de_reservations": 1094,
        "url_de_la_fiche_de_l_oeuvre": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1009613",
        "url_de_la_fiche_de_l_auteur": "https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1009613",
        "support": "indéterminé",
        "auteur": "Enders, Giulia",
        "titre": "Le charme discret de l'intestin [Texte imprimé] : tout sur un organe mal aimé"
      },
      "record_timestamp": "2017-01-26T11:17:33+00:00"
    },
    {
      "datasetid":"les-1000-titres-les-plus-reserves-dans-les-bibliotheques-de-pret",
      "recordid":"3df76bd20ab5dc902d0c8e5219dbefe9319c5eef",
      "fields":{
        "nombre_de_reservations":746,
        "url_de_la_fiche_de_l_oeuvre":"https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1016593",
        "url_de_la_fiche_de_l_auteur":"https://bibliotheques.paris.fr/Default/doc/SYRACUSE/1016593",
        "support":"Bande dessinée pour adulte",
        "auteur":"Sattouf, Riad",
        "titre":"L'Arabe du futur [Texte imprimé]. 2. Une jeunesse au Moyen-Orient, 1984-1985"
      },
      "record_timestamp":"2017-01-26T11:17:33+00:00"
    },
    ...
  ]
}

We apply the filter .records[].fields, which means that for every entry in the records property, it extracts all the properties of the fields object. We end up with a table of results looking like this (some columns are omitted here for brevity):

| nombre_de_reservations | auteur | skipped columns... |
|---|---|---|
| 1094 | Enders, Giulia | ... |
| 746 | Sattouf, Riad | ... |
Note: the filter option exists so that you can take any API response and transform it into something that fits into a column-based data frame. jq is designed to be concise and easy to use for simple tasks, but if you dig a little deeper you'll find a featureful functional programming language hiding underneath.
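For readers unfamiliar with jq, the filter `.records[].fields` corresponds roughly to the plain-Python sketch below. It is an equivalent written for illustration, not the connector's implementation, and the response is a trimmed copy of the example above with only two fields kept per record.

```python
# Trimmed copy of the example response (two fields kept per record).
response = {
    "nhits": 1000,
    "records": [
        {
            "recordid": "4b950c1ac5459379633d74ed2ef7f1c7f5cc3a10",
            "fields": {
                "nombre_de_reservations": 1094,
                "auteur": "Enders, Giulia",
            },
        },
        {
            "recordid": "3df76bd20ab5dc902d0c8e5219dbefe9319c5eef",
            "fields": {
                "nombre_de_reservations": 746,
                "auteur": "Sattouf, Riad",
            },
        },
    ],
}

# `.records[]` iterates over the records; `.fields` keeps only that object.
rows = [record["fields"] for record in response["records"]]

# Each row is a flat dict, so its keys map directly onto table columns.
columns = list(rows[0].keys())
print(columns)
for row in rows:
    print([row[column] for column in columns])
```

Each element of `rows` is flat, which is what makes the result fit into a column-based data frame.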